Q: Conditions on elements of numpy arrays of different sizes

Stack Overflow user

Asked 2018-09-07 01:51:11

1 answer · 208 views · 0 followers · 0 votes

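I have the following situation in pseudo-Python code and need a vectorized solution to optimize it, because I am processing hundreds of thousands of speech-analysis entries and nested for loops are not feasible. I would like to know how to do a conditional check across arrays of different sizes. For example, I know about np.greater, but that is an element-wise operation and fails for arrays of different sizes.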

words = [
    {'id': 0, 'word': 'Stu', 'sampleStart': 882, 'sampleEnd': 40571},
    {'id': 0, 'word': ' ', 'sampleStart': 40570, 'sampleEnd': 44540},
    {'id': 0, 'word': 'eyes', 'sampleStart': 44541, 'sampleEnd': 66590},
]

phonemes = [
    {'id': 0, 'phoneme': ' ', 'sampleStart': 0, 'sampleEnd': 881},
    {'id': 1, 'phoneme': 's', 'sampleStart': 882, 'sampleEnd': 7937},
    {'id': 2, 'phoneme': 't', 'sampleStart': 7938, 'sampleEnd': 11906},
    {'id': 3, 'phoneme': 'u', 'sampleStart': 11907, 'sampleEnd': 15433},
    {'id': 3, 'phoneme': ' ', 'sampleStart': 15434, 'sampleEnd': 47627},
    {'id': 3, 'phoneme': 'eye', 'sampleStart': 47628, 'sampleEnd': 57770},
    {'id': 3, 'phoneme': 's', 'sampleStart': 57771, 'sampleEnd': 66590},
]

associatedData = []
for w in words:
    startWord = w['sampleStart']
    endWord = w['sampleEnd']
    word = w['word']
    w_id = w['id']
    for p in phonemes:
        startPhoneme = p['sampleStart']
        endPhoneme = p['sampleEnd']
        phoneme = p['phoneme']
        p_id = p['id']
        if startPhoneme >= startWord and endPhoneme <= endWord:
            # I need to relate this data as it comes from 2 different sources
            # Some computations occur here that are too long to reproduce; this multiplication is just an example
            mult = startPhoneme * startWord
            associatedData.append({'w_id': w_id, 'p_id': p_id, 'word': word, 'phoneme': phoneme, 'someOp': mult})

# Gather associated data for later use:
print(associatedData)

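What would be a good way to solve this? I am fairly new to vector operations and have been struggling with this for quite a while without much to show for it.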

vectorization

python

numpy

vector

conditional

1 Answer

Stack Overflow user

Answered 2018-09-07 17:16:46

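Checking every possible phoneme for every word will not scale. It does far more work than necessary: for any number of words and phonemes, this approach always performs len(words) * len(phonemes) comparisons. Vectorization can speed that up, but it is better to reduce the complexity itself.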
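For reference, the vectorized all-pairs check the question asks about can be sketched with NumPy broadcasting. This is only an illustration: the array names are made up here, the numbers are the sample boundaries from the question, and the work is still len(words) * len(phonemes), so memory grows with the product of the two lengths.

```python
import numpy as np

# Sample boundaries copied from the question's data (array names are illustrative)
w_start = np.array([882, 40570, 44541])
w_end = np.array([40571, 44540, 66590])
p_start = np.array([0, 882, 7938, 11907, 15434, 47628, 57771])
p_end = np.array([881, 7937, 11906, 15433, 47627, 57770, 66590])

# Comparing a (n_words, 1) column against a (n_phonemes,) row broadcasts
# to a (n_words, n_phonemes) boolean matrix covering every pair.
inside = (p_start >= w_start[:, None]) & (p_end <= w_end[:, None])

# Row and column indices of the matching (word, phoneme) pairs
w_idx, p_idx = np.nonzero(inside)

# Per-pair computation; here the question's example multiplication
mult = p_start[p_idx] * w_start[w_idx]
```

Adding the `[:, None]` axis is what lets arrays of different lengths be compared; `np.greater` on the raw 1-D arrays fails because their shapes cannot be broadcast together.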

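Instead, for each word, try to look at only a few candidate phonemes. One solution is to keep a pointer to the current phoneme. For each new word, iterate over the matching range of phonemes (locally, only around the current phoneme pointer).

Pseudocode solution: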

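# skip if already sorted
words = sorted(words, key=lambda x: x["sampleStart"])
phonemes = sorted(phonemes, key=lambda x: x["sampleStart"])

phoneme_idx = 0
for w in words:

    # go back to the earliest phoneme that starts inside this word
    # (only needed when consecutive words overlap)
    while phoneme_idx > 0 and phonemes[phoneme_idx - 1]["sampleStart"] >= w["sampleStart"]:
        phoneme_idx -= 1

    # skip phonemes that start before this word
    while phoneme_idx < len(phonemes) and phonemes[phoneme_idx]["sampleStart"] < w["sampleStart"]:
        phoneme_idx += 1

    # evaluate all phonemes that start inside this word
    while phoneme_idx < len(phonemes) and phonemes[phoneme_idx]["sampleStart"] <= w["sampleEnd"]:
        # match and compute only if the phoneme also ends inside the word;
        # evaluate() is a placeholder for the per-pair computation
        if phonemes[phoneme_idx]["sampleEnd"] <= w["sampleEnd"]:
            evaluate(phonemes[phoneme_idx], w)
        phoneme_idx += 1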
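The same idea can also be sketched in NumPy with np.searchsorted, which finds each word's phoneme range by binary search rather than by moving a pointer. This is a different technique from the loop above, not a transcription of it, and it assumes the phonemes are sorted by start and do not overlap, so that their end positions are sorted too. The array names are illustrative and the numbers come from the question's sample data.

```python
import numpy as np

# Sample boundaries from the question (array names are illustrative)
w_start = np.array([882, 40570, 44541])
w_end = np.array([40571, 44540, 66590])
# Assumption: phonemes are sorted and non-overlapping, so p_end is sorted as well
p_start = np.array([0, 882, 7938, 11907, 15434, 47628, 57771])
p_end = np.array([881, 7937, 11906, 15433, 47627, 57770, 66590])

# Index of the first phoneme starting at or after each word's start
first = np.searchsorted(p_start, w_start, side="left")
# One past the index of the last phoneme ending at or before each word's end
last = np.searchsorted(p_end, w_end, side="right")
# Number of phonemes contained in each word (empty ranges clipped to 0)
counts = np.maximum(last - first, 0)

# Expand the per-word ranges [first, last) into flat (word, phoneme) index pairs
w_idx = np.repeat(np.arange(len(w_start)), counts)
offsets = np.arange(counts.sum()) - np.repeat(np.cumsum(counts) - counts, counts)
p_idx = np.repeat(first, counts) + offsets
```

With `w_idx` and `p_idx` in hand, the per-pair computation can be applied to `p_start[p_idx]` and `w_start[w_idx]` in one vectorized step, without ever building the full words-by-phonemes matrix.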

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/52209695
