
Multiprocessing starmap_async in Python

Stack Overflow user
Asked 2021-01-05 18:05:40
Answers: 1, Views: 937, Followers: 0, Votes: 0

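I am learning how to use multiprocessing in Python and have a question. I want to count how many times an object (a tuple of words) appears in a list. I came up with two approaches: the first uses pool.starmap_async, the second does not use multiprocessing.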

ngrams=[('review', 'productivity'), ('productivity', 'satisfaction'), ('satisfaction', 'democratic'), ('democratic', 'autocratic'), ('autocratic', 'leadership'), ('leadership', 'empirical'), ('empirical', 'literature'), ('literature', 'explore'), ('explore', 'organizational_outcome'), ('organizational_outcome', 'democratic'), ('democratic', 'leadership'), ('leadership', 'task##oriented'), ('task##oriented', 'group'), ('group', 'individual'), ('individual', 'member'), ('member', 'productivity'), ('productivity', 'satisfaction'), ('satisfaction', 'receive'), ('receive', 'attention'), ('attention', 'emphasis')]
ngrams_uniq=[('satisfaction', 'democratic'), ('organizational_outcome', 'democratic'), ('review', 'productivity'), ('democratic', 'leadership'), ('member', 'productivity'), ('receive', 'attention'), ('empirical', 'literature'), ('group', 'individual'), ('literature', 'explore'), ('democratic', 'autocratic'), ('autocratic', 'leadership'), ('attention', 'emphasis'), ('task##oriented', 'group'), ('explore', 'organizational_outcome'), ('leadership', 'task##oriented'), ('satisfaction', 'receive'), ('productivity', 'satisfaction'), ('leadership', 'empirical'), ('individual', 'member')]

def count_ngrams(gram,ngrams):
  return (gram,ngrams.count(gram))

## With Pool

import multiprocessing as mp
import time

print(time.strftime("%H:%M:%S"))
pool = mp.Pool(mp.cpu_count())
# .get() blocks until every task has finished
dict_freq_ngrams=pool.starmap_async(count_ngrams,[(gram,ngrams) for gram in ngrams_uniq]).get()
pool.close()
print(time.strftime("%H:%M:%S"))

## Without Pool

print(time.strftime("%H:%M:%S"))
dict_freq_ngrams=[count_ngrams(gram,ngrams) for gram in ngrams_uniq]
print(time.strftime("%H:%M:%S"))

When I measure the execution time, the second option is always faster. I don't understand why. Maybe I have made a mistake, but I can't tell what it is.

Thanks in advance.


1 Answer

Stack Overflow user

Answered 2021-01-06 00:16:22

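I don't think you have an error. Rather, the overhead of multiprocessing copying the data to the new interpreters outweighs the speed gain from the parallel computation, since starting the pool takes 0.2 to 0.3 seconds on my Surface.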
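
The copying cost described above can be estimated without starting any worker process, because `multiprocessing` sends task arguments to the workers with `pickle`. The sketch below is a rough estimate, not a measurement of what `Pool` does internally: it assumes each task's argument tuple is pickled on its own (as with a chunk size of 1), and `estimate_payload_bytes` is a hypothetical helper name.

```python
import pickle


def estimate_payload_bytes(grams, ngrams):
    """Sum the pickled size of every (gram, ngrams) argument tuple.

    Assumption: each task is serialized separately, so the full
    ngrams list is included once per task.
    """
    return sum(len(pickle.dumps((gram, ngrams))) for gram in grams)


# Small stand-in data; the lists from the question can be used the same way.
ngrams = [("review", "productivity"), ("productivity", "satisfaction")] * 10
ngrams_uniq = [("review", "productivity"), ("productivity", "satisfaction")]

list_bytes = len(pickle.dumps(ngrams))
total_bytes = estimate_payload_bytes(ngrams_uniq, ngrams)
print(f"list alone: {list_bytes} bytes, all task arguments: {total_bytes} bytes")
```

Under this assumption the payload grows with both the number of unique grams and the length of the list, while the serial version copies nothing at all.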

Here is the code I used to test:

import time
import multiprocessing as mp
import matplotlib.pyplot as plt
import numpy as np
import copy

ngrams=[('review', 'productivity'), ('productivity', 'satisfaction'), ('satisfaction', 'democratic'), ('democratic', 'autocratic'), ('autocratic', 'leadership'), ('leadership', 'empirical'), ('empirical', 'literature'), ('literature', 'explore'), ('explore', 'organizational_outcome'), ('organizational_outcome', 'democratic'), ('democratic', 'leadership'), ('leadership', 'task##oriented'), ('task##oriented', 'group'), ('group', 'individual'), ('individual', 'member'), ('member', 'productivity'), ('productivity', 'satisfaction'), ('satisfaction', 'receive'), ('receive', 'attention'), ('attention', 'emphasis')]*40
ngrams_uniq=[('satisfaction', 'democratic'), ('organizational_outcome', 'democratic'), ('review', 'productivity'), ('democratic', 'leadership'), ('member', 'productivity'), ('receive', 'attention'), ('empirical', 'literature'), ('group', 'individual'), ('literature', 'explore'), ('democratic', 'autocratic'), ('autocratic', 'leadership'), ('attention', 'emphasis'), ('task##oriented', 'group'), ('explore', 'organizational_outcome'), ('leadership', 'task##oriented'), ('satisfaction', 'receive'), ('productivity', 'satisfaction'), ('leadership', 'empirical'), ('individual', 'member')]
ngrams_copy=copy.copy(ngrams)

def count_ngrams(gram,ngrams):
    return (gram,ngrams.count(gram))



if __name__ == "__main__":
    std = np.array([])
    Pool= np.array([])
    for i in range(100):
        
        t = time.time()
        with mp.Pool(mp.cpu_count()) as pool:
            res=pool.starmap_async(count_ngrams,[(val, ngrams) for val in ngrams_uniq])
            dict_freq_ngrams = res.get()  # wait for all results

        Pool = np.append(Pool, np.array(time.time() - t))
        print(i)

        t = time.time()
        dict_freq_ngrams=[count_ngrams(gram,ngrams) for gram in ngrams_uniq]
        std = np.append(std, np.array(time.time() - t))
        ngrams = ngrams+ngrams_copy

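    # x axis: iteration index (ngrams grows every iteration); y axis: seconds
    plt.plot(std, label="without pool")
    plt.plot(Pool, label="with pool")
    plt.legend()
    plt.show()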
Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/65584238
