
Question: defaultdict memory efficiency

Stack Overflow user

Asked 2015-06-28 17:16:50

Answers: 2, Views: 248, Follows: 0, Votes: 0

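I am working through some examples of computing PMI, applying them to a collection of tweets I have (~50k), and I found that the bottleneck of the algorithm implementation is defaultdict(lambda : defaultdict(int)). I have no idea why.

Here is the example where I profiled it; it takes a lot of memory and time: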

for term, n in p_t.items():
    positive_assoc = sum(pmi[term][tx] for tx in positive_vocab)
    negative_assoc = sum(pmi[term][tx] for tx in negative_vocab)
    semantic_orientation[term] = positive_assoc - negative_assoc

Specifically, this part:

positive_assoc = sum(pmi[term][tx] for tx in positive_vocab)
negative_assoc = sum(pmi[term][tx] for tx in negative_vocab)

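is allocating a lot of memory for some reason. I assume that lookups of nonexistent values return 0, so the sequence passed to the sum function is very large.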

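I solved the problem with a simple "if the value exists" check and a separate accumulator variable, sum_pos.

The whole implementation from the blog: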
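The question does not include the code for that fix. A minimal sketch of what such a check might look like, assuming sum_pos accumulates the positive score; the function name positive_sum and the toy data are hypothetical:

```python
from collections import defaultdict

def positive_sum(pmi, term, positive_vocab):
    """Hypothetical reconstruction: sum PMI scores only for pairs that exist."""
    sum_pos = 0
    # Test membership before indexing, so the defaultdict never
    # calls its factory and never stores entries for missing keys
    if term in pmi:
        inner = pmi[term]
        for tx in positive_vocab:
            if tx in inner:
                sum_pos += inner[tx]
    return sum_pos

# Toy data, made up for illustration
pmi = defaultdict(lambda: defaultdict(int))
pmi["good"]["great"] = 1.5
pmi["good"]["bad"] = -0.5

print(positive_sum(pmi, "good", ["great", "nice"]))  # 1.5
print(positive_sum(pmi, "unknown", ["great"]))       # 0
print("unknown" in pmi, "nice" in pmi["good"])       # False False
```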

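import math
from collections import defaultdict

# p_t, p_t_com, com, positive_vocab and negative_vocab are defined elsewhere (not shown)
pmi = defaultdict(lambda : defaultdict(int))
for t1 in p_t:
    for t2 in com[t1]:
        denom = p_t[t1] * p_t[t2]
        pmi[t1][t2] = math.log2(p_t_com[t1][t2] / denom)

semantic_orientation = {}
for term, n in p_t.items():
    # Reading a missing pmi[term][tx] stores a new 0 entry (defaultdict behaviour)
    positive_assoc = sum(pmi[term][tx] for tx in positive_vocab)
    negative_assoc = sum(pmi[term][tx] for tx in negative_vocab)
    semantic_orientation[term] = positive_assoc - negative_assoc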
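Read directly off the loops above, and assuming (as the variable names suggest) that p_t[t] holds the single-term probability P(t) and p_t_com[t1][t2] the co-occurrence probability P(t1, t2), with V+ and V- standing for positive_vocab and negative_vocab, the code computes:

```latex
\begin{aligned}
\mathrm{PMI}(t_1, t_2) &= \log_2 \frac{P(t_1, t_2)}{P(t_1)\,P(t_2)} \\
\mathrm{SO}(t) &= \sum_{t' \in V^{+}} \mathrm{PMI}(t, t') \;-\; \sum_{t' \in V^{-}} \mathrm{PMI}(t, t')
\end{aligned}
```

Pairs that never co-occur (t' not in com[t]) have no stored PMI and count as 0 in the sums, which is the default that defaultdict(int) supplies.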

Tags: python

2 Answers

Stack Overflow user

Accepted answer

Answered 2015-06-28 17:21:36

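defaultdict calls the factory function for every missing key. If you use it inside a sum() where lots of keys are missing, you will indeed create a large number of dictionaries, which then grow to contain even more keys that are never used: pmi[term] inserts a new inner defaultdict for an unknown term, and pmi[term][tx] then stores a 0 for every missing tx.

Switch to using the dict.get() method here to prevent objects from being created: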
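A small demonstration of that behaviour, with made-up keys ("unseen" and w0 to w999) standing in for the question's terms: merely reading missing entries is enough to store them.

```python
from collections import defaultdict

# A nested defaultdict shaped like the question's pmi
pmi = defaultdict(lambda: defaultdict(int))
pmi["good"]["great"] = 1.5

# Only *reading* scores for a term and vocabulary that were never stored
vocab = ["w%d" % i for i in range(1000)]
total = sum(pmi["unseen"][tx] for tx in vocab)

print(total)               # 0
print(len(pmi))            # 2: "unseen" was added to the outer dict
print(len(pmi["unseen"]))  # 1000: every missing tx was stored as 0
```

With ~50k terms and a vocabulary of any size, this adds one inner dictionary per term plus one entry per (term, vocabulary word) pair.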

positive_assoc = sum(pmi.get(term, {}).get(tx, 0) for tx in positive_vocab)
negative_assoc = sum(pmi.get(term, {}).get(tx, 0) for tx in negative_vocab)

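Note that the pmi.get() call returns an empty dictionary, so that the chained dict.get() call keeps working and can return the default 0 when no dictionary is associated with the given term.

Votes: 3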
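A sketch of the dict.get() approach on toy data invented for this example; the helper name orientation is hypothetical. It hoists the outer pmi.get(term, {}) out of the generator, which gives the same result as the inline version but does that lookup once per term instead of once per vocabulary word.

```python
from collections import defaultdict

pmi = defaultdict(lambda: defaultdict(int))
pmi["good"]["great"] = 1.5
pmi["good"]["bad"] = -0.5

positive_vocab = ["great", "nice"]
negative_vocab = ["bad", "awful"]

def orientation(term):
    # dict.get() never calls the default factory, so nothing is inserted
    inner = pmi.get(term, {})
    positive_assoc = sum(inner.get(tx, 0) for tx in positive_vocab)
    negative_assoc = sum(inner.get(tx, 0) for tx in negative_vocab)
    return positive_assoc - negative_assoc

print(orientation("good"))    # 2.0
print(orientation("unseen"))  # 0
print(sorted(pmi))            # ['good']
print(sorted(pmi["good"]))    # ['bad', 'great']
```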

Stack Overflow user

Answered 2015-06-28 19:07:30

I like Martijn's answer... but this should also work, and you may find it more readable:

# Check membership first; `and` short-circuits, so pmi[term] is only read for existing terms
positive_assoc = sum(pmi[term][tx] for tx in positive_vocab if term in pmi and tx in pmi[term])
negative_assoc = sum(pmi[term][tx] for tx in negative_vocab if term in pmi and tx in pmi[term])

Votes: 1
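A quick check, on toy data invented for this example, that the membership-test version gives the same sums as the dict.get() chain and also leaves pmi unchanged. The helper names with_membership and with_get are hypothetical. The membership version does extra hash lookups per matching word (tx in pmi[term], then pmi[term][tx]), so the choice between the two is mostly about readability.

```python
from collections import defaultdict

pmi = defaultdict(lambda: defaultdict(int))
pmi["good"]["great"] = 1.5
pmi["good"]["bad"] = -0.5
vocab = ["great", "bad", "nice"]

def with_membership(term):
    # pmi[term][tx] is only evaluated when both membership tests pass
    return sum(pmi[term][tx] for tx in vocab
               if term in pmi and tx in pmi[term])

def with_get(term):
    return sum(pmi.get(term, {}).get(tx, 0) for tx in vocab)

for term in ["good", "unseen"]:
    assert with_membership(term) == with_get(term)

print(with_membership("good"))  # 1.0
print(sorted(pmi))              # ['good']
print(sorted(pmi["good"]))      # ['bad', 'great']
```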

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/31102545
