Q: Using NLTK's FreqDist

Stack Overflow user

Asked 2011-06-09 04:29:16

Answers: 2, Views: 10.8K, Followers: 0, Votes: 3

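I'm trying to get a frequency distribution for a set of documents using Python. For some reason my code isn't working, and it produces the following error: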

Traceback (most recent call last):
  File "C:\Documents and Settings\aschein\Desktop\freqdist", line 32, in <module>
    fd = FreqDist(corpus_text)
  File "C:\Python26\lib\site-packages\nltk\probability.py", line 104, in __init__
    self.update(samples)
  File "C:\Python26\lib\site-packages\nltk\probability.py", line 472, in update
    self.inc(sample, count=count)
  File "C:\Python26\lib\site-packages\nltk\probability.py", line 120, in inc
    self[sample] = self.get(sample,0) + count
TypeError: unhashable type: 'list'

Can you help?

Here is the code so far:

import os
import nltk
from nltk.probability import FreqDist


#The stop=words list
stopwords_doc = open("C:\\Documents and Settings\\aschein\\My Documents\\stopwords.txt").read()
stopwords_list = stopwords_doc.split()
stopwords = nltk.Text(stopwords_list)

corpus = []

#Directory of documents
directory = "C:\\Documents and Settings\\aschein\\My Documents\\comments"
listing = os.listdir(directory)

#Append all documents in directory into a single 'document' (list)
for doc in listing:
    doc_name = "C:\\Documents and Settings\\aschein\\My Documents\\comments\\" + doc
    input = open(doc_name).read() 
    input = input.split()
    corpus.append(input)

#Turn list into Text form for NLTK
corpus_text = nltk.Text(corpus)

#Remove stop-words
for w in corpus_text:
    if w in stopwords:
        corpus_text.remove(w)

fd = FreqDist(corpus_text)

Tags: python, frequency, nltk, frequency-distribution

2 Answers

Stack Overflow user

Answered 2011-06-09 14:45:51

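Two thoughts that I hope at least contribute to an answer.

First, the documentation for the nltk.text.Text() class states (emphasis mine):

"A wrapper around a sequence of simple (string) tokens, which is intended to support initial exploration of texts (via the interactive console). Its methods perform a variety of analyses on the text's contexts (e.g., counting, concordancing, collocation discovery), and display the results. If you wish to write a program which makes use of these analyses, then you should bypass the Text class, and use the appropriate analysis function or class directly instead."

So I'm not sure Text() is the way you want to handle this data. It seems to me you would do just fine using a plain list.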
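To illustrate the plain-list suggestion, here is a minimal sketch. It assumes the goal is word counts across all documents; the two sample strings are made up and stand in for the contents of the comment files. It also swaps append() for extend() so that the corpus is a flat list of strings rather than a list of lists. The Counter fallback is only there so the sketch runs without NLTK installed (in NLTK 3, FreqDist is a Counter subclass).

```python
try:
    from nltk.probability import FreqDist
except ImportError:
    # Fallback for illustration only; not the same class as NLTK's FreqDist.
    from collections import Counter as FreqDist

# Hypothetical stand-ins for the text read from each file.
documents = ["the cat sat on the mat", "the dog sat"]

corpus = []
for text in documents:
    # extend() adds each token on its own; append() would nest a whole list
    # inside corpus, and a list cannot be counted as a sample.
    corpus.extend(text.split())

# A flat list of strings can be passed to FreqDist directly, without nltk.Text.
fd = FreqDist(corpus)
print(fd["the"])  # 3
print(fd["sat"])  # 2
```

With the real files, each text.split() result would be fed to extend() in the same way.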

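Second, I would caution you to think about the calculation you are asking NLTK to perform here. Removing stop words before determining the frequency distribution means that your frequencies will be skewed; I don't understand why the stop words are removed before tabulation rather than simply ignored when examining the distribution after the fact. (I suppose this second point would make a better comment than part of an answer, but I felt it worth pointing out that the proportions would be skewed.) Depending on what you intend to use the frequency distribution for, this may or may not be a problem in itself.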
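As a rough sketch of the "count first, ignore later" idea, the snippet below tabulates every token and filters stop words only when reading the results. The token string and the stop-word set are invented for illustration, and the Counter fallback is again only there so the code runs without NLTK.

```python
try:
    from nltk.probability import FreqDist
except ImportError:
    # Fallback for illustration only; not the same class as NLTK's FreqDist.
    from collections import Counter as FreqDist

# Hypothetical tokens and stop-word set.
tokens = "the cat sat on the mat and the dog sat".split()
stopwords = {"the", "on", "and"}

# Count everything first, so the total reflects the whole corpus.
fd = FreqDist(tokens)
total = sum(fd.values())

# Skip stop words only when inspecting the distribution.
content = [(w, c) for w, c in fd.most_common() if w not in stopwords]

# Share of "sat" among all tokens, and among non-stop-word tokens only.
share_all = fd["sat"] / total
kept = sum(c for _, c in content)
share_filtered = fd["sat"] / kept

print(content[0])
print(share_all, share_filtered)
```

Here "sat" accounts for 2 of 10 tokens overall but 2 of 5 once stop words are dropped, which is the skew described above.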

Votes: 2

Stack Overflow user

Answered 2011-06-09 04:56:01

The error says you are trying to use a list as a hash key. Can you convert it to a tuple?
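A small sketch of what that means in practice, using made-up nested data: a list of lists cannot be counted because each inner list would have to serve as a key. Converting each inner list to a tuple makes it hashable, but the result then counts whole documents rather than words; flattening is an alternative if word counts are what is wanted. The Counter fallback is only for running this without NLTK.

```python
from itertools import chain

try:
    from nltk.probability import FreqDist
except ImportError:
    # Fallback for illustration only; not the same class as NLTK's FreqDist.
    from collections import Counter as FreqDist

# Hypothetical corpus built with append(): one inner list per document.
nested = [["good", "post"], ["good", "post"], ["thanks"]]

# Lists are unhashable, so counting them raises TypeError.
try:
    FreqDist(nested)
    raised = False
except TypeError:
    raised = True

# Tuples are hashable, but each sample is now an entire document.
doc_counts = FreqDist(tuple(doc) for doc in nested)

# Flattening the nested lists gives per-word counts instead.
word_counts = FreqDist(chain.from_iterable(nested))

print(raised)
print(doc_counts[("good", "post")])
print(word_counts["good"], word_counts["thanks"])
```

Which of the two is appropriate depends on whether documents or individual words are the unit being counted.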

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/6284855
