Q: Importing and using an NLTK corpus

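Stack Overflow user

Asked 2014-09-28 20:44:54

Answers: 1 · Views: 2.4K · Followers: 0 · Votes: 3

Please, please help. I have a folder of text files that I want to analyze with NLTK. How do I import it as a corpus and then run NLTK commands on it? I have cobbled together the code below, but it gives me an error: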

    raise error, v # invalid expression
sre_constants.error: nothing to repeat

Code:

import nltk
import re
from nltk.corpus.reader.plaintext import PlaintextCorpusReader

corpus_root = '/Users/jt/Documents/Python/CRspeeches'
speeches = PlaintextCorpusReader(corpus_root, '*.txt')

print "Finished importing corpus" 

words = FreqDist()

for sentence in speeches.sents():
    for word in sentence:
        words.inc(word.lower())

print words["he"]
print words.freq("he")

python

nltk

1 Answer

Stack Overflow user

Accepted answer

Posted 2014-09-28 22:10:16

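As I understand it, this problem comes from a known bug (or perhaps it is a feature?), which is partially explained in this answer. In short, certain regexes about empty things blow up. The file pattern passed to PlaintextCorpusReader is treated as a regular expression, not a shell glob, so the leading * in '*.txt' is a quantifier with nothing before it to repeat, which is what the "nothing to repeat" error reports.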
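The failure can be reproduced with the standard re module alone, without NLTK. The following is a minimal sketch assuming Python 3; the helper name compiles and the sample file names are made up for illustration.

```python
import re


def compiles(pattern):
    """Return True if `pattern` is a valid regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Glob-style pattern: the leading '*' is a quantifier with nothing before it.
glob_style = '*.txt'
# Regex with the same intent: any characters, then a literal '.txt'.
regex_style = r'.*\.txt'

print(compiles(glob_style))
print(compiles(regex_style))

# Hypothetical file names, checked against the valid regex.
print(bool(re.fullmatch(regex_style, 'speech_01.txt')))
print(bool(re.fullmatch(regex_style, 'notes.md')))
```

The backslash before the dot matters: an unescaped . matches any character, so the pattern would also accept names that merely end in "txt".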

The source of the error is your speeches = line. Change it to:

speeches = PlaintextCorpusReader(corpus_root, r'.*\.txt')

With that change, everything loads and compiles just fine.
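For readers on Python 3 and NLTK 3, the question's workflow can be sketched roughly as follows. This is a sketch under assumptions, not the asker's code: it builds a throwaway corpus in a temporary directory (the file names and sample sentences are made up) instead of reading the asker's folder, it counts tokens from words() rather than sents() so that no sentence tokenizer data needs to be downloaded, and it builds the FreqDist directly from an iterable because FreqDist.inc() is not available in NLTK 3.

```python
import os
import tempfile

from nltk import FreqDist
from nltk.corpus.reader.plaintext import PlaintextCorpusReader

# Hypothetical sample corpus; stands in for the asker's folder of speeches.
samples = {
    "speech_a.txt": "He said he would go.\n",
    "speech_b.txt": "She said no.\n",
}

corpus_root = tempfile.mkdtemp()
for name, text in samples.items():
    with open(os.path.join(corpus_root, name), "w", encoding="utf-8") as f:
        f.write(text)

# The second argument is a regular expression, as in the accepted answer.
speeches = PlaintextCorpusReader(corpus_root, r'.*\.txt')
print(speeches.fileids())

# Count lowercased tokens across every file in the corpus.
words = FreqDist(word.lower() for word in speeches.words())

print(words["he"])       # absolute count of "he"
print(words.freq("he"))  # relative frequency of "he"
```

Punctuation is counted as tokens here, so it contributes to the total used by freq(); filter with str.isalpha() first if only words should count.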

Votes: 3

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/26089483
