My NLTK data is in ~/nltk_data/corpora/words/ (en, en-basic, README).
According to the __init__.py inside ~/lib/python2.7/site-packages/nltk/corpus, to read the list of words in the Brown corpus, use nltk.corpus.brown.words():
from nltk.corpus import brown
print brown.words()
['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]
This __init__.py contains:
words = LazyCorpusLoader(
    'words', WordListCorpusReader, r'(?!README|\.).*')
1. When I write from nltk.corpus import words, am I importing the 'words' function from the __init__.py that resides in the python2.7/site-packages/nltk/corpus directory?
3. The data resides in ~/nltk_data/corpora (not in nltk/corpus). So why does this command work?
from nltk.corpus import brown
Shouldn't it be this instead?
from nltk_data.corpora import brown
Answered 2013-08-27 13:20:52
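Re. point 2: You can import a module (import module.submodule), or you can import an object from a module (from module.submodule import variable). Although you can treat a module like a variable, since it really is a variable in that scope (from module import submodule), it does not work the other way round. That is why import module.submodule.variable fails when you try it.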
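To make the distinction concrete, here is a minimal sketch that uses the standard library as a stand-in: os.path plays the role of nltk.corpus (a module) and join plays the role of words (an object inside it). These stand-ins are an assumption for illustration only; no NLTK installation is needed to run it.

```python
import importlib

# A module can be imported by its dotted path.
path_module = importlib.import_module("os.path")

# An object that lives inside a module is reached with "from ... import ...".
from os.path import join

# A dotted path whose last component is an object, not a module, cannot be
# imported as a module. This is the same situation as
# "import module.submodule.variable".
try:
    importlib.import_module("os.path.join")
except ImportError as exc:
    import_error = exc
else:
    import_error = None

print(import_error)
```

The same reasoning would explain why from nltk.corpus import words succeeds: words is a name defined inside the nltk.corpus package, not a submodule of it.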
Re. point 3: That depends on what nltk.corpus does. Perhaps it automatically searches for and loads nltk_data for you.
Answered 2013-08-27 13:00:09
1.] Yes. It works through LazyCorpusLoader from util, whose docstring gives the following description:
"""
A proxy object which is used to stand in for a corpus object
before the corpus is loaded. This allows NLTK to create an object
for each corpus, but defer the costs associated with loading those
corpora until the first time that they're actually accessed.
The first time this object is accessed in any way, it will load
the corresponding corpus, and transform itself into that corpus
(by modifying its own ``__class__`` and ``__dict__`` attributes).
If the corpus can not be found, then accessing this object will
raise an exception, displaying installation instructions for the
NLTK data package. Once they've properly installed the data
package (or modified ``nltk.data.path`` to point to its location),
they can then use the corpus object without restarting python.
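"""
3.] nltk_data is the folder where the data resides; that does not mean the module also resides in that folder (the data is downloaded separately):
NLTK has built-in support for dozens of corpora and trained models, as listed below. To use these within NLTK we recommend that you use the NLTK corpus downloader:
>>> nltk.download()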
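For point 1, the lazy-loading idea in the docstring quoted above can be sketched in a few lines. This is a simplified illustration, not NLTK's actual code: the class names LazyLoader and WordList are hypothetical, and the word list is hard-coded where a real reader would go to disk.

```python
class WordList(object):
    """Hypothetical stand-in for a real corpus reader."""

    def __init__(self, name):
        self.name = name
        # A real reader would read files from the data folder here.
        self.loaded_words = ["alpha", "beta", "gamma"]

    def words(self):
        return list(self.loaded_words)


class LazyLoader(object):
    """Hypothetical proxy that builds the real reader on first access."""

    def __init__(self, name, reader_cls):
        self._name = name
        self._reader_cls = reader_cls

    def _load(self):
        corpus = self._reader_cls(self._name)
        # Turn this proxy into the real object by taking over its
        # __dict__ and __class__, as the docstring describes.
        self.__dict__ = corpus.__dict__
        self.__class__ = corpus.__class__

    def __getattr__(self, attr):
        # Only reached when normal lookup fails, i.e. before loading.
        self._load()
        return getattr(self, attr)


# Creating the proxy is cheap; nothing has been loaded yet.
words = LazyLoader("words", WordList)
print(type(words).__name__)
# The first attribute access triggers the load and the transformation.
print(words.words())
print(type(words).__name__)
```

The module-level name words is therefore an ordinary Python object created at import time, which is why it can be imported from the package even though the data it eventually reads lives in a different folder.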
https://stackoverflow.com/questions/18465660