How do I use the "collocation_list" function on my own corpus in Python?

Stack Overflow user
Asked 2019-10-29 15:29:31
Answers: 1 · Views: 897 · Followers: 0 · Votes: 2

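I'm new to Python and am trying to import my own corpus so that I can find collocations in its texts. I'm using Python 3.7.5 and have followed the instructions in the textbook by Bird, Klein and Loper.

However, when I try to use "collocation_list" on the whole corpus, the environment returns "'ConcatenatedCorpusView' object has no attribute 'collocation_list'", and when I use it on an individual text, I get "'StreamBackedCorpusView' object has no attribute 'collocation_list'".

What should I do to find collocations in my corpus?

I tried calling "import nltk.collocations", but of course that didn't help...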

>>> from nltk.corpus import PlaintextCorpusReader
>>> eng_corpus_root = r'D:\Corpus\EN'
>>> eng_corpus = PlaintextCorpusReader(eng_corpus_root, '.*')
>>> eng = eng_corpus.words()

>>> eng.collocation_list()
Traceback (most recent call last):
  File "<pyshell#39>", line 1, in <module>
    eng.collocation_list()
AttributeError: 'ConcatenatedCorpusView' object has no attribute 'collocation_list'

>>> eng1 = eng_corpus.words('CNN/2019.10.18_EN_CNN 2.txt')

>>> eng1.collocation_list()
Traceback (most recent call last):
  File "<pyshell#68>", line 1, in <module>
    eng1.collocation_list()
AttributeError: 'StreamBackedCorpusView' object has no attribute 'collocation_list'

It would be great if I could get a result like this (an example from the textbook mentioned above).

>>> from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908

>>> text4.collocation_list()
['United States', 'fellow citizens', 'four years', 'years ago', 'Federal Government', 'General Government', 'American people', 'Vice President', 'God bless', 'Chief Justice', 'Old World', 'Almighty God', 'Fellow citizens', 'Chief Magistrate', 'every citizen', 'one another', 'fellow Americans', 'Indian tribes', 'public debt', 'foreign nations']

I would really appreciate your help.

1 Answer

Stack Overflow user

Answered 2019-10-29 23:59:32

Problem solved. I needed to initialize my corpus as an nltk.text.Text object (see: http://www.nltk.org/api/nltk.html#nltk.text.Text)

>>> from nltk.text import Text
>>> text458 = Text(eng_corpus.words())
>>> text458.collocation_list()
['Hong Kong', 'United States', 'Getty Images', 'European Union', 'Northern Ireland', 'Boris Johnson', 'Prime Minister', 'Islamic State', 'Extinction Rebellion', 'Cape Dorset', 'extradition bill', 'Recep Tayyip', 'HONG KONG', 'Mike Pence', 'New York', 'Tayyip Erdogan', 'Democratic Forces', 'Vice President', 'Anthony Kwan', 'Kurdish fighters']

It's as simple as that.
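
For anyone wondering why this works: collocation_list() is a method of nltk.text.Text, while eng_corpus.words() only returns a corpus view, which behaves like a plain sequence of tokens. Below is a minimal sketch of that difference, plus a way to call BigramCollocationFinder from nltk.collocations directly. The toy token list and the small stopword set are made up for illustration (they stand in for eng_corpus.words() and for a real stopword list), and the filter settings are only my approximation of what Text.collocation_list() does, not a guaranteed match.

```python
from nltk.collocations import BigramCollocationFinder
from nltk.metrics import BigramAssocMeasures
from nltk.text import Text

# Toy token list standing in for eng_corpus.words(); any sequence of strings works.
tokens = (
    "Hong Kong protests continued as Hong Kong officials met the Prime Minister "
    "and the Prime Minister spoke about Hong Kong"
).split()

# A plain token sequence has no collocation_list(); the Text wrapper adds it.
text = Text(tokens)
print(hasattr(tokens, "collocation_list"))
print(hasattr(text, "collocation_list"))

# Hypothetical hand-made stopword set, so this sketch needs no NLTK data download.
stopwords = {"the", "and", "as", "about"}

# Use the finder directly on the tokens, without going through Text.
finder = BigramCollocationFinder.from_words(tokens)
finder.apply_freq_filter(2)  # drop bigrams that occur fewer than twice
finder.apply_word_filter(lambda w: len(w) < 3 or w.lower() in stopwords)

# Rank the surviving bigrams by likelihood ratio and keep at most five.
top = finder.nbest(BigramAssocMeasures.likelihood_ratio, 5)
print(top)
```

Going through the finder yourself is handy if you want a different association measure, your own stopword list, or a different frequency cutoff. Note that the return type of Text.collocation_list() appears to differ between NLTK releases (strings in some, tuples in others), so check what your installed version gives you.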

Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/58602991
