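Here is what I am working on. The expected output is:
('fans', 3), ('car', 3), ('disciplines', 1)
'sports car', 'sports fans'
My code is below. I get the first expected output, but the second one does not come out correctly. Can anyone help me figure out what is wrong?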
from nltk.tokenize import RegexpTokenizer
text='Thirty-five sports disciplines and four cultural activities will be offered during seven days of competitions. He skated with charisma, changing from one gear to another, from one direction to another, faster than a sports car. Armchair sports fans settling down to watch the Olympic Games could be for the high jump if they do not pay their TV licence fee. Such invitationals will attract more viewership for sports fans by sparking interest among sports fans. She barely noticed a flashy sports car almost run them over, until Eddie lunged forward and grabbed her body away. And he flatters the mother and she kind of gets prissy and he talks her into going for a ride in the sports car.'
word='sports'
tokenizedword = nltk.tokenize.regexp_tokenize(text, pattern = '\w*', gaps = False)
#Step 2
tokenizedwords = [x.lower() for x in tokenizedword if x != '']
tokenizedwordsbigram=list(nltk.bigrams(tokenizedwords))
stop_words = set(stopwords.words('english'))
filteredwords = []
for x in tokenizedwordsbigram:
    if x not in stop_words:
        filteredwords.append(x)
tokenizednonstopwordsbigram = nltk.ConditionalFreqDist(filteredwords)
print(tokenizednonstopwordsbigram[word].most_common(3))
gen_text=nltk.Text(tokenizedwords)
print(gen_text.collocations())
Posted on 2020-09-02 10:47:23
I ran your code after adding the required imports, import nltk and from nltk.corpus import stopwords, and got the following output.
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import RegexpTokenizer
# use to find bigrams, which are pairs of words
text = \
'Thirty-five sports disciplines and four cultural activities will be offered during seven days of competitions. He skated with charisma, changing from one gear to another, from one direction to another, faster than a sports car. Armchair sports fans settling down to watch the Olympic Games could be for the high jump if they do not pay their TV licence fee. Such invitationals will attract more viewership for sports fans by sparking interest among sports fans. She barely noticed a flashy sports car almost run them over, until Eddie lunged forward and grabbed her body away. And he flatters the mother and she kind of gets prissy and he talks her into going for a ride in the sports car.'
word = 'sports'
tokenizedword = nltk.tokenize.regexp_tokenize(text, pattern=r'\w*',
                                              gaps=False)
# Step 2
tokenizedwords = [x.lower() for x in tokenizedword if x != '']
tokenizedwordsbigram = list(nltk.bigrams(tokenizedwords))
stop_words = set(stopwords.words('english'))
filteredwords = []
for x in tokenizedwordsbigram:
    if x not in stop_words:
        filteredwords.append(x)
tokenizednonstopwordsbigram = nltk.ConditionalFreqDist(filteredwords)
print(tokenizednonstopwordsbigram[word].most_common(3))
gen_text = nltk.Text(tokenizedwords)
print(gen_text.collocations())
Here is the output:
[('car', 3), ('fans', 3), ('disciplines', 1)]
sports car; sports fans
None
Posted on 2020-10-02 18:55:48
Replace
print(gen_text.collocations())
with
print(gen_text.collocation_list())
and your program will run fine.
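The stray None in the earlier output is what you get when a function prints its result itself and returns nothing, and the caller then wraps the call in print(). The sketch below reproduces that pattern with plain Python. The names show_collocations, list_collocations and PAIRS are hypothetical stand-ins for illustration only, not NLTK code.

```python
# Hypothetical stand-ins that mimic the two call styles; this is not NLTK code.
PAIRS = [("sports", "car"), ("sports", "fans")]


def show_collocations(pairs):
    """Print the pairs itself; there is no return statement, so it returns None."""
    print("; ".join(w1 + " " + w2 for w1, w2 in pairs))


def list_collocations(pairs):
    """Return the pairs as a list and leave the printing to the caller."""
    return list(pairs)


result = show_collocations(PAIRS)     # the function prints the pairs as a side effect
print(result)                         # the return value is None, so this prints None
print(list_collocations(PAIRS))       # here print() receives an actual list
```

Wrapping a "printing" function in print() therefore always adds a trailing None line, while a "returning" function gives print() something to show.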
Posted on 2021-08-23 14:46:43
gen_text = nltk.Text(tokenizedwords).collocation_list()
b = [i + " " + i1 for i, i1 in gen_text]
return b
You will get the following output:
'sports car', 'sports fans'
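As far as I know, older NLTK releases returned already-joined strings from collocation_list(), while newer ones return (w1, w2) tuples. Treat that as an assumption and check it against your installed version. Under that assumption, a small helper that accepts either shape might look like the sketch below. The name join_collocations is hypothetical and not part of NLTK.

```python
def join_collocations(items):
    """Return each collocation as one space-separated string.

    Accepts either (w1, w2) tuples or strings that are already joined,
    so the same call works regardless of which shape the caller receives.
    """
    joined = []
    for item in items:
        if isinstance(item, str):
            joined.append(item)            # already in "w1 w2" form
        else:
            joined.append(" ".join(item))  # tuple of words -> "w1 w2"
    return joined


# Both input shapes give the same result.
from_tuples = join_collocations([("sports", "car"), ("sports", "fans")])
from_strings = join_collocations(["sports car", "sports fans"])
print(from_tuples)
print(from_strings)
```

With NLTK installed, the intended usage would be join_collocations(nltk.Text(tokenizedwords).collocation_list()).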
https://stackoverflow.com/questions/63696936