Q: NLP: generating collocations

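Stack Overflow user

Asked 2020-09-02 08:00:57

Answers: 4, Views: 497, Followers: 0, Votes: 0

I am working on the task below, and the expected output is:

('fans', 3), ('car', 3), ('disciplines', 1)

"sports car", "sports fans"

My code is below. I can get the first expected output, but not the second. Can anyone help me find what is wrong?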

    from nltk.tokenize import RegexpTokenizer
    text='Thirty-five sports disciplines and four cultural activities will be offered during seven days of competitions. He skated with charisma, changing from one gear to another, from one direction to another, faster than a sports car. Armchair sports fans settling down to watch the Olympic Games could be for the high jump if they do not pay their TV licence fee. Such invitationals will attract more viewership for sports fans by sparking interest among sports fans. She barely noticed a flashy sports car almost run them over, until Eddie lunged forward and grabbed her body away. And he flatters the mother and she kind of gets prissy and he talks her into going for a ride in the sports car.'
    word='sports'
    tokenizedword = nltk.tokenize.regexp_tokenize(text, pattern = '\w*', gaps = False)
    #Step 2
    tokenizedwords = [x.lower() for x in tokenizedword if x != '']

    tokenizedwordsbigram=list(nltk.bigrams(tokenizedwords))
    stop_words = set(stopwords.words('english')) 
    filteredwords = []
    for x in tokenizedwordsbigram:
       if x not in stop_words:
          filteredwords.append(x)
     
    tokenizednonstopwordsbigram = nltk.ConditionalFreqDist(filteredwords)  
    print(tokenizednonstopwordsbigram[word].most_common(3))
    gen_text=nltk.Text(tokenizedwords)
    print(gen_text.collocations())

nlp

4 Answers

Stack Overflow user

Posted 2020-09-02 10:47:23

I ran the code after adding the required imports (import nltk and from nltk.corpus import stopwords) and got the output shown further down.

    import nltk
    from nltk.corpus import stopwords
    from nltk.tokenize import RegexpTokenizer

    # use to find bigrams, which are pairs of words

    text = \
        'Thirty-five sports disciplines and four cultural activities will be offered during seven days of competitions. He skated with charisma, changing from one gear to another, from one direction to another, faster than a sports car. Armchair sports fans settling down to watch the Olympic Games could be for the high jump if they do not pay their TV licence fee. Such invitationals will attract more viewership for sports fans by sparking interest among sports fans. She barely noticed a flashy sports car almost run them over, until Eddie lunged forward and grabbed her body away. And he flatters the mother and she kind of gets prissy and he talks her into going for a ride in the sports car.'
    word = 'sports'
    # Raw string so that \w is passed to the regex engine unchanged
    tokenizedword = nltk.tokenize.regexp_tokenize(text, pattern=r'\w*',
                                                  gaps=False)

    # Step 2: lowercase the tokens and drop the empty matches produced by \w*
    tokenizedwords = [x.lower() for x in tokenizedword if x != '']

    tokenizedwordsbigram = list(nltk.bigrams(tokenizedwords))
    stop_words = set(stopwords.words('english'))
    filteredwords = []

    # Note: x is a (word, word) tuple while stop_words holds strings,
    # so this check never removes anything.
    for x in tokenizedwordsbigram:
        if x not in stop_words:
            filteredwords.append(x)

    tokenizednonstopwordsbigram = nltk.ConditionalFreqDist(filteredwords)
    print(tokenizednonstopwordsbigram[word].most_common(3))

    gen_text = nltk.Text(tokenizedwords)
    # collocations() prints its result itself and returns None
    print(gen_text.collocations())

Here is the output:

    [('car', 3), ('fans', 3), ('disciplines', 1)]
    sports car; sports fans
    None
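A side note on the first line of that output: the stop-word check in the code compares whole bigram tuples against a set of strings, so it never filters anything. The result for 'sports' happens to be unaffected because none of the words following it are stop words. Below is a minimal standard-library sketch of the same counting with the filter applied to the individual words instead. The names following_word_counts and STOP_WORDS, the abbreviated stop-word set, and the sample sentence are all made up for illustration and are not part of NLTK.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, abbreviated stop-word set for illustration only;
# NLTK's stopwords.words('english') is much longer.
STOP_WORDS = {"a", "and", "the", "of", "to", "in", "for", "by", "with"}


def following_word_counts(text, stop_words=STOP_WORDS):
    """Map each word to a Counter of the non-stop words that directly follow it."""
    tokens = re.findall(r"\w+", text.lower())
    counts = defaultdict(Counter)
    for first, second in zip(tokens, tokens[1:]):
        # Filter on the individual words, not on the (first, second) tuple.
        if first not in stop_words and second not in stop_words:
            counts[first][second] += 1
    return counts


# Made-up sample text, much shorter than the one in the question.
sample = ("A sports car passed the sports fans. "
          "The sports fans cheered for the sports car and the sports disciplines.")
counts = following_word_counts(sample)
print(counts["sports"].most_common(3))
```

This plays roughly the role that nltk.ConditionalFreqDist plays in the code above, under the assumption that only the immediately following word matters.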

Votes: 0

Stack Overflow user

Posted 2020-10-02 18:55:48

Replace

    print(gen_text.collocations())

with

    print(gen_text.collocation_list())

and your program will run fine.
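Why this helps, as far as the output in the first answer suggests: collocations() prints the phrases itself and returns None, so wrapping it in print() adds a stray None line, whereas collocation_list() hands the values back to the caller. The sketch below mimics that difference with two stand-in functions; show_pairs and list_pairs are hypothetical names, not NLTK calls.

```python
# Stand-in data and functions (hypothetical, not part of NLTK) that mimic
# the print-versus-return difference.
PAIRS = [("sports", "car"), ("sports", "fans")]


def show_pairs(pairs):
    """Print the pairs and implicitly return None, like a display-only method."""
    print("; ".join(" ".join(pair) for pair in pairs))


def list_pairs(pairs):
    """Return the pairs as strings so the caller decides what to do with them."""
    return [" ".join(pair) for pair in pairs]


result = show_pairs(PAIRS)   # prints: sports car; sports fans
print(result)                # prints: None
print(list_pairs(PAIRS))     # prints: ['sports car', 'sports fans']
```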

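Votes: 0

Stack Overflow user

Posted 2021-08-23 14:46:43

    gen_text = nltk.Text(tokenizedwords).collocation_list()

    # Join each (first, second) pair into a single space-separated phrase
    b = [i + " " + i1 for i, i1 in gen_text]

    return b

You will get the following output:

"sports car", "sports fans"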
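The join in this answer assumes that collocation_list() yields two-element pairs. My understanding, which is worth checking against your installed NLTK version, is that some releases return ready-made strings instead. A small helper that accepts either shape could look like the sketch below; as_phrases is a hypothetical name, not an NLTK function.

```python
def as_phrases(collocations):
    """Normalize collocations to space-joined strings.

    Accepts items that are already strings as well as tuples of words.
    """
    phrases = []
    for item in collocations:
        if isinstance(item, str):
            phrases.append(item)            # already a phrase
        else:
            phrases.append(" ".join(item))  # e.g. ('sports', 'car')
    return phrases


print(as_phrases([("sports", "car"), ("sports", "fans")]))
print(as_phrases(["sports car", "sports fans"]))
```

Both calls produce the same list, so the surrounding code does not need to care which shape it was given.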

Votes: 0

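The original page content is provided by Stack Overflow. Translation support is provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/63696936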
