
NLP generating collocations

Stack Overflow user
Asked on 2020-09-02 08:00:57
4 answers · 497 views · 0 followers · 0 votes

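Here is what I am working on. The expected output is:

('fans', 3), ('car', 3), ('disciplines', 1)

"sports car", "sports fans"

My code is below. I can get the first expected output, but I cannot get the second one correctly. Can anyone help me figure out what is wrong?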

    from nltk.tokenize import RegexpTokenizer
    text='Thirty-five sports disciplines and four cultural activities will be offered during seven days of competitions. He skated with charisma, changing from one gear to another, from one direction to another, faster than a sports car. Armchair sports fans settling down to watch the Olympic Games could be for the high jump if they do not pay their TV licence fee. Such invitationals will attract more viewership for sports fans by sparking interest among sports fans. She barely noticed a flashy sports car almost run them over, until Eddie lunged forward and grabbed her body away. And he flatters the mother and she kind of gets prissy and he talks her into going for a ride in the sports car.'
    word='sports'
    tokenizedword = nltk.tokenize.regexp_tokenize(text, pattern = '\w*', gaps = False)
    #Step 2
    tokenizedwords = [x.lower() for x in tokenizedword if x != '']

    tokenizedwordsbigram=list(nltk.bigrams(tokenizedwords))
    stop_words = set(stopwords.words('english')) 
    filteredwords = []
    for x in tokenizedwordsbigram:
       if x not in stop_words:
          filteredwords.append(x)
     
    tokenizednonstopwordsbigram = nltk.ConditionalFreqDist(filteredwords)  
    print(tokenizednonstopwordsbigram[word].most_common(3))
    gen_text=nltk.Text(tokenizedwords)
    print(gen_text.collocations())

4 Answers

Stack Overflow user

Answered on 2020-09-02 10:47:23

I ran the code after adding the required imports (import nltk and from nltk.corpus import stopwords) and got the output shown further down.

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import RegexpTokenizer

# use to find bigrams, which are pairs of words

text = \
    'Thirty-five sports disciplines and four cultural activities will be offered during seven days of competitions. He skated with charisma, changing from one gear to another, from one direction to another, faster than a sports car. Armchair sports fans settling down to watch the Olympic Games could be for the high jump if they do not pay their TV licence fee. Such invitationals will attract more viewership for sports fans by sparking interest among sports fans. She barely noticed a flashy sports car almost run them over, until Eddie lunged forward and grabbed her body away. And he flatters the mother and she kind of gets prissy and he talks her into going for a ride in the sports car.'
word = 'sports'
# Raw string so that \w reaches the regex engine unchanged
tokenizedword = nltk.tokenize.regexp_tokenize(text, pattern=r'\w*',
        gaps=False)

# Step 2
tokenizedwords = [x.lower() for x in tokenizedword if x != '']

tokenizedwordsbigram = list(nltk.bigrams(tokenizedwords))
stop_words = set(stopwords.words('english'))
filteredwords = []

for x in tokenizedwordsbigram:
    if x not in stop_words:
        filteredwords.append(x)

tokenizednonstopwordsbigram = nltk.ConditionalFreqDist(filteredwords)
print(tokenizednonstopwordsbigram[word].most_common(3))

gen_text = nltk.Text(tokenizedwords)
# collocations() prints its result itself and returns None
print(gen_text.collocations())

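Here is the output (the trailing None is the return value of collocations(), which prints its result itself):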

[('car', 3), ('fans', 3), ('disciplines', 1)]
sports car; sports fans
None
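
A side note on the filtering loop: it compares whole bigram tuples against a set of single words, so it never removes anything. The following is only a minimal standard-library sketch of per-word filtering, not the asker's intended method. The short text and the three-word stop list are made-up stand-ins for the original text and for stopwords.words('english'), and a defaultdict of Counter objects stands in for nltk.ConditionalFreqDist.

```python
import re
from collections import Counter, defaultdict

# Made-up sample text (assumption; shorter than the text in the question).
text = ("Armchair sports fans watch a sports car. "
        "The sports fans cheer, and the sports car stops.")

# Hypothetical hand-picked stop words; the original code uses NLTK's English list.
stop_words = {"a", "and", "the"}

tokens = [t.lower() for t in re.findall(r"\w+", text)]
bigrams = list(zip(tokens, tokens[1:]))

# A tuple is never equal to a string, so this check filters nothing.
unfiltered = [bg for bg in bigrams if bg not in stop_words]
print(len(unfiltered) == len(bigrams))

# Testing each word of the pair drops bigrams that contain a stop word.
filtered = [(w1, w2) for w1, w2 in bigrams
            if w1 not in stop_words and w2 not in stop_words]

# First word -> counts of the words that follow it.
following = defaultdict(Counter)
for w1, w2 in filtered:
    following[w1][w2] += 1

print(following["sports"].most_common(2))
```

For the text in the question this does not change the counts for 'sports', because none of the words that follow it are stop words.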
Votes: 0

Stack Overflow user

Answered on 2020-10-02 18:55:48

Replace

print(gen_text.collocations())

with

print(gen_text.collocation_list())

and your program will run fine.

Votes: 0

Stack Overflow user

Answered on 2021-08-23 14:46:43

gen_text = nltk.Text(tokenizedwords).collocation_list()

b = [i + " " + i1 for i, i1 in gen_text]

return b

You will get output like this:

"sports car", "sports fans"
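
The joining step can be sketched without NLTK. As far as I know, collocation_list() returns (word1, word2) tuples in recent NLTK releases and already-joined strings in some older ones, so this hedged helper accepts both. The name format_collocations and the two sample lists are hypothetical stand-ins, not NLTK output captured from a real run.

```python
def format_collocations(pairs):
    """Return each collocation as a single space-separated phrase."""
    # Tuples are joined; items that are already strings pass through unchanged.
    return [" ".join(p) if isinstance(p, tuple) else p for p in pairs]


# Stand-in data (assumption): the two shapes collocation_list() may return.
tuple_style = [("sports", "car"), ("sports", "fans")]
string_style = ["sports car", "sports fans"]

print(format_collocations(tuple_style))
print(format_collocations(string_style))
```

In the real program the argument would be nltk.Text(tokenizedwords).collocation_list().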

Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/63696936
