
How can I get no more than three n-grams?

Stack Overflow user
Asked 2021-08-05 09:42:58

I am using the code below to find n-grams.

from nltk.util import ngrams

booksAfterRemovingStopWords = ['Zombies and Calculus by Colin Adams', 'Zone to Win: Organizing to Compete in an Age of Disruption', 'Zig Zag: The Surprising Path to Greater Creativity']
booksWithNGrams = list()

for line_no, line in enumerate(booksAfterRemovingStopWords):
    tokens = line.split(" ")
    output = list(ngrams(tokens, 3))
    temp = list()
    for x in output:  # Adding n-grams
        temp.append(' '.join(x))
    booksWithNGrams.append(temp)

print(booksWithNGrams)

The output looks like this:

[['Zombies and Calculus', 'and Calculus by', 'Calculus by Colin', 'by Colin Adams'], ['Zone to Win:', 'to Win: Organizing', 'Win: Organizing to', 'Organizing to Compete', 'to Compete in', 'Compete in an', 'in an Age', 'an Age of', 'Age of Disruption'], ['Zig Zag: The', 'Zag: The Surprising', 'The Surprising Path', 'Surprising Path to', 'Path to Greater', 'to Greater Creativity']]

However, I do not want more than three n-grams per title. That is, I would like the output to look like this:

[['Zombies and Calculus', 'and Calculus by', 'Calculus by Colin'], ['Zone to Win:', 'to Win: Organizing', 'Win: Organizing to'], ['Zig Zag: The', 'Zag: The Surprising', 'The Surprising Path']]

How can I achieve this?


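1 Answer

Stack Overflow user

Accepted answer

Posted 2021-08-05 09:56:55

Here is what you can do:

Logic: count the n-grams as you loop over them and break once i > 2, so that only the n-grams at i = 0, 1, 2 are kept.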

from nltk.util import ngrams

booksAfterRemovingStopWords = ['Zombies and Calculus by Colin Adams', 'Zone to Win: Organizing to Compete in an Age of Disruption', 'Zig Zag: The Surprising Path to Greater Creativity']
booksWithNGrams = list()

for line_no, line in enumerate(booksAfterRemovingStopWords):
    tokens = line.split(" ")
    output = list(ngrams(tokens, 3))
    temp = list()
    for i, x in enumerate(output):  # Adding n-grams
        if i > 2:  # Stop after the first three n-grams (i = 0, 1, 2)
            break
        temp.append(' '.join(x))
    booksWithNGrams.append(temp)

print(booksWithNGrams)
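
As a possible alternative to counting and breaking by hand, the cap can be applied with `itertools.islice`, which stops consuming an iterator after a given number of items. The sketch below is only an illustration, not part of the original answer: `word_ngrams` and `first_ngrams` are hypothetical helper names, and `word_ngrams` is a plain-Python stand-in for `nltk.util.ngrams` so that the example runs without NLTK installed. The same `islice` call should work on the iterator returned by `ngrams(tokens, 3)`.

```python
from itertools import islice


def word_ngrams(tokens, n):
    """Yield successive n-grams (as tuples) from a list of tokens.

    Hypothetical stand-in for nltk.util.ngrams, used here only so the
    sketch has no third-party dependency.
    """
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def first_ngrams(line, n=3, limit=3):
    """Return at most `limit` space-joined n-grams from `line`."""
    tokens = line.split(" ")
    # islice stops after `limit` items, so later n-grams are never built.
    return [" ".join(gram) for gram in islice(word_ngrams(tokens, n), limit)]


books = [
    'Zombies and Calculus by Colin Adams',
    'Zone to Win: Organizing to Compete in an Age of Disruption',
    'Zig Zag: The Surprising Path to Greater Creativity',
]
books_with_ngrams = [first_ngrams(title) for title in books]
print(books_with_ngrams)
```

If the full list of n-grams has already been built, slicing it with `output[:3]` gives the same result; `islice` mainly helps when the n-grams come from an iterator and only the first few are needed.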
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/68664231
