Q: How to get the most common phrases or words in Python or R

Stack Overflow user

Asked 2018-03-31 16:27:52

Answers: 2, Views: 2.7K, Followers: 0, Votes: 0

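Given some text, how can I get the most common n-grams for n = 1 to 6? I have seen methods that extract only 3-grams, or only 2-grams, at a time. Is there a way to extract the longest meaningful phrases first, and then everything else?

For example, take this text, which is for demonstration purposes only: fri evening commute can be long. some people avoid fri evening commute by choosing off-peak hours. there are much less traffic during off-peak.

The ideal result, n-grams with their counts, would be: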

fri evening commute: 3,
off-peak: 2,
rest of the words: 1

Any suggestions are appreciated. Thanks.

python

nlp

text-mining

2 Answers

Stack Overflow user

Accepted answer

Posted 2018-03-31 21:44:15

If you plan to use R, I would suggest this approach: https://cran.r-project.org/web/packages/udpipe/vignettes/udpipe-usecase-postagging-lemmatisation.html

Votes: 1

Stack Overflow user

Posted 2018-03-31 21:28:51

Python

Consider the NLTK library, which provides an ngrams function that you can call while iterating over the values of n.
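To make the idea of iterating over n concrete, here is a minimal sketch that counts every n-gram for n = 1 to 6 on the sample text. To keep it dependency-free, the helper `sliding_ngrams` is a hypothetical stand-in for `nltk.ngrams` (without its padding options); with NLTK installed, `ngrams(tokens, n)` could be used in its place. Periods are dropped and hyphenated words are kept as single tokens, which is an assumption about the desired tokenization.

```python
from collections import Counter


def sliding_ngrams(tokens, n):
    """Yield each run of n consecutive tokens as a tuple.

    Hypothetical stand-in for nltk.ngrams, written with the standard library only.
    """
    return zip(*(tokens[i:] for i in range(n)))


text = ('fri evening commute can be long. some people avoid fri evening '
        'commute by choosing off-peak hours. there are much less traffic '
        'during off-peak.')
# Assumption: drop periods, keep hyphenated words such as 'off-peak' whole
tokens = text.replace('.', '').split()

# One Counter per value of n, for n = 1 .. 6
counts = {n: Counter(sliding_ngrams(tokens, n)) for n in range(1, 7)}

# Show the three most frequent n-grams for each n
for n, counter in counts.items():
    top = [(' '.join(gram), freq) for gram, freq in counter.most_common(3)]
    print(n, top)
```

Counting each n separately like this does not yet decide which phrase length is the most meaningful one; it only provides the raw frequencies to choose from.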

A rough implementation would look something like the following, where "rough" is the key word:

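from nltk import ngrams
from collections import Counter

result = []
sentence = 'fri evening commute can be long. some people avoid fri evening commute by choosing off-peak hours. there are much less traffic during off-peak.'
# Drop periods and split hyphenated words, so 'off-peak' is counted as the phrase 'off peak'
sentence = sentence.replace('.', '').replace('-', ' ')

# Try the longest n-grams first, working down to bigrams
for n in range(len(sentence.split()), 1, -1):
    # All n-grams of the remaining text, joined back into strings
    phrases = [' '.join(token) for token in ngrams(sentence.split(), n)]
    if not phrases:
        # Earlier removals left fewer than n words, so there is nothing to count
        continue

    phrase, freq = Counter(phrases).most_common(1)[0]
    if freq > 1:
        # Store the phrase with its frequency, then remove it from the text
        # so that shorter n-grams inside it are not counted again
        result.append((phrase, freq))
        sentence = sentence.replace(phrase, '')

for phrase, freq in result:
    print('%s: %d' % (phrase, freq))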
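The rough implementation above removes matched phrases with `str.replace`, which can leave words that were never adjacent sitting next to each other, and it keeps only one phrase per value of n. One possible variation, sketched below under my own assumptions, marks token positions as covered instead of editing the string, and goes down to n = 1 so that single words such as `off-peak` are reported too. The function name `maximal_phrases` and the parameters `max_n` and `min_count` are hypothetical, chosen for illustration only.

```python
from collections import defaultdict


def maximal_phrases(tokens, max_n=6, min_count=2):
    """Greedy longest-first phrase extraction (illustrative helper).

    Returns (phrase, count) pairs for n-grams that occur at least
    min_count times at positions not already claimed by a longer phrase.
    """
    covered = [False] * len(tokens)
    result = []
    for n in range(min(max_n, len(tokens)), 0, -1):
        # Map each n-gram to the list of positions where it starts
        starts = defaultdict(list)
        for i in range(len(tokens) - n + 1):
            starts[tuple(tokens[i:i + n])].append(i)

        for gram, positions in starts.items():
            free = []
            for i in positions:
                # Keep occurrences that touch no claimed token and do not
                # overlap the previous occurrence of the same n-gram
                if not any(covered[i:i + n]) and (not free or i >= free[-1] + n):
                    free.append(i)
            if len(free) >= min_count:
                result.append((' '.join(gram), len(free)))
                for i in free:
                    covered[i:i + n] = [True] * n
    return result


text = ('fri evening commute can be long. some people avoid fri evening '
        'commute by choosing off-peak hours. there are much less traffic '
        'during off-peak.')
# Assumption: drop periods, keep hyphenated words whole
tokens = text.replace('.', '').split()
phrases = maximal_phrases(tokens)
print(phrases)
```

On the sample text this yields `fri evening commute` and `off-peak`, each with a count of 2, since each occurs twice. Words that occur only once are omitted; they could be listed by lowering `min_count` to 1.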

As for R

This may help

Votes: 2

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/49589974
