Counting words in a text file

Stack Overflow user

Asked 2013-05-28 08:59:00

Answers: 5 · Views: 1.7K · Followers: 0 · Votes: 0

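I have a .txt file (example):

A professional is a person who is engaged in a certain activity, or occupation, for gain or compensation as means of livelihood; such as a permanent career, not as an amateur or pastime. Due to the personal and confidential nature of many professional services, and thus the necessity to place a great deal of trust in them, most professionals are subject to strict codes of conduct enshrining rigorous ethical and moral obligations.

How can I count how many times the word "professional" occurs? (Is using NLTK the best option?)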

text_file = open("text.txt", "r+b")

python

nltk

5 Answers

Stack Overflow user

Accepted answer

Posted 2013-05-28 09:06:52

I changed my answer to better reflect what you want:

from nltk import word_tokenize

with open('file_path') as f:
    content = f.read()
# we will use your text example instead:
content = "A professional is a person who is engaged in a certain activity, or occupation, for gain or compensation as means of livelihood; such as a permanent career, not as an amateur or pastime. Due to the personal and confidential nature of many professional services, and thus the necessity to place a great deal of trust in them, most professionals are subject to strict codes of conduct enshrining rigorous ethical and moral obligations."

def Count_Word(word, data):
    c = 0
    tokens = word_tokenize(data)
    for token in tokens:
        token = token.lower()
        # this plural check is dangerous, if trying to find a word that ends with an 's'
        token = token[:-1] if token[-1] == 's' else token
        if token == word:
            c += 1
    return c

print(Count_Word('professional', content))
>>>
3

Here is a modified version of the method:

def Count_Word(word, data, leading=(), trailing=("'s", "s")):
    c = 0
    tokens = word_tokenize(data)
    for token in tokens:
        token = token.lower()
        for lead in leading:
            if token.startswith(lead):
                token = token.partition(lead)[2]
        for trail in trailing:
            if token.endswith(trail):
                token = token.rpartition(trail)[0]
        if token == word:
            c += 1
    return c

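I added optional parameters: lists of leading or trailing parts that should be stripped from a token before it is compared with the word you are looking for. At the moment I only use the defaults, 's and s, but if you find others that suit you, you can always add them. If the lists start getting long, you can turn them into constants.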
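
A minimal sketch of that idea, not part of the original answer. It assumes a simple regular-expression tokenizer as a stdlib stand-in for `word_tokenize`, and the names `LEADING`, `TRAILING`, `simple_tokenize` and `count_word` are hypothetical. Unlike the version above, it compares the untouched token as well as each stripped form, so searching for a word that itself ends in "s" (such as "class") still matches; each affix is tried on its own rather than chained.

```python
import re

# Hypothetical module-level constants holding the affixes to strip.
LEADING = ()
TRAILING = ("'s", "s")


def simple_tokenize(text):
    # Stand-in for nltk.word_tokenize (an assumption, to stay stdlib-only):
    # lowercase words, allowing one internal apostrophe as in "professional's".
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def count_word(word, text, leading=LEADING, trailing=TRAILING):
    word = word.lower()
    count = 0
    for token in simple_tokenize(text):
        # Candidate forms: the raw token plus the token with one affix removed.
        candidates = {token}
        for lead in leading:
            if lead and token.startswith(lead):
                candidates.add(token[len(lead):])
        for trail in trailing:
            if trail and token.endswith(trail):
                candidates.add(token[:-len(trail)])
        if word in candidates:
            count += 1
    return count


sample = "A professional and two professionals; the professional's code."
print(count_word("professional", sample))
print(count_word("class", "The class has two classes"))
```

The second call shows the case the in-code warning mentions: "class" is matched because the raw token is checked before any "s" is removed, while "classes" is not reduced to "class" by these suffixes.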

Votes: 4

Stack Overflow user

Posted 2013-05-28 09:11:32

This can be solved in one line (plus the import):

>>> from collections import Counter
>>> Counter(w.lower() for w in open("text.txt").read().split())['professional']
2
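
The 2 here is lower than the 3 reported by the accepted answer, presumably because `split()` only cuts on whitespace: "professionals" stays a separate key, and any token with punctuation attached is counted under that spelling. A minimal sketch, not part of the original answer, that extracts runs of letters with a regular expression before counting; the helper name `word_counts` and the sample sentence are made up for illustration.

```python
import re
from collections import Counter


def word_counts(text):
    # Lowercase, then keep only runs of letters so that punctuation
    # (for example the comma in "professional,") does not stick to a word.
    return Counter(re.findall(r"[a-z]+", text.lower()))


sample = "Professional services. Most professionals are professional, too."
counts = word_counts(sample)

print(counts["professional"])                            # 2: exact matches only
print(counts["professional"] + counts["professionals"])  # 3: singular plus plural
```

Because `Counter` returns 0 for missing keys, adding the plural count is safe even when the plural never occurs.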

Votes: 5

Stack Overflow user

Posted 2013-05-28 09:04:16

You can simply tokenize the string and then search through all the tokens. That is just one way, though; there are many others.

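import nltk

# text_file is the file object from the question; on Python 3 open it in
# text mode ("r") so that read() returns a str rather than bytes.
s = text_file.read()
tokens = nltk.word_tokenize(s)
counter = 0
for token in tokens:
    toke = token
    # drop a trailing "s" so that simple plurals are counted as well
    if token[-1] == "s":
        toke = token[0:-1]
    if toke.lower() == "professional":
        counter += 1

print(counter)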
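
As one of the other ways, here is a minimal stdlib-only sketch that is not part of the original answer: a regular expression with word boundaries and an optional plural "s". The function name `count_with_regex` is hypothetical.

```python
import re


def count_with_regex(word, text):
    # \b marks a word boundary, so "unprofessional" and "professionalism"
    # are not counted; the optional "s" also accepts a simple plural.
    pattern = r"\b" + re.escape(word) + r"s?\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


sample = "A professional; Professional services; professionals. Unprofessional."
print(count_with_regex("professional", sample))
```

This avoids tokenizing at all, at the cost of handling only the plain "s" plural.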

Votes: 3

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/16787911
