I don't understand why this isn't working:
import nltk
from nltk.corpus import stopwords
import string
with open('moby.txt', 'r') as f:
    moby_raw = f.read()
stop = set(stopwords.words('english'))
moby_tokens = nltk.word_tokenize(moby_raw)
text_no_stop_words_punct = [t for t in moby_tokens if t not in stop or t not in string.punctuation]
print(text_no_stop_words_punct)
Looking at the output, I get this:
[...';', 'surging', 'from', 'side', 'to', 'side', ';', 'spasmodically', 'dilating', 'and', 'contracting',...]
It looks like the punctuation is still there. What am I doing wrong?
Answered 2017-08-05 06:21:24
It has to be and, not or:
if t not in stop and t not in string.punctuation
Or:
if not (t in stop or t in string.punctuation):
Or:
all_stops = stop | set(string.punctuation)
if t not in all_stops:
The last solution is the fastest, because each token needs only a single set lookup.
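To make the set-based variant concrete, here is a minimal sketch of it as a full list comprehension. The small `stop` set and the hand-written `moby_tokens` list are stand-ins I made up so the snippet runs without NLTK or moby.txt; in the real script they would come from `stopwords.words('english')` and `nltk.word_tokenize`.

```python
import string

# Hypothetical stand-ins: a tiny stop-word set and a hand-written token list,
# used here so the sketch runs without NLTK or moby.txt.
stop = {"from", "to", "and"}
moby_tokens = [";", "surging", "from", "side", "to", "side", ";",
               "spasmodically", "dilating", "and", "contracting"]

# Build the combined set once; each token then needs a single set lookup.
all_stops = stop | set(string.punctuation)

# Keep only tokens that are neither stop words nor punctuation characters.
filtered = [t for t in moby_tokens if t not in all_stops]
print(filtered)
```

One difference to keep in mind: `t in string.punctuation` is a substring test on a string, while `set(string.punctuation)` contains only single characters, so multi-character punctuation tokens may be treated differently by the two versions.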
Answered 2017-08-05 06:21:06
In this line, try changing 'or' to 'and', so that the list only contains tokens that are neither stop words nor punctuation:
text_no_stop_words = [t for t in moby_tokens if t not in stop or t not in string.punctuation]
Answered 2017-08-05 06:24:21
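Close. In your comparison you need to use and rather than or. If a punctuation mark such as ';' is found not to be in stop, the or condition is already true, so Python never checks whether it is in string.punctuation.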
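The short-circuit behaviour described above can be checked with a small sketch. The helper names `keep_with_or` and `keep_with_and` and the tiny `stop` set are hypothetical, introduced only for illustration; they are not part of the original script.

```python
import string

# Hypothetical stand-in for stopwords.words('english'), so no NLTK download is needed.
stop = {"from", "to", "and"}

def keep_with_or(t):
    # Original condition: false only if t is in BOTH stop and string.punctuation.
    return t not in stop or t not in string.punctuation

def keep_with_and(t):
    # Corrected condition: true only if t is in NEITHER stop nor string.punctuation.
    return t not in stop and t not in string.punctuation

# Print each token with the result of the original and the corrected condition.
for t in [";", "and", "surging"]:
    print(repr(t), keep_with_or(t), keep_with_and(t))
```

With or, both ';' and 'and' are kept, because each of them is missing from at least one of the two collections. With and, only 'surging' survives.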
text_no_stop_words_punct = [t for t in moby_tokens if t not in stop and t not in string.punctuation]
https://stackoverflow.com/questions/45516207