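I'm trying to remove sentences that are too long (>25 tokens) or too short (<4 tokens) from a corpus, and also remove sentences that contain rare words occurring fewer than 8 times. Every attempt so far gives me either an error message or an empty list. The corpus is the Brown corpus.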
lens = [w for w in corpus.sents() if len(w)>=25 and len(w)<= 4]
I get an empty list as output:
out: []
I also don't know how to work the rare words into this list comprehension. Do I have to convert to a FreqDist?
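How do I remove sentences that are very long, very short, or contain rare words? I'm quite confused. Does anyone know how to do this and can explain it? Any help would be much appreciated :)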
Posted on 2021-03-07 14:44:27
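You can do it like this, keeping only the items whose length is less than 26 and more than 3 (that is, between 4 and 25 inclusive). In this example the items are plain strings, so len() counts characters; for tokenized sentences it counts tokens.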
a = ["hello world", "how are you doing", "where are you going?", "welcome to the greatest show on earth! How will you manage to gain all the experience needed for this to show?", "hi"]
[len(w) for w in a]
>>>[11, 17, 20, 110, 2]

Method 1:
list(filter(lambda x: 4 <= len(x) <= 25, a))
>>>['hello world', 'how are you doing', 'where are you going?']

Method 2:
[x for x in a if 4 <= len(x) <= 25]
>>>['hello world', 'how are you doing', 'where are you going?']

https://stackoverflow.com/questions/66516574
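
The same range check can probably be combined with the rare-word condition from the question. Below is a minimal sketch, not a definitive implementation. It assumes each sentence is a list of tokens (as corpus.sents() returns for the Brown corpus) and that word frequencies are counted over the whole corpus before any filtering, on the tokens exactly as they appear (no lowercasing). The helper name filter_sentences, its parameters, and the toy data are made up for illustration. collections.Counter stands in for nltk.FreqDist here so the snippet runs without NLTK; FreqDist is a Counter subclass, so it should be usable the same way.

```python
from collections import Counter


def filter_sentences(sents, min_len=4, max_len=25, min_count=8):
    """Keep sentences with min_len..max_len tokens whose tokens
    all occur at least min_count times in the whole corpus.

    Hypothetical helper, written for illustration only.
    """
    # Materialize the sentences so they can be iterated twice.
    sents = [list(sent) for sent in sents]
    # Count every token across the full corpus, before filtering.
    freq = Counter(tok for sent in sents for tok in sent)
    return [
        sent
        for sent in sents
        if min_len <= len(sent) <= max_len
        and all(freq[tok] >= min_count for tok in sent)
    ]


# Toy corpus (made-up data) with a low threshold so the effect is visible.
toy = [
    ["the", "cat", "sat", "down"],             # kept
    ["the", "cat", "sat", "there"],            # "there" occurs only once
    ["hi"],                                    # too short
    ["the", "cat", "sat", "down", "quietly"],  # "quietly" occurs only once
]
kept = filter_sentences(toy, min_len=4, max_len=25, min_count=2)
print(kept)
```

With NLTK installed and the Brown data downloaded, the call would presumably be filter_sentences(brown.sents()) after from nltk.corpus import brown. As for the empty list in the question: the condition len(w)>=25 and len(w)<=4 can never be true, because no length is both at least 25 and at most 4, so nothing passes the filter.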