I'm working through an NLP course and trying to understand the code provided in the book. When I run it in Jupyter with Python 3, I get an unexpected result: a list of filter objects instead of the filtered token list shown in the book. I know the author used Python 2, so that is probably the cause, and there must be some way to make the code work in Python 3. I tried printing it with list(), but it still gave the same result. Here is the code:
def remove_characters_after_tokenization(tokens):
    pattern = re.compile('[{}]'.format(re.escape(string.punctuation)))
    filtered_tokens = filter(None, [pattern.sub('', token) for token in tokens])
    return filtered_tokens
filtered_list_1 = [filter(None, [remove_characters_after_tokenization(tokens)
                   for tokens in sentence_tokens]) for sentence_tokens in token_list]
print(filtered_list_1)
[<filter object at 0x7fb28c08fb20>, <filter object at 0x7fb28c08faf0>, <filter object at 0x7fb28c303910>]
This is the expected token list:
[[['The', 'brown', 'fox', 'was', 'nt', 'that', 'quick', 'and', 'he',
'could', 'nt', 'win', 'the', 'race']], [['Hey', 'that', 's', 'a', 'great',
'deal'], ['I', 'just', 'bought', 'a', 'phone', 'for', '199']], [['You',
'll', 'learn', 'a', 'lot', 'in', 'the', 'book'], ['Python', 'is', 'an',
'amazing', 'language']]]
Can anyone help me fix this? I'd really appreciate it!
Posted on 2021-02-03 20:43:57
I think you need to change this line:

filtered_list_1 = [filter(None, [remove_characters_after_tokenization(tokens)
                   for tokens in sentence_tokens]) for sentence_tokens in token_list]

to:

filtered_list_1 = [list(filter(None, [remove_characters_after_tokenization(tokens)
                   for tokens in sentence_tokens])) for sentence_tokens in token_list]

Note that remove_characters_after_tokenization also returns a filter object in Python 3, so its return value needs the same treatment: return list(filtered_tokens).
https://stackoverflow.com/questions/66027636
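For reference, here is a minimal end-to-end sketch (using made-up sample sentences, not the book's exact data) showing both list() wraps in place, since in Python 3 filter() returns a lazy iterator rather than a list:

```python
import re
import string

def remove_characters_after_tokenization(tokens):
    # Strip all punctuation characters from each token.
    pattern = re.compile('[{}]'.format(re.escape(string.punctuation)))
    # In Python 3, filter() returns a lazy iterator, so wrap it in list()
    # to materialize the filtered tokens (empty strings are dropped).
    return list(filter(None, [pattern.sub('', token) for token in tokens]))

# Hypothetical sample data in the same nested shape as token_list above:
# a list of documents, each a list of sentence token lists.
token_list = [
    [['Hey', ',', 'that', "'s", 'a', 'great', 'deal', '!'],
     ['I', 'just', 'bought', 'a', 'phone', 'for', '$', '199']],
]

# The outer filter() must also be materialized with list() in Python 3.
filtered_list_1 = [
    list(filter(None, [remove_characters_after_tokenization(tokens)
                       for tokens in sentence_tokens]))
    for sentence_tokens in token_list
]
print(filtered_list_1)
# [[['Hey', 'that', 's', 'a', 'great', 'deal'],
#   ['I', 'just', 'bought', 'a', 'phone', 'for', '199']]]
```

Without the two list() calls, printing the result shows filter objects (as in the question), because the iterators are never consumed.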