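I can't figure out why this code doesn't work the way I want. I'm reading a txt file and printing each item (the items are comma-separated) on its own line. Each item is wrapped in quotation marks (" or “ ”) and also contains punctuation, which I'm trying to remove. I'm familiar with string.punctuation and it works in my test example, but it fails on the items I loop over. See below: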
def read_word_lists(path):
    import string
    with open(path, encoding='utf-8') as f:
        lines = f.readlines()
        for line in lines[0].split(','):
            line = str(line)
            line = line.strip().lower()
            print(''.join(word.strip(string.punctuation) for word in line))
            print(line)
            print(''.join(word.strip(string.punctuation) for word in '"why, does this work?! and not above?"'))

read_word_lists('file.txt')

The result looks like this:
trying to strip punctuation: “you never”
originial: “you never”
test: why does this work and not above
trying to strip punctuation: “you always
originial: “you always"
test: why does this work and not above
trying to strip punctuation: ” “your problem is”
originial: ” “your problem is”
test: why does this work and not above
trying to strip punctuation: “the trouble with you is”
originial: “the trouble with you is”
test: why does this work and not above

Any idea why the "trying to strip punctuation" output isn't working?
Edit
The original file looks like this (in case it's useful):
"YOU NEVER”, “YOU ALWAYS", ” “YOUR PROBLEM IS”, “THE TROUBLE WITH YOU IS”
Posted on 2020-02-12 07:32:03
You are trying to strip Unicode punctuation (the curly quotes “ and ”), but string.punctuation contains only ASCII punctuation.
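A quick check makes the mismatch visible. This is a minimal sketch; the helper name `in_ascii_punctuation` and the sample string are illustrative assumptions modeled on the file contents shown in the question.

```python
import string
import unicodedata

LEFT_CURLY = "\u201c"   # “ LEFT DOUBLE QUOTATION MARK
RIGHT_CURLY = "\u201d"  # ” RIGHT DOUBLE QUOTATION MARK


def in_ascii_punctuation(ch):
    """Return True if ch is one of the ASCII characters in string.punctuation."""
    return ch in string.punctuation


# The straight quote is ASCII punctuation; the curly quotes are not.
print(in_ascii_punctuation('"'))
print(in_ascii_punctuation(LEFT_CURLY))
print(in_ascii_punctuation(RIGHT_CURLY))

# Unicode still classifies the curly quotes as punctuation (categories Pi and Pf).
print(unicodedata.category(LEFT_CURLY), unicodedata.category(RIGHT_CURLY))

# So str.strip(string.punctuation) leaves a curly-quoted item untouched.
sample = LEFT_CURLY + "you never" + RIGHT_CURLY
print(sample.strip(string.punctuation) == sample)
```

The test string in the question uses only ASCII punctuation, which is why that line comes out clean while the items read from the file do not.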
Instead of string.punctuation, you can use the following code to build a string containing all Unicode punctuation characters:
import sys
import unicodedata

# Collect every code point whose Unicode general category starts with 'P' (punctuation).
punctuation = "".join(
    chr(i) for i in range(sys.maxunicode)
    if unicodedata.category(chr(i)).startswith('P')
)

Good luck!
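As a rough sketch of how this could be applied to the file from the question, the same category test can be run per character instead of building the full punctuation string first. The function names `remove_punctuation` and `clean_items` are hypothetical, and the sample line is copied from the question's edit; this is one possible approach, not the only one.

```python
import unicodedata


def remove_punctuation(text):
    """Drop every character whose Unicode general category starts with 'P'."""
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )


def clean_items(line):
    """Split a comma-separated line, then de-punctuate, trim, and lowercase each item."""
    items = (remove_punctuation(part).strip().lower() for part in line.split(","))
    return [item for item in items if item]  # skip items that end up empty


# Sample line taken from the question (mixed straight and curly quotes).
sample = '"YOU NEVER”, “YOU ALWAYS", ” “YOUR PROBLEM IS”, “THE TROUBLE WITH YOU IS”'

for item in clean_items(sample):
    print(item)
```

Checking the category per character avoids scanning the whole Unicode range up front, and it removes punctuation anywhere in the item, which matches what the character-by-character loop in the question effectively does.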
https://stackoverflow.com/questions/60173745