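I'm having trouble reading lines from a .txt file. My file contains sentences with words such as
hadn’t, can’t, didn’t
and so on. The problem is that when I read the lines, in place of the
’
I get something like this:
â€™
So the word I read is hadnâ€™t instead of hadn’t.
My input: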
Love at First Sight
One <adjective> afternoon, I was walking by the <place> when
accidentally I bumped into a <adjective> boy.
At first I blushed and apologized for bumping into him, but when he flashed his
<adjective> smile I just couldn’t help falling in love. His
<adjective> voice telling me that it was ok sounded like music to my ears.
I could have stayed there staring at him for <period_of_time>.
He had <adjective> <color> eyes and <adjective>
<color> hair. I thought he was perfect for me. Before I noticed,
<number> <period_of_time> had passed by after I apologized,
and I hadn’t said anything else since!
That’s when I noticed that he was looking at me
<adverb>. I didn’t know what to say, so I just <past_verb>.
I noticed him giving me a strange look when he started walking to his
<noun>. I looked back at him <number> more time(s), but he was already out of sight.
It wasn’t love after all
Expected output: the same as the input file
My code:
f = open('loveatfirstsight.txt','r')
for i in f.readlines():
    print(i)
My OS: Windows 10
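Posted on 2021-01-15 12:21:43
The file is encoded in UTF-8, but you are reading it as if it were (I'm guessing) windows-1252 (or some other Windows-specific encoding). The apostrophe in the file is not the typical ASCII "typewriter apostrophe" (' U+0027 APOSTROPHE) but the "typographer's apostrophe" (’ U+2019 RIGHT SINGLE QUOTATION MARK), which lies outside the Basic Latin ("ASCII") block, so the mismatched encoding garbles the character.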
>>> 'hadn’t'.encode('utf-8').decode('cp1252')
'hadnâ€™t'
To fix this, you should pass the correct encoding to the open function via the encoding parameter.
f = open('loveatfirstsight.txt', 'r', encoding='utf-8')
for i in f.readlines():
    print(i)
As help(open) explains:
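In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding. (For reading and writing raw bytes use binary mode and leave encoding unspecified.)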
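The mismatch can be reproduced end to end with a small, self-contained sketch. It assumes Python 3; the temporary file is only a stand-in for loveatfirstsight.txt, and the variable names are illustrative rather than anything from the question. It reads the same UTF-8 bytes once as cp1252 and once as UTF-8, then shows that text garbled in exactly this way can usually be recovered by reversing the mistake.

```python
import os
import tempfile

# \u2019 is the curly apostrophe (RIGHT SINGLE QUOTATION MARK).
text = "I hadn\u2019t said anything else since!\n"

# Write a throwaway UTF-8 file (hypothetical stand-in for the real input).
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write(text)
    path = tmp.name

try:
    # Wrong: the three UTF-8 bytes of U+2019 are decoded as three
    # separate cp1252 characters.
    with open(path, "r", encoding="cp1252") as f:
        garbled = f.read()

    # Right: decode with the encoding the file was written in.
    with open(path, "r", encoding="utf-8") as f:
        correct = f.read()
finally:
    os.remove(path)

# ascii() prints escape sequences, so the result does not depend on
# what the console itself can display.
print(ascii(garbled))
print(ascii(correct))

# Reversing the wrong decode restores the original text.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == text)
```

The with blocks also close the file automatically, which the open/readlines version above leaves to the interpreter. The repair step is a last resort for text that has already been saved in garbled form; it can fail if the garbled text contains characters that cp1252 cannot encode.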
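Posted on 2021-01-15 12:11:01
This sounds like an encoding problem. The text file is stored in UTF-8 and contains curly quotes. You are either reading it with the wrong encoding (possibly Latin-1), or writing it out to somewhere (perhaps the Windows console?) that does not expect UTF-8.
If you edit your question to include more details about exactly how the data is stored, read, and processed, including which system you are on and which Python version you are using, you will be able to get a better answer.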
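One way to tell the two possibilities apart, and to gather the details requested, is sketched below. It assumes Python 3; encoding_report is a hypothetical helper name, not part of any library. The idea is that ascii() shows the code points actually held in memory, independent of the console: if a line read from the file shows \u2019, the read was fine and the display is at fault; if it shows \xe2\u20ac\u2122, the file was decoded with the wrong encoding.

```python
import locale
import platform
import sys


def encoding_report():
    """Collect the settings that affect how text is decoded and displayed."""
    return {
        "python": platform.python_version(),
        "system": platform.platform(),
        # Locale encoding, which help(open) names as the text-mode default
        # when encoding= is omitted.
        "open_default": locale.getpreferredencoding(False),
        # Encoding print() uses when writing to the console or a pipe.
        "stdout": sys.stdout.encoding,
    }


for key, value in encoding_report().items():
    print(f"{key}: {value}")

# What a correctly decoded line looks like versus a wrongly decoded one.
as_read_correctly = "hadn\u2019t"
as_read_wrongly = as_read_correctly.encode("utf-8").decode("cp1252")
print(ascii(as_read_correctly))
print(ascii(as_read_wrongly))
```

Pasting the report and the ascii() form of one affected line into the question would give answerers what they need.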
https://stackoverflow.com/questions/65735782