I fixed it by adding <code>.encode("utf-8")</code> to <code>soup</code>.

That means that <code>print(soup)</code> becomes <code>print(soup.encode("utf-8"))</code>.
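To see what the change does, here is a minimal sketch that does not need BeautifulSoup. The sample string and the use of cp1252 as the console encoding are assumptions for illustration; the point is only that encoding to UTF-8 yields a bytes object, which can always be printed.

```python
# A sample string with a character (check mark) that cp1252 cannot represent.
text = "caf\u00e9 \u2713"

# Encoding to the legacy code page fails, which is what print() runs into
# when the console encoding is cp1252.
try:
    text.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# Encoding to UTF-8 always succeeds and returns a bytes object.
utf8_bytes = text.encode("utf-8")

print(cp1252_ok)         # False
print(type(utf8_bytes))  # <class 'bytes'>
```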

I was getting the same <code>UnicodeEncodeError</code> when saving scraped web content to a file. To fix it I replaced this code:
<pre><code>with open(fname, &quot;w&quot;) as f:
    f.write(html)
</code></pre>
with this:
<pre><code>with open(fname, &quot;w&quot;, encoding=&quot;utf-8&quot;) as f:
    f.write(html)
</code></pre>
If you need to support Python 2, then use this:
<pre><code>import io
with io.open(fname, &quot;w&quot;, encoding=&quot;utf-8&quot;) as f:
    f.write(html)
</code></pre>
If you want to use a different encoding than UTF-8, specify whatever your actual encoding is for <code>encoding</code>.
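As a rough sketch of the same idea with the encoding made a parameter: `save_text` and `load_text` are hypothetical helper names, and the sample text and temporary file names are made up for the example. The file has to be read back with the same encoding it was written in.

```python
import os
import tempfile

def save_text(path, text, encoding="utf-8"):
    """Write text to path using an explicit encoding (hypothetical helper)."""
    with open(path, "w", encoding=encoding) as f:
        f.write(text)

def load_text(path, encoding="utf-8"):
    """Read the file back with the same encoding it was written in."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()

sample = "Gr\u00fc\u00dfe, \u4e16\u754c"  # German and Chinese characters
tmp_dir = tempfile.mkdtemp()
utf8_path = os.path.join(tmp_dir, "page_utf8.html")
utf16_path = os.path.join(tmp_dir, "page_utf16.html")

save_text(utf8_path, sample)                      # default: UTF-8
save_text(utf16_path, sample, encoding="utf-16")  # a different encoding
```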

In Python 3.7 on Windows 10, this worked for me (I am not sure whether it will work on other platforms or with other versions of Python).

Replacing this line:

<code>with open('filename', 'w') as f:</code>

With this:

<code>with open('filename', 'w', encoding='utf-8') as f:</code>

This works because the file is now written as UTF-8, which can represent every Unicode character. Without the <code>encoding</code> argument, Python uses the platform's default encoding and raises an error as soon as it meets a character that this encoding does not support.

<pre><code>set PYTHONIOENCODING=utf-8
set PYTHONLEGACYWINDOWSSTDIO=utf-8
</code></pre>
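You may or may not need to set that second environment variable <a href="https://docs.python.org/3/using/cmdline.html#envvar-PYTHONIOENCODING" rel="noreferrer"><code>PYTHONLEGACYWINDOWSSTDIO</code></a>. Python only checks whether it is set to a non-empty string, so its value is not interpreted as an encoding name.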
Alternatively, this can be done in code (although it seems that doing it through env vars is recommended):
<pre><code>sys.stdin.reconfigure(encoding='utf-8')
sys.stdout.reconfigure(encoding='utf-8')
</code></pre>
<hr />
Additionally, reproducing this error was a bit of a pain, so I am leaving this here in case you need to reproduce it on your machine:
<pre><code>set PYTHONIOENCODING=windows-1252
set PYTHONLEGACYWINDOWSSTDIO=windows-1252
</code></pre>
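If you would rather reproduce it from a script than from the command prompt, something along these lines should work. It is only a sketch: `run_child` is a hypothetical helper that starts a child interpreter with `PYTHONIOENCODING` set and captures its output through a pipe.

```python
import os
import subprocess
import sys

# Child program: prints a check mark, which cp1252 cannot encode.
child_code = r"print('\u2713')"

def run_child(io_encoding):
    """Run the child with PYTHONIOENCODING set to io_encoding."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.run(
        [sys.executable, "-c", child_code],
        env=env,
        capture_output=True,
    )

broken = run_child("windows-1252")
fixed = run_child("utf-8")

print(broken.returncode != 0)                            # True: the child crashed
print(b"UnicodeEncodeError" in broken.stderr)            # True
print(fixed.stdout.strip() == "\u2713".encode("utf-8"))  # True
```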

I got the same error on Python 3.7 on Windows 10 while saving the response of a GET request. The response from the URL was encoded as UTF-8, so it is worth checking the response's encoding and passing the same one to <code>open</code>; trivial issues like this can waste a lot of time in production.

<pre><code>import requests
resp = requests.get('https://en.wikipedia.org/wiki/NIFTY_50')
print(resp.encoding)
with open('NiftyList.txt', 'w') as f:
    f.write(resp.text)
</code></pre>

When I added <code>encoding="utf-8"</code> to the <code>open</code> call, the file was saved with the correct content:

<pre><code>with open ('NiftyList.txt', 'w', encoding="utf-8") as f:
 f.write(resp.text)
</code></pre>
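Here is a sketch of checking the response's encoding and passing it on. To keep it runnable offline, a `SimpleNamespace` stands in for the `requests` response, relying only on its `.encoding` and `.text` attributes. `save_response` is a hypothetical helper, and falling back to UTF-8 when no encoding is reported is my assumption.

```python
import os
import tempfile
from types import SimpleNamespace

def save_response(resp, path, fallback="utf-8"):
    """Write resp.text to path, reusing the encoding the response reports."""
    encoding = resp.encoding or fallback
    with open(path, "w", encoding=encoding) as f:
        f.write(resp.text)
    return encoding

# Stand-in for a requests.Response (assumption: only .encoding and .text are used).
fake_resp = SimpleNamespace(encoding="UTF-8", text="NIFTY 50 \u20b9")

out_path = os.path.join(tempfile.mkdtemp(), "NiftyList.txt")
used = save_response(fake_resp, out_path)
print(used)  # UTF-8
```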

I faced the same encoding issue, which can occur when you try to print the data, read or write it, or open it. As others mentioned above, adding <code>.encode("utf-8")</code> will help if you are trying to print it.

<blockquote>
 soup.encode("utf-8")
</blockquote>

If you are trying to open scraped data and perhaps write it to a file, then open the file with <code>encoding="utf-8"</code>:

<blockquote>
 with open(filename_csv , 'w', newline='',encoding="utf-8") as csv_file:
</blockquote>
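A fuller sketch of the CSV case, using the standard `csv` module. The rows and the temporary file name are made up for the example.

```python
import csv
import os
import tempfile

# Sample rows with accented and Japanese text.
rows = [["name", "city"], ["Zo\u00eb", "\u6771\u4eac"]]

filename_csv = os.path.join(tempfile.mkdtemp(), "scraped.csv")

# newline='' lets the csv module control line endings itself.
with open(filename_csv, "w", newline="", encoding="utf-8") as csv_file:
    csv.writer(csv_file).writerows(rows)

# Read the file back with the same encoding to check the round trip.
with open(filename_csv, "r", newline="", encoding="utf-8") as csv_file:
    read_back = list(csv.reader(csv_file))
```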

For those still getting this error, adding <code>encode("utf-8")</code> to <code>soup</code> will also fix this.

<pre><code>soup = BeautifulSoup(html_doc, 'html.parser').encode("utf-8")
print(soup)
</code></pre>
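One thing to be aware of with this approach: after `.encode("utf-8")` the value is a `bytes` object, so `print` shows its escaped representation rather than the characters themselves. A minimal sketch, with a plain string standing in for the soup:

```python
text = "H\u00e9ll\u00f6"        # stands in for str(soup)
encoded = text.encode("utf-8")  # bytes, like the result of soup.encode("utf-8")

# print() shows the bytes literal, which is pure ASCII and therefore
# safe on any console, but not very readable.
print(encoded)  # b'H\xc3\xa9ll\xc3\xb6'

# Decoding with the same encoding recovers the original text.
round_trip = encoded.decode("utf-8")
```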

There are multiple aspects to this problem. The fundamental question is which character set you want to output into. You may also have to figure out the input character set.
Printing (with either <code>print</code> or <code>write</code>) into a file with an explicit <code>encoding=&quot;...&quot;</code> will translate Python's internal Unicode representation into that encoding. If the output contains characters which are not supported by that encoding, you will get a <code>UnicodeEncodeError</code>. For example, you can't write Russian or Chinese or Indic or Hebrew or Arabic or emoji or ... anything except a restricted set of some 200+ Western characters to a file whose encoding is <code>&quot;cp1252&quot;</code> because this limited 8-bit character set has no way to represent these characters.
Basically the same problem will occur with any 8-bit character set, including nearly all the legacy Windows code pages (437, 850, 1250, 1251, etc.), though some of them support some additional script in addition to or instead of English (1251 supports Cyrillic, for example, so you can write Russian, Ukrainian, Serbian, Bulgarian, etc.). An 8-bit encoding has only a maximum of 256 character codes and no way to represent a character which isn't among them.
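A quick way to see this for yourself; `can_encode` is just a hypothetical helper name for the sketch.

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

russian = "\u041f\u0440\u0438\u0432\u0435\u0442"  # the Russian word "Привет"

print(can_encode(russian, "cp1251"))  # True: code page 1251 covers Cyrillic
print(can_encode(russian, "cp1252"))  # False: code page 1252 does not
print(can_encode(russian, "utf-8"))   # True: UTF-8 covers all of Unicode
```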
Perhaps now would be a good time to read Joel Spolsky's <a href="https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/" rel="nofollow noreferrer">The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)</a>
On platforms where the terminal is not capable of printing Unicode (only Windows these days really, though if you're into retrocomputing, this problem was also prevalent on other platforms in the previous millennium) attempting to <code>print</code> Unicode strings can also produce this error, or output <a href="https://en.wikipedia.org/wiki/Mojibake" rel="nofollow noreferrer">mojibake</a>. If you see something like <code>HÃ©llÃ¶</code> instead of <code>Héllö</code>, this is your issue.
In short, then, you need to know:
<ul>
<li>What is the character set of the page you scraped, or the data you received? Was it correctly scraped? Did the originator correctly identify its encoding, or are you able to otherwise obtain this information (or guess it)? Some web sites incorrectly declare a different character set than the page actually contains, some sites have incorrectly configured the connection between the web server and a back-end database. See e.g. <a href="https://stackoverflow.com/questions/46253288/scrape-with-correct-character-encoding-python-requests-beautifulsoup">scrape with correct character encoding (python requests + beautifulsoup)</a> for a more detailed example with some solutions.
</li>
<li>What is the character set you want to write? If printing to the screen, is your terminal correctly configured, and is your Python interpreter configured identically?
Perhaps see also <a href="https://stackoverflow.com/questions/3578685/how-to-display-utf-8-in-windows-console">How to display utf-8 in windows console</a>
</li>
</ul>
If you are here, probably the answer to one of these questions is not &quot;UTF-8&quot;. This is increasingly becoming the prevalent encoding for web pages, too, though the former standard was ISO-8859-1 (aka Latin-1) and more recently Windows code page 1252.
Going forward, you basically want all your textual data to be Unicode, outside of a few fringe use cases. Generally, that means UTF-8, though on Windows (or if you need Java compatibility), UTF-16 is also vaguely viable, albeit somewhat cumbersome. (There are several other Unicode serialization formats, which may be useful in specialized circumstances. UTF-32 is technically trivial, but takes up a lot more memory; UTF-7 is used in a few network protocols where 7-bit ASCII is required for transport.)
Perhaps see also <a href="https://utf8everywhere.org/" rel="nofollow noreferrer">https://utf8everywhere.org/</a>
Naturally, if you are printing to a file, you also need to examine that file using a tool which can correctly display it. A common pilot error is to open the file using a tool which only displays the currently selected system encoding, or one which tries to guess the encoding, but guesses wrong. Again, a common symptom when viewing UTF-8 text using Windows code page 1252 would result, for example, in <code>Héllö</code> displaying as <code>HÃ©llÃ¶</code>.
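The symptom is easy to reproduce, and in simple cases to reverse, in a few lines. This is only a sketch of the mechanism; it assumes no bytes were lost or replaced along the way.

```python
original = "H\u00e9ll\u00f6"  # "Héllö"

# Correct bytes, wrong decoder: UTF-8 data viewed as code page 1252.
garbled = original.encode("utf-8").decode("cp1252")

# If nothing was lost along the way, reversing the two steps repairs the text.
repaired = garbled.encode("cp1252").decode("utf-8")
```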
If the encoding of character data is unknown, there is no simple way to automatically establish it. If you know what the text is supposed to represent, you can perhaps infer it, but this is typically a manual process with some guesswork involved. (Automatic tools like <a href="https://pypi.org/project/chardet/" rel="nofollow noreferrer"><code>chardet</code></a> and <a href="https://github.com/LuminosoInsight/python-ftfy" rel="nofollow noreferrer"><code>ftfy</code></a> can help, but they get it wrong some of the time, too.)
To establish which encoding you are looking at, it can be helpful if you can identify the individual bytes in a character which isn't displayed correctly. For example, if you are looking at <code>H\x8ell\x9a</code> but expect it to represent <code>Héllö</code>, you can look up the bytes in a translation table. I have published one such table at <a href="https://tripleee.github.io/8bit" rel="nofollow noreferrer">https://tripleee.github.io/8bit</a> where you can see that in this example, it's probably one of the legacy Mac 8-bit character sets; with more data points, perhaps you can narrow it down to just one of them (and if not, any one of them will do in practice, since all the code points you care about map to the same Unicode characters).
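If you know what the text should say, you can also let Python try a handful of candidates for you. The candidate list below is only an illustrative assumption, not an exhaustive set.

```python
raw = b"H\x8ell\x9a"
expected = "H\u00e9ll\u00f6"  # "Héllö"

# A few 8-bit encodings to try; extend this list as needed.
candidates = ["cp1252", "latin-1", "cp437", "cp850", "mac_roman"]

matches = []
for name in candidates:
    try:
        if raw.decode(name) == expected:
            matches.append(name)
    except UnicodeDecodeError:
        pass  # some bytes are undefined in this encoding

print(matches)  # ['mac_roman']
```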
Python 3 on most platforms defaults to UTF-8 for all input and output, but on Windows, this is commonly not the case. It will then instead default to the system's default encoding (still misleadingly called &quot;ANSI code page&quot; in some Microsoft documentation), which depends on a number of factors. On Western systems, the default encoding out of the box is commonly Windows code page 1252.
(Earlier Python versions had somewhat different expectations, and in Python 2, the internal string representation was not Unicode.)
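To find out what your own interpreter defaults to, something like the following sketch can help. The output depends on the platform and configuration, so no expected values are shown.

```python
import locale
import sys

# Encoding that open() uses when no encoding= argument is given.
default_file_encoding = locale.getpreferredencoding(False)

# Encoding that print() uses for standard output; may be None if
# stdout has been replaced by an object without this attribute.
stdout_encoding = getattr(sys.stdout, "encoding", None)

print("open() default:", default_file_encoding)
print("stdout:", stdout_encoding)
```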
If you are on Windows and write UTF-8 to a text file, maybe specify <code>encoding=&quot;utf-8-sig&quot;</code> which adds a BOM sequence at the beginning of the file. This is strictly speaking not necessary or correct, but some Windows tools need it to correctly identify the encoding.
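A small sketch of what <code>utf-8-sig</code> actually does to the file; the temporary file name is made up for the example.

```python
import codecs
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "with_bom.txt")

with open(path, "w", encoding="utf-8-sig") as f:
    f.write("H\u00e9ll\u00f6")

with open(path, "rb") as f:
    raw = f.read()

# The file starts with the three-byte UTF-8 BOM, EF BB BF.
has_bom = raw.startswith(codecs.BOM_UTF8)

# Reading with utf-8-sig strips the BOM again; plain utf-8 would keep it
# as a leading U+FEFF character.
with open(path, "r", encoding="utf-8-sig") as f:
    text = f.read()
```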
Several of the earlier answers here suggest blindly applying some encoding, but hopefully this should help you understand how that's not generally the correct approach, and how to figure out - rather than guess - which encoding to use.

From Python 3.7 onwards, set the environment variable <code>PYTHONUTF8</code> to 1.
The following script also sets some other useful system environment variables.
<pre><code>setx /m PYTHONUTF8 1
setx PATHEXT &quot;%PATHEXT%;.PY&quot; ; In CMD, Python file can be executed without extesnion.
setx /m PY_PYTHON 3.10 ; To set default python version for py
</code></pre>
<a href="https://dev.to/methane/python-use-utf-8-mode-on-windows-212i" rel="nofollow noreferrer">Source</a>
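To check whether the variable has taken effect, you can inspect `sys.flags.utf8_mode`. The sketch below does this in a child interpreter so that it does not depend on how the current one was started; `utf8_mode_in_child` is a hypothetical helper name.

```python
import os
import subprocess
import sys

def utf8_mode_in_child(value):
    """Start a child interpreter with PYTHONUTF8=value and report its mode."""
    env = dict(os.environ, PYTHONUTF8=value)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.flags.utf8_mode)"],
        env=env,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()

print(utf8_mode_in_child("1"))  # 1
print(utf8_mode_in_child("0"))  # 0
```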

I got the same error, so I used <code>encoding=&quot;utf-8&quot;</code>, and that solved it.
This generally happens when the text data contains a symbol that the encoder in use cannot represent.
<pre><code>with open(&quot;text.txt&quot;, &quot;w&quot;, encoding='utf-8') as f:
    f.write(data)
</code></pre>
This will solve your problem.
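If you are curious which symbols are causing the trouble, a sketch like this lists them. `unencodable_chars` is a hypothetical helper, and cp1252 is assumed as the failing encoding because that is the one named in the traceback.

```python
def unencodable_chars(text, encoding):
    """Return the distinct characters in text that encoding cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad

data = "Price: 5\u20ac \u2713 \u2192 done"  # euro sign, check mark, arrow
bad = unencodable_chars(data, "cp1252")

# Print code points rather than the characters, so this is safe on any console.
print(", ".join("U+%04X" % ord(ch) for ch in bad))  # U+2713, U+2192
```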

If you are using Windows, try passing <code>encoding='latin1'</code>, <code>encoding='iso-8859-1'</code> or <code>encoding='cp1252'</code>.
Example:
<pre><code>csv_data = pd.read_csv(csvpath,encoding='iso-8859-1')
print(print(soup.encode('iso-8859-1')))
</code></pre>
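A word of caution, sketched below: ISO-8859-1 accepts every possible byte, so decoding with it never raises an error, but that does not mean the result is the right text. The sample string is made up for the example.

```python
every_byte = bytes(range(256))

# ISO-8859-1 (latin1) maps all 256 byte values, so decoding never fails...
latin1_text = every_byte.decode("iso-8859-1")

# ...whereas cp1252 leaves a few byte values undefined (0x81, for example).
try:
    every_byte.decode("cp1252")
    cp1252_decodes_everything = True
except UnicodeDecodeError:
    cp1252_decodes_everything = False

# No error does not mean correct text: UTF-8 bytes read as latin1 are garbled.
wrong = "H\u00e9ll\u00f6".encode("utf-8").decode("iso-8859-1")
print(wrong == "H\u00e9ll\u00f6")  # False
```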

I'm trying to scrape a website, but it gives me an error.

I'm using the following code:

<pre><code>import urllib.request
from bs4 import BeautifulSoup

get = urllib.request.urlopen("https://www.website.com/")
html = get.read()

soup = BeautifulSoup(html)

print(soup)
</code></pre>

And I'm getting the following error:

<pre><code>File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
 return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 70924-70950: character maps to &lt;undefined&gt;
</code></pre>

What can I do to fix this?

UnicodeEncodeError: 'charmap' codec can't encode characters
