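A very basic question. When I try to run the script written for LP3THW Ex. 23, PowerShell does not display the foreign characters. I assume it has something to do with UTF-16 / UTF-8 encoding, but I can't figure it out from other posts on Stack Overflow.
Here is the script: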
import sys

script, input_encoding, error = sys.argv


def main(language_file, encoding, errors):
    line = language_file.readline()

    if line:
        print_line(line, encoding, errors)
        return main(language_file, encoding, errors)


def print_line(line, encoding, errors):
    next_lang = line.strip()
    raw_bytes = next_lang.encode(encoding, errors=errors)
    cooked_string = raw_bytes.decode(encoding, errors=errors)

    print(raw_bytes, "<===>", cooked_string)


languages = open("languages.txt", encoding="utf-8")

main(languages, input_encoding, error)

The contents of the text file (languages.txt) can be viewed here: https://learnpythonthehardway.org/python3/languages.txt
Screenshot of the PowerShell terminal when running the script:

Links to other posts that confused me further:
Posted on 2021-10-25 15:53:03
There are several problems:

- The **default font in regular console windows is limited in terms of the Unicode characters it can display**, and many of those present in your sample file are _not_ supported.
- While you can try to switch to a different font that (hopefully) can render all the characters you need, as described in [one of the answers you link to](https://stackoverflow.com/questions/40098771/changing-powershells-default-output-encoding-to-utf-8), **consider switching to** [**Windows Terminal**](https://github.com/microsoft/terminal), installable from the Microsoft Store: it provides support for a much wider range of characters by default.

On BOM-less UTF-8 text files:

- In _**Windows PowerShell**_, which is what you're using, judging by the screenshot, **BOM-less text files are assumed to be** _**ANSI-encoded**_, i.e. encoded with the legacy ANSI code page determined by your machine's system locale (language for non-Unicode programs), such as Windows-1252 on US-English systems.
- _**PowerShell (Core) 7+**_**, by contrast, now commendably assumes UTF-8**, and generally uses BOM-less UTF-8 as the _consistent_ default (including when _writing_ files).
- Therefore, **to** _**decode**_ **the file properly, use** **`Get-Content -Encoding Utf8 languages.txt`** **in Windows PowerShell**.
  - Note: This in turn may reveal _rendering problems_ due to lack of support for certain Unicode characters in the active font, but in Windows Terminal you'd see the expected output.
- If you're only **printing** _**directly to the console**_**, your script's output will appear correctly**, barring any rendering problems due to unsupported characters. The reason is that Python detects this output scenario and uses a Unicode-enabled API to print.
- **More work is needed if you need to** _**further process**_ **the output**, such as by capturing it in a variable, sending it to another command, or saving it to a file:
  - **Python defaults to** _**ANSI**_**(!) encoding on output to stdout** when stdout is not the console (i.e. when it is piped or redirected), so it must be **instructed to output UTF-8 instead**, which you can do by setting `$env:PYTHONUTF8=1` beforehand or by passing `-X utf8` on the `python` / `py` command line (v3.7+).
  - Complementarily, **PowerShell must (temporarily) be instructed to expect UTF-8 output from external programs** (instead of output encoded with the legacy OEM code page), which requires executing `[Console]::OutputEncoding = [System.Text.Utf8Encoding]::new()`

To put it all together in the form of a sample PowerShell script (.ps1):
# PREREQUISITES:
# * In a *regular console window*:
# Choose a font that supports all characters in languages.txt, if possible
# * Preferably, run from *Windows Terminal*.
# Additionally, the code assumes:
# * Windows 10 or higher.
# * Python 3.7 or higher.
# Download the sample file.
# It contains a list of language names expressed in each language natively,
# therefore containing many non-ASCII-range characters, including CJK ones.
curl.exe -O https://learnpythonthehardway.org/python3/languages.txt
# Print the sample file using a PowerShell command.
# Assuming you've chosen a suitable font or are running from Windows Terminal,
# all non-ASCII-range characters should render correctly.
Get-Content -Encoding Utf8 languages.txt
pause
# Invoke your Python script file and let it *print directly to the console*.
# Again, this should render the non-ASCII-range characters correctly.
python script.py utf8 strict
pause
# Invoke it again, but with further processing, which requires
# * requesting that Python use UTF-8
# * making PowerShell expect UTF-8
# (Temporarily) tell PowerShell to expect UTF-8 stdout output
# from external programs.
$prevEncoding = [Console]::OutputEncoding
[Console]::OutputEncoding = [System.Text.Utf8Encoding]::new()
# Invoke the Python script, telling Python to output UTF-8 to stdout.
# Select-Object -First 10 limits the output to the first 10 lines.
# Note that this operation alone involves decoding of Python's output by PowerShell.
# Again, this should render the non-ASCII-range characters correctly.
python -X utf8 script.py utf8 strict | Select-Object -First 10
# Restore the previous console output encoding.
[Console]::OutputEncoding = $prevEncoding

https://stackoverflow.com/questions/69706561
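As a rough illustration of two points made in the answer (a BOM-less UTF-8 file being misread as ANSI, and Python having to be told to emit UTF-8 when its output is processed further), here is a minimal Python sketch. The helper names `misdecode_as_ansi` and `force_utf8` are hypothetical and not part of the original script; Windows-1252 is assumed as the ANSI code page, and `reconfigure()` requires Python 3.7+.

```python
import io
import sys


def misdecode_as_ansi(text: str, ansi_codepage: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode the bytes with a legacy ANSI code page.

    This mimics what happens when a BOM-less UTF-8 file is read by a tool
    that assumes ANSI encoding (assumption: Windows-1252 by default).
    """
    # errors="replace" because a few byte values are undefined in cp1252.
    return text.encode("utf-8").decode(ansi_codepage, errors="replace")


def force_utf8(stream: io.TextIOWrapper) -> None:
    """Switch an existing text stream (e.g. sys.stdout) to UTF-8 output."""
    stream.reconfigure(encoding="utf-8")


if __name__ == "__main__":
    # sys.stdout may have been replaced by an object without reconfigure()
    # (for example in some IDEs), so check before calling it.
    if hasattr(sys.stdout, "reconfigure"):
        force_utf8(sys.stdout)
    print(misdecode_as_ansi("Français"))  # prints: FranÃ§ais
```

Calling `force_utf8(sys.stdout)` at the top of a script is one possible in-script alternative to `-X utf8` or `$env:PYTHONUTF8=1`, though it affects stdout only; PowerShell still has to be told to expect UTF-8 via `[Console]::OutputEncoding`.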