Learn Python 3 the Hard Way Ex. 23 - characters not displayed in PowerShell

Stack Overflow user
Asked 2021-10-25 10:35:52

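Very beginner question. When I try to run the script written for LP3THW Ex. 23, PowerShell does not display the foreign-language characters. I assume it has something to do with UTF-16 / UTF-8 encoding, but I can't figure it out from other posts on Stack Overflow.

Here is the script: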

import sys
script, input_encoding, error = sys.argv


def main(language_file, encoding, errors):
    line = language_file.readline()
    
    if line:
        print_line(line, encoding, errors)
        return main(language_file, encoding, errors)
        
        
def print_line(line, encoding, errors):
    next_lang = line.strip()
    raw_bytes = next_lang.encode(encoding, errors=errors)
    cooked_string = raw_bytes.decode(encoding, errors=errors)
    
    print(raw_bytes, "<===>", cooked_string)
    
    
languages = open("languages.txt", encoding="utf-8")

main(languages, input_encoding, error)

The contents of the text file (languages.txt) can be viewed here: https://learnpythonthehardway.org/python3/languages.txt
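
For reference, the encode/decode round trip that `print_line` performs can be checked in isolation. This is a minimal sketch, not part of the exercise: the helper name `round_trip` and the two-character sample are assumptions for illustration. It shows that the bytes column of the script's output is pure ASCII and therefore displays in any console, so only the decoded string depends on the console and its font.

```python
def round_trip(text, encoding="utf-8", errors="strict"):
    """Encode text to bytes and decode it back, as print_line does."""
    raw_bytes = text.encode(encoding, errors=errors)
    cooked_string = raw_bytes.decode(encoding, errors=errors)
    return raw_bytes, cooked_string


# Hypothetical sample: two CJK characters, written as escapes so that this
# file itself contains only ASCII.
sample = "\u6587\u8a00"

raw, cooked = round_trip(sample)
print(raw)               # b'\xe6\x96\x87\xe8\xa8\x80'
print(cooked == sample)  # True: the data survives the round trip

# With a codec that cannot represent the characters, the errors argument
# decides what happens; "replace" substitutes a question mark per character.
print(sample.encode("ascii", errors="replace"))  # b'??'
```

If the bytes column looks right but the decoded column does not, the data is intact and the problem lies in how the console displays it.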

Screenshot of the PowerShell terminal when running the script:

Links to other posts that confused me further:

Changing PowerShell's default output encoding to UTF-8

UTF8 Script in PowerShell outputs incorrect characters

1 Answer

Stack Overflow user

Answered 2021-10-25 15:53:03

There are several issues:

  • Unicode character rendering

- The **default font in regular console windows is limited in terms of the Unicode characters it can display**, and many of those present in your sample file are _not_ supported.
- While you can try to switch to a different font that (hopefully) can render all the characters you need - as described in [one of the answers you link to](https://stackoverflow.com/questions/40098771/changing-powershells-default-output-encoding-to-utf-8) - **consider switching to** [**Windows Terminal**](https://github.com/microsoft/terminal), installable from the Microsoft Store: it provides support for a much wider range of characters by default.

  • PowerShell's interpretation of BOM-less UTF-8 text files

- In _**Windows PowerShell**_ - which is what you're using, judging by the screen shot - **BOM-less text files are assumed to be** _**ANSI-encoded**_, i.e. to be encoded with the legacy ANSI code page based on your machine's system locale (language for non-Unicode programs), such as Windows-1252 on US-English systems.
- _**PowerShell (Core) 7+**_**, by contrast, now commendably assumes UTF-8**, and generally uses BOM-less UTF-8 as the _consistent_ default (including when _writing_ files).
- Therefore, **to** _**decode**_ **the file properly, use** **`Get-Content -Encoding Utf8 languages.txt`** **in Windows PowerShell**.
    - Note: This in turn may reveal _rendering problems_ due to lack of support for certain Unicode characters in the active font, but in Windows Terminal you'd see the expected output.

  • Python's output character encoding

- If you're only **printing** _**directly to the console**_**, your script's content will appear correctly**, barring any rendering problems due to unsupported characters. The reason is that Python detects this output scenario and uses a Unicode-enabled API to print.
- **More work is needed if you need to** _**further process**_ **the output**, such as by capturing it in a variable, sending it to another command, or saving it to a file:
    - **Python defaults to** _**ANSI**_**(!) encoding on output to stdout**, so it must be **instructed to output UTF-8 instead**, which you can do by setting `$env:PYTHONUTF8=1` beforehand or passing `-X utf8` on the `python` / `py` command line (v3.7+).
    - Complementarily, **PowerShell must (temporarily) be instructed to expect UTF-8 output from external programs** (instead of the output encoded with the legacy OEM code page), which requires executing `[Console]::OutputEncoding = [System.Text.Utf8Encoding]::new()`
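
The mismatch described in the second point can be reproduced in a few lines of Python. This is only a sketch: the sample word and the choice of `cp1252` (Windows-1252, the ANSI code page on US-English systems) are assumptions for illustration, and your machine's ANSI code page may differ. `ascii()` is used for printing so that the demonstration itself does not depend on the console's encoding or font.

```python
# A word with one non-ASCII character, stored as UTF-8 without a BOM,
# the way languages.txt is stored.
sample = "Fran\u00e7ais"
utf8_bytes = sample.encode("utf-8")

# What a reader that assumes ANSI (here: Windows-1252) makes of those bytes:
# each byte of the two-byte UTF-8 sequence becomes a separate character.
misread = utf8_bytes.decode("cp1252")

# What a reader that is told the file is UTF-8 makes of the same bytes.
correct = utf8_bytes.decode("utf-8")

print(ascii(misread))  # 'Fran\xc3\xa7ais'
print(ascii(correct))  # 'Fran\xe7ais'
```

The bytes on disk are identical in both cases; only the assumed encoding differs, which is why passing `-Encoding Utf8` to `Get-Content` is enough to fix the decoding step.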

To put it all together in the form of a sample PowerShell script (`.ps1`):

# PREREQUISITES:
#  * In a *regular console window*: 
#    Choose a font that supports all characters in languages.txt, if possible
#  * Preferably, run from *Windows Terminal*.
#  Additionally, the code assumes:
#    * Windows 10 or higher.
#    * Python 3.7 or higher.

# Download the sample file.
# It contains a list of language names expressed in each language natively,
# therefore containing many non-ASCII-range characters, including CJK ones.
curl.exe -O https://learnpythonthehardway.org/python3/languages.txt

# Print the sample file using a PowerShell command.
# Assuming you've chosen a suitable font or are running from Windows Terminal, 
# all non-ASCII-range characters should render correctly.
Get-Content -Encoding Utf8 languages.txt

pause

# Invoke your Python script file and let it *print directly to the console*.
# Again, this should render the non-ASCII-range characters correctly.
python script.py utf8 strict

pause

# Invoke it again, but with further processing, which requires
#  * requesting that Python use UTF-8
#  * making PowerShell expect UTF-8

# (Temporarily) tell PowerShell to expect UTF-8 stdout output 
# from external programs.
$prevEncoding = [Console]::OutputEncoding
[Console]::OutputEncoding = [System.Text.Utf8Encoding]::new()

# Invoke the Python script, telling Python to output UTF-8 to stdout.
# Select-Object -First 10 limits the output to the first 10 lines.
# Note that this operation alone involves decoding of Python's output by PowerShell.
# Again, this should render the non-ASCII-range characters correctly.
python -X utf8 script.py utf8 strict | Select-Object -First 10

[Console]::OutputEncoding = $prevEncoding
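
To see which of the scenarios above applies to a given invocation, Python can report how it will encode stdout. The following is a diagnostic sketch rather than part of the original script; the function name `describe_stdout` is hypothetical. It relies only on `sys.stdout.encoding`, `sys.stdout.isatty()`, and `sys.flags.utf8_mode` (the latter available from Python 3.7).

```python
import sys


def describe_stdout():
    """Return a dict describing how Python will encode text sent to stdout."""
    return {
        # Encoding Python uses for text written to stdout.
        "encoding": getattr(sys.stdout, "encoding", None),
        # True when stdout is attached to a console rather than a pipe or file.
        "isatty": sys.stdout.isatty(),
        # True when -X utf8 or PYTHONUTF8=1 is in effect.
        "utf8_mode": bool(sys.flags.utf8_mode),
    }


if __name__ == "__main__":
    # Report on stderr so the result stays visible when stdout is piped.
    print(describe_stdout(), file=sys.stderr)
```

Saved under a name of your choice and run once directly, once piped to `Select-Object`, and once piped with `-X utf8`, it should show the reported encoding and the `utf8_mode` flag changing in line with the explanation above.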

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/69706561
