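I'm trying to convert an emoji to its Unicode representation in Python 3. For example, given the emoji 😀 I want to get the corresponding Unicode 'U+1F600'. Likewise, I want to convert 'U+1F600' back to 😀. I've read the documentation and tried several options, but Python's behavior here confuses me.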
>>> x = '😀'
>>> y = x.encode('utf-8')
>>> y
b'\xf0\x9f\x98\x80'

The emoji is converted to a bytes object.
>>> z = y.decode('utf-8')
>>> z
'😀'

Decoding the bytes object gives back the emoji. So far everything works.
Now, representing the emoji by its Unicode code point:
>>> c = '\U0001F600'
>>> d = c.encode('utf-8')
>>> d
b'\xf0\x9f\x98\x80'

This prints the byte encoding again.
>>> d.decode('utf-8')
'😀'

This prints the emoji again. I really can't figure out how to convert between the Unicode code point and the emoji.
Posted on 2017-12-08 22:27:49
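'😀' is already a Unicode object. UTF-8 is not Unicode; it is a byte encoding of Unicode. To get the code point number of a Unicode character, use the ord function. To print it in the form you want, format it as hexadecimal, like this: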
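s = '😀'
print('U+{:X}'.format(ord(s)))

Output:

U+1F600

If you have Python 3.6+, you can make this shorter (and more efficient) with an f-string:

s = '😀'
print(f'U+{ord(s):X}')

By the way, if you want to create a Unicode escape sequence like '\U0001F600', there is the 'unicode-escape' codec. However, it returns a bytes string, which you will probably want to convert back to text. You could use the 'UTF-8' codec for that, but you may as well just use the 'ASCII' codec, because the output is guaranteed to contain only valid ASCII.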
s = '😀'
print(s.encode('unicode-escape'))
print(s.encode('unicode-escape').decode('ASCII'))

Output:

b'\\U0001f600'
\U0001f600

I suggest you take a look at this short article by Stack Overflow co-founder Joel Spolsky: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).
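The answer covers the emoji-to-'U+1F600' direction; the question also asks for the reverse. Below is a minimal sketch, assuming the input looks like 'U+1F600' (or several such tokens separated by spaces). The helper names to_codepoints and from_codepoints are hypothetical, not part of any library. chr() is the inverse of ord(), and int(..., 16) parses the hex digits. The sketch iterates over every character because many emoji (skin-tone variants, flags, ZWJ sequences) consist of more than one code point, and ord() raises TypeError on a string longer than one character.

```python
def to_codepoints(text: str) -> str:
    """Return every code point in text as 'U+XXXX', separated by spaces."""
    # Pad to at least 4 hex digits (U+0041), matching the usual notation;
    # emoji such as U+1F600 already have 5 digits and are unaffected.
    return ' '.join(f'U+{ord(ch):04X}' for ch in text)


def from_codepoints(notation: str) -> str:
    """Inverse of to_codepoints: 'U+1F600' (or 'U+1F44D U+1F3FD') -> text."""
    chars = []
    for part in notation.split():
        if not part.upper().startswith('U+'):
            raise ValueError(f'expected U+XXXX, got {part!r}')
        # chr() is the inverse of ord(); int(..., 16) parses the hex digits.
        chars.append(chr(int(part[2:], 16)))
    return ''.join(chars)


print(to_codepoints('\U0001F600'))                   # U+1F600
print(from_codepoints('U+1F600') == '\U0001F600')    # True
# Thumbs up + medium skin tone modifier: one visible emoji, two code points.
print(to_codepoints('\U0001F44D\U0001F3FD'))         # U+1F44D U+1F3FD
```

The :04X padding is a stylistic choice; for emoji it gives the same result as the answer's plain :X.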
Posted on 2020-10-28 12:27:47
sentence = "Head-Up Displays (HUD)💻 for #automotive🚗 sector\n \nThe #UK-based #startup🚀 Envisics got €42 million #funding💰 from l… "
print("normal sentence - ", sentence)
uc_sentence = sentence.encode('unicode-escape')
print("\n\nunicode represented sentence - ", uc_sentence)
decoded_sentence = uc_sentence.decode('unicode-escape')
print("\n\ndecoded sentence - ", decoded_sentence)

Output:
normal sentence -  Head-Up Displays (HUD)💻 for #automotive🚗 sector

The #UK-based #startup🚀 Envisics got €42 million #funding💰 from l…


unicode represented sentence -  b'Head-Up Displays (HUD)\\U0001f4bb for #automotive\\U0001f697 sector\\n \\nThe #UK-based #startup\\U0001f680 Envisics got \\u20ac42 million #funding\\U0001f4b0 from l\\u2026 '


decoded sentence -  Head-Up Displays (HUD)💻 for #automotive🚗 sector

The #UK-based #startup🚀 Envisics got €42 million #funding💰 from l…

https://stackoverflow.com/questions/47716217
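The same codec also turns text that literally contains backslash escapes (for example, read from a file) back into characters. Below is a minimal sketch; the variable names are illustrative only. The 'unicode-escape' decoder reads every byte that is not part of an escape as Latin-1, which is why the round trip above works: the encoded output is pure ASCII. If the text already contains non-ASCII characters, encoding it as UTF-8 before decoding produces mojibake, as the second part shows.

```python
# Text that literally contains backslash escapes (pure ASCII).
escaped = '\\U0001f600 and \\u20ac42'

# unicode-escape decodes bytes, so first turn the ASCII text into bytes.
restored = escaped.encode('ascii').decode('unicode-escape')
print(restored == '\U0001F600 and \u20ac42')  # True

# Pitfall: an already-decoded 'é' mixed with a literal '\u20ac' escape.
mixed = 'caf\u00e9 \\u20ac'
broken = mixed.encode('utf-8').decode('unicode-escape')
# The UTF-8 bytes of 'é' (C3 A9) are re-read as Latin-1 'Ã' and '©'.
print(broken == 'caf\u00c3\u00a9 \u20ac')  # True
```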