
Q: How to find the Unicode plane of an emoji in Python

Stack Overflow user

Asked 2020-11-23 17:54:05

2 answers · 404 views · 0 followers · score 1

I have a pandas DataFrame that contains emojis, and I want to classify them by their Unicode plane.

emoji | unicode
---------------
 😂   |  1F602
 😊   |  1F60A

Expected output

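emoji | unicode | Plane
-----------------------
 😂   |  1F602  |   1
 😊   |  1F60A  |   1
 ⛹    |  26F9   |   0

Here, plane 0 is the Basic Multilingual Plane (BMP) and plane 1 is the Supplementary Multilingual Plane (SMP).

Note: please view this question with Safari on Mac, Firefox on Linux, or Chrome on Windows so that the emojis render properly.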

emoji

python-3.x

pandas

dataframe

unicode

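2 Answers

Stack Overflow user

Accepted answer

Posted 2020-11-23 19:28:11

Both 😂 (U+1F602) and 😊 (U+1F60A) belong to the Supplementary Multilingual Plane (SMP).

The following snippet illustrates how to get the Unicode plane number: it is ord(ch) >> 16 (see bitwise right shift).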

# Print each character, its code point in hex, and its plane number
for ch in '✌⛹☹☺☻😂😊':
    print( ch, '\t{:04x}\t'.format(ord(ch)), ord(ch)>>16)

Output:

✌    270c     0
⛹    26f9     0
☹    2639     0
☺    263a     0
☻    263b     0
😂    1f602    1
😊    1f60a    1
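Since the question's table stores the code points as hex strings (for example 1F602) rather than as characters, the same shift can be applied after parsing the string. The following is only a minimal sketch; the helper name plane_from_hex is hypothetical and not part of the original answer.

```python
def plane_from_hex(code_point_hex: str) -> int:
    """Return the Unicode plane for a code point written in hex, e.g. '1F602'."""
    # Parse the hex string to an integer, then drop the low 16 bits.
    return int(code_point_hex, 16) >> 16


# Values taken from the question's table
rows = ["1F602", "1F60A", "26F9"]
planes = [plane_from_hex(h) for h in rows]
print(planes)
```

Assuming the DataFrame column is really named unicode and holds such strings, something like df["unicode"].map(plane_from_hex) should produce the Plane column.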

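Score: 1

Stack Overflow user

Posted 2020-11-23 19:15:12

Please always provide a minimal reproducible example so that others can help you.

According to the Unicode plane link you provided,

there are 17 planes, identified by the numbers 0 to 16, which correspond to the possible values 00-10 (base 16) of the first two positions in the six-position hexadecimal format (U+hhhhhh).

Based on this explanation, let's write a function to get that information.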
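To make the quoted rule concrete, here is a small sketch that pads a code point to six hex digits and reads the plane from the first two. The function name plane_from_six_digit_form is made up for illustration; the upper bound 0x10FFFF is the highest valid Unicode code point.

```python
def plane_from_six_digit_form(char: str) -> int:
    """Read the plane from the first two digits of the U+hhhhhh form."""
    six_digits = f"{ord(char):06X}"   # e.g. '01F600' for U+1F600
    return int(six_digits[:2], 16)    # '01' -> 1


MAX_CODE_POINT = 0x10FFFF  # highest valid Unicode code point

print(plane_from_six_digit_form(chr(0x26F9)))          # BMP character
print(plane_from_six_digit_form(chr(0x1F600)))         # SMP character
print(plane_from_six_digit_form(chr(MAX_CODE_POINT)))  # last plane
```

For any single character this should agree with ord(char) >> 16, since the last four hex digits are exactly the low 16 bits.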

# In the comments below, assume char = '😀' (U+1F600)
def unicode_to_plane(char: str) -> int:
    unicode_codepoint = ord(char)       # 128512
    hex_repr = hex(unicode_codepoint)   # '0x1f600'
    hex_digits = hex_repr[2:]           # '1f600'
    plane = 0                           # Assume plane is 0 until proven otherwise
    if len(hex_digits) > 4:             # The plane is 0 if hex representation is four hex digits or less
        hex_plane = hex_digits[:-4]     # '1' (take away the last four characters)
        plane = int(hex_plane, 16)      # 1 (convert hex characters to integer)
    return plane                        # 1
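One caveat worth keeping in mind: ord() only accepts a single code point, while some emojis in real data are sequences of several code points (for example a base character followed by a variation selector). The sketch below, with the hypothetical helper planes_in_text, returns one plane per code point instead of assuming a single character.

```python
from typing import List


def planes_in_text(text: str) -> List[int]:
    """Return the plane of every code point in text, in order."""
    # Iterating over a str yields one code point at a time in Python 3.
    return [ord(ch) >> 16 for ch in text]


# U+26F9 followed by U+FE0F (variation selector-16): two code points
print(planes_in_text("\u26F9\uFE0F"))

# U+1F602 is a single code point in the SMP
print(planes_in_text("\U0001F602"))
```

How to collapse several planes into one value per emoji (first code point, maximum, and so on) depends on the data and is left open here.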

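Note that, according to the wiki article on Emoji,

most, but not all, emoji are included in the Supplementary Multilingual Plane (SMP) of Unicode.

The SMP is plane 1.

Score: 1

Original page content provided by Stack Overflow.

Original link: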

https://stackoverflow.com/questions/64973683
