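I have this code, which works in PS 7.1, to count the total length of a string that contains emojis.
In 5.1 it does not return the correct value, so to save yourself some time it is best to use 7.1.
My question is:
How can I get the emoji count using PowerShell 7.1?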
Add-Type -AssemblyName System.Globalization
$str = '️' # 7.1 Returns 3 but 5.1 returns 13.
#$str = '' # 7.1 Returns 7 but 5.1 returns 7.
#$str = "'What is this?' " # 7.1 Returns 22 but 5.1 returns 30.
$se = [System.Globalization.StringInfo]::GetTextElementEnumerator($str)
$cnt = 0
while($se.MoveNext()) { $cnt += 1 }
$cnt
My desired output should be (emoji count only):
'️' # Should return 3.
'' # Should return 7.
"'What is this?' " # Should return 4.
At the time of writing, there are 3,512 emojis in total.
[Full emoji list from the original post; most of its characters were lost in extraction.]
Some useful links:
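Posted on 2021-04-23 21:25:20
I think the issue is that the strings you posted use Unicode features that are more advanced than .NET Framework (as opposed to .NET Core / .NET 5) understands.
For example, the emoji "family: woman, woman, girl, boy" is what is called a zero width joiner (ZWJ) sequence: although it renders as a single image, it is made up of 4 separate emoji joined together with zero width joiner characters: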
$zwj = [System.Char]::ConvertFromUtf32( 0x0200D ); # zero width joiner
$boy = [System.Char]::ConvertFromUtf32( 0x1F466 ); # U+1F466 BOY
$girl = [System.Char]::ConvertFromUtf32( 0x1F467 ); # U+1F467 GIRL
$man = [System.Char]::ConvertFromUtf32( 0x1F468 );
$woman = [System.Char]::ConvertFromUtf32( 0x1F469 );
# https://emojipedia.org/family-woman-woman-girl-boy/
$family_woman_woman_girl_boy = $woman + $zwj + $woman + $zwj + $girl + $zwj + $boy;
$family_woman_woman_girl_boy
If you run your example in PowerShell 7.1 (built on .NET Core), you get this result:
# family: woman, woman, girl, boy
$text = $family_woman_woman_girl_boy;
$se = [System.Globalization.StringInfo]::GetTextElementEnumerator($text)
$cnt = 0
while($se.MoveNext()) { $cnt += 1 }
$cnt
# 1
But if you run it in Windows PowerShell 5.1 (built on .NET Framework), you get:
# family: woman, woman, girl, boy
$text = $family_woman_woman_girl_boy;
$se = [System.Globalization.StringInfo]::GetTextElementEnumerator($text)
$cnt = 0
while($se.MoveNext()) { $cnt += 1 }
$cnt
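# 7
You can see that PS 5.1 is counting the individual emoji and the $zwj characters separately: 4 + 3 = 7.
In your first example, the first $str assignment, you have also got the following emoji: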
$rainbow = [System.Char]::ConvertFromUtf32( 0x1F308 );
$waving_white_flag = [System.Char]::ConvertFromUtf32( 0x1F3F3 );
$light_skin_tone = [System.Char]::ConvertFromUtf32( 0x1F3FB );
$red_hair = [System.Char]::ConvertFromUtf32( 0x1F9B0 );
$var_16 = [System.Char]::ConvertFromUtf32( 0x0FE0F ); # variation selector-16
# https://emojipedia.org/woman-light-skin-tone-red-hair/
$woman_light_skin_tone_red_hair = $woman + $light_skin_tone + $zwj + $red_hair;
# https://emojipedia.org/rainbow-flag/
$rainbow_flag = $waving_white_flag + $var_16 + $zwj + $rainbow;
Your example in PowerShell 7.1 is then:
# woman (light skin tone, red hair) + family + rainbow flag
$text = $woman_light_skin_tone_red_hair + $family_woman_woman_girl_boy + $rainbow_flag;
$se = [System.Globalization.StringInfo]::GetTextElementEnumerator($text)
$cnt = 0
while($se.MoveNext()) { $cnt += 1 }
$cnt
# 3
And in PowerShell 5.1:
# woman (light skin tone, red hair) + family + rainbow flag
$text = $woman_light_skin_tone_red_hair + $family_woman_woman_girl_boy + $rainbow_flag;
$se = [System.Globalization.StringInfo]::GetTextElementEnumerator($text)
$cnt = 0
while($se.MoveNext()) { $cnt += 1 }
$cnt
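# 14
Note that PowerShell 5.1 does not count $var_16 as a separate element, but it does count each $zwj, so the total is 14 rather than 15.
If you break down your other examples in the same way, the results make more sense.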
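To break down any other string the same way, a small helper can list its code points. This is a minimal sketch rather than part of the original answer: the function name Get-CodePointList is hypothetical, and it relies only on [char]::IsSurrogatePair and [char]::ConvertToUtf32, which are available in both 5.1 and 7.1.

```powershell
# Hypothetical helper (not from the original answer): lists the Unicode code
# points of a string as U+XXXX labels, pairing UTF-16 surrogates where needed.
function Get-CodePointList {
    param([Parameter(Mandatory)][string]$Text)

    $i = 0
    while ($i -lt $Text.Length) {
        if ([char]::IsSurrogatePair($Text, $i)) {
            # Two UTF-16 code units form one code point above U+FFFF.
            $cp = [char]::ConvertToUtf32($Text, $i)
            $i += 2
        }
        else {
            $cp = [int]$Text[$i]
            $i += 1
        }
        'U+{0:X4}' -f $cp
    }
}

# Usage: the family sequence, built from code points so no literal emoji is needed.
$zwj = [char]::ConvertFromUtf32(0x200D)
$family = [char]::ConvertFromUtf32(0x1F469) + $zwj +
          [char]::ConvertFromUtf32(0x1F469) + $zwj +
          [char]::ConvertFromUtf32(0x1F467) + $zwj +
          [char]::ConvertFromUtf32(0x1F466)
(Get-CodePointList $family) -join ' '
# U+1F469 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466
```

The output makes the 4 emoji and 3 joiners visible, which is the 4 + 3 = 7 that PowerShell 5.1 reports.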
Also, if you split the compound emoji into its individual emoji, the PowerShell 5.1 version works fine:
# woman, woman, girl, boy (no joiners)
$text = $woman + $woman + $girl + $boy;
$se = [System.Globalization.StringInfo]::GetTextElementEnumerator($text)
$cnt = 0
while($se.MoveNext()) { $cnt += 1 }
$cnt
# 4
So this is a problem specific to compound emoji.
That seems to explain why you are seeing these results, but as for how to fix it...
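One possible direction, offered only as a rough sketch and not as part of the original answer, is to walk the code points yourself so the result does not depend on the .NET version. The function name Measure-EmojiSequence is hypothetical, and the emoji test is an assumption: it only treats U+1F000-U+1FAFF and U+2600-U+27BF as emoji, so keycaps, copyright-style symbols, and other emoji outside those ranges are not counted.

```powershell
# Hypothetical helper (not from the original answer): counts emoji sequences,
# treating ZWJ-joined emoji, variation selectors and skin tone modifiers as
# part of the preceding emoji.
function Measure-EmojiSequence {
    param([Parameter(Mandatory)][string]$Text)

    $count = 0
    $joined = $false   # true when the previous code point was a zero width joiner
    $i = 0
    while ($i -lt $Text.Length) {
        # Read one code point, pairing surrogates where needed.
        if ([char]::IsSurrogatePair($Text, $i)) {
            $cp = [char]::ConvertToUtf32($Text, $i)
            $i += 2
        }
        else {
            $cp = [int]$Text[$i]
            $i += 1
        }

        if ($cp -eq 0x200D) { $joined = $true; continue }            # zero width joiner
        if ($cp -ge 0xFE00 -and $cp -le 0xFE0F) { continue }         # variation selectors
        if ($cp -ge 0x1F3FB -and $cp -le 0x1F3FF) { continue }       # skin tone modifiers
        if ($joined) { $joined = $false; continue }                  # glued to previous emoji

        # ASSUMPTION: a deliberately rough emoji test, not the full Unicode emoji data.
        $isEmoji = ($cp -ge 0x1F000 -and $cp -le 0x1FAFF) -or
                   ($cp -ge 0x2600 -and $cp -le 0x27BF)
        if ($isEmoji) { $count += 1 }
    }
    $count
}

# Usage: plain text plus the three compound emoji from the question.
$zwj = [char]::ConvertFromUtf32(0x200D)
$woman = [char]::ConvertFromUtf32(0x1F469)
$redHairWoman = $woman + [char]::ConvertFromUtf32(0x1F3FB) + $zwj + [char]::ConvertFromUtf32(0x1F9B0)
$family = $woman + $zwj + $woman + $zwj + [char]::ConvertFromUtf32(0x1F467) + $zwj + [char]::ConvertFromUtf32(0x1F466)
$flag = [char]::ConvertFromUtf32(0x1F3F3) + [char]::ConvertFromUtf32(0xFE0F) + $zwj + [char]::ConvertFromUtf32(0x1F308)
Measure-EmojiSequence ('Hi ' + $redHairWoman + $family + $flag)
# 3
```

Because the sketch ignores ordinary text, it returns the emoji-only count the question asks for, within the limits of the assumed ranges.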
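Posted on 2021-04-23 22:15:23
Basically, emojis are a huge pain. Take this string: how many emoji characters are in it?
'foo bar'
That's right, there are seven valid emoji characters. Emojis can be combined with or modified by other emojis:
family (man, woman, boy) = man + woman + boy
girl (light skin tone) = girl + light skin tone modifier
In UTF-16 (PowerShell's [char] is 16 bits), it looks like this: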
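# each emoji code point is split into two UTF-16 code units, and the emojis are "glued" together with the zero width joiner character U+200D
family (man, woman, boy) = '0xd83d,0xdc68,0x200d,0xd83d,0xdc69,0x200d,0xd83d,0xdc66'
# modified with an emoji modifier (skin tone) character
girl (light skin tone) = '0xd83d,0xdc67,0xd83c,0xdffb'
This function works in both 5.1 and v7 to pull the emojis out of a string, for educational purposes. I only wrote it as a one-off, though, so it will not work in many cases (see below):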
Function Get-Emojis {
    param ($string = (Get-Content emojis.txt -Encoding UTF8))
    # Iterate through each char until emoji
    $result = for ($i = 0; $i -lt $string.length) {
        # character arrays are made of UTF16 characters, so emojis are always split up into pairs
        # check if character is part of a surrogate pair, otherwise increment
        if ( [System.Char]::IsSurrogatePair($string[$i],$string[$i+1]) ) {
            # add char pair to output
            [char[]]$output = $string[$i],$string[$i+1]
            $iIsEmoji = $true
            $i+=2
            # check following characters for either a zero-width-joiner or a variation selector, and add to output.
            # loop because many emojis can be combined
            While ($iIsEmoji) {
                # check if next character is a joiner (unicode 200D)
                # or a variation selector character (FE00-FE0F):
                if ( $string[$i] -in (0xFE00..0xFE0F+0x200d) ) {
                    $output += $string[$i]
                    $i+=1
                }
                # check if next two characters are emoji modifiers (skin tone etc) (U+1F300-U+1F5FF)
                elseif ( [System.Char]::IsSurrogatePair($string[$i],$string[$i+1]) ) {
                    # check if pair is within modifier character range
                    if ([System.char]::ConvertToUtf32($string[$i],$string[$i+1]) -in 0x1F300..0x1F5FF ) {
                        $output += $string[$i],$string[$i+1]
                        $i+=2
                    }
                    # but if not in range, it's probably a separate emoji, so end.
                    else { $iIsEmoji = $false }
                }
                else { $iIsEmoji = $false }
            }
            # Return full emoji string
            $output -join $null
        }
        else {$i += 1}
    }
    $result
}
# Usage: Only one string at a time.
# note: specify encoding when importing weird text from a file - otherwise different powershell versions can misbehave
# example: What emoji is this
$string = get-content C:\temp\emojis.txt -Encoding utf8
$emojis = Get-Emojis -string $string
$emojis.count
4
Caveats:
It does show the process, though, so it may be helpful.
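The UTF-16 split described in this answer can be checked directly. The following is a minimal sketch with a hypothetical helper name, Get-Utf16Unit, that prints the 16-bit code units of a string in the same hex notation used above.

```powershell
# Hypothetical helper (not from the original answer): shows the UTF-16 code
# units of a string as comma-separated hex values.
function Get-Utf16Unit {
    param([Parameter(Mandatory)][string]$Text)

    ($Text.ToCharArray() | ForEach-Object { '0x{0:x4}' -f [int]$_ }) -join ','
}

# Usage: man + ZWJ + woman + ZWJ + boy, built from code points.
$zwj = [char]::ConvertFromUtf32(0x200D)
$familyManWomanBoy = [char]::ConvertFromUtf32(0x1F468) + $zwj +
                     [char]::ConvertFromUtf32(0x1F469) + $zwj +
                     [char]::ConvertFromUtf32(0x1F466)
Get-Utf16Unit $familyManWomanBoy
# 0xd83d,0xdc68,0x200d,0xd83d,0xdc69,0x200d,0xd83d,0xdc66
```

Each emoji contributes a surrogate pair (two code units) and each joiner contributes one, which is why the string's Length is 8 even though it renders as a single image.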
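Posted on 2021-04-24 14:02:41
Hmm, the rune enumerator in PS 7 reports 14 for the first string and 12 for the emojis in the third. $str2 has the simplest behavior: each emoji is exactly two surrogate characters.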
$str = '️'
$str.length # 22
$str.EnumerateRunes() | measure | % count # 14
$str2 = ''
$str2.length # 14
$str2.EnumerateRunes() | measure | % count # 7
$str3 = ''
$str3.length # 20
$str3.EnumerateRunes() | measure | % count # 12
https://stackoverflow.com/questions/67228804