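What is the regular expression that accepts all characters from the CJK character sets (Chinese, Japanese, and Korean) and digits (0-9), but no special characters?
Posted on 2018-02-07 21:13:08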
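This is information gathered from a UCD (Unicode Character Database) interface.
It reflects the latest Unicode 10 data.
The output covers 88,964 characters.
From the interface:
Using a property search for CJK, the matching blocks were added to a custom regex page, along with
filters requiring that each character be a letter or a number and occupy an assigned slot.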
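The filtering step described above can be approximated in Python for a single block. This is only a sketch, not the UCD interface the answer used: it assumes U+3000..U+303F as the extent of the CJK Symbols and Punctuation block, relies on the Unicode version bundled with the Python interpreter (which may be newer than Unicode 10), and uses the hypothetical helper names `is_letter_or_number` and `collapse`.

```python
import unicodedata

# Assumption: CJK Symbols and Punctuation occupies U+3000..U+303F.
BLOCK_START, BLOCK_END = 0x3000, 0x303F


def is_letter_or_number(cp):
    """True if the general category starts with L (letter) or N (number).

    Unassigned code points have category "Cn", so they are rejected too.
    """
    return unicodedata.category(chr(cp))[0] in "LN"


def collapse(code_points):
    """Merge a sorted list of code points into inclusive (lo, hi) ranges."""
    ranges = []
    for cp in code_points:
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], cp)
        else:
            ranges.append((cp, cp))
    return ranges


kept = [cp for cp in range(BLOCK_START, BLOCK_END + 1) if is_letter_or_number(cp)]
block_ranges = collapse(kept)

if __name__ == "__main__":
    for lo, hi in block_ranges:
        print(f"\\x{{{lo:04X}}}-\\x{{{hi:04X}}}")
```

For this block the surviving ranges should be the first four entries of the converted character classes (U+3005 to U+3007, U+3021 to U+3029, U+3031 to U+3035, U+3038 to U+303C).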
The regex
# CJK et al.
[\p{Block=CJK_Compatibility}\p{Block=CJK_Compatibility_Forms}\p{Block=CJK_Compatibility_Ideographs}\p{Block=CJK_Compatibility_Ideographs_Supplement}\p{Block=CJK_Radicals_Supplement}\p{Block=CJK_Strokes}\p{Block=CJK_Symbols_And_Punctuation}\p{Block=CJK_Unified_Ideographs}\p{Block=CJK_Unified_Ideographs_Extension_A}\p{Block=CJK_Unified_Ideographs_Extension_B}\p{Block=CJK_Unified_Ideographs_Extension_C}\p{Block=CJK_Unified_Ideographs_Extension_D}\p{Block=CJK_Unified_Ideographs_Extension_E}\p{Block=CJK_Unified_Ideographs_Extension_F}\p{Block=Enclosed_CJK_Letters_And_Months}]
# Must be letters or numbers
(?<= [\p{L}\p{N}] )
# Leave out the unassigned slots
(?<! \p{General_Category=Unassigned} )
The output converted to UTF-8/32
(?:
[\x{3005}-\x{3007}\x{3021}-\x{3029}\x{3031}-\x{3035}\x{3038}-\x{303C}\x{3220}-\x{3229}\x{3248}-\x{324F}\x{3251}-\x{325F}\x{3280}-\x{3289}\x{32B1}-\x{32BF}\x{3400}-\x{4DB5}\x{4E00}-\x{9FEA}\x{F900}-\x{FA6D}\x{FA70}-\x{FAD9}\x{20000}-\x{2A6D6}\x{2A700}-\x{2B734}\x{2B740}-\x{2B81D}\x{2B820}-\x{2CEA1}\x{2CEB0}-\x{2EBE0}\x{2F800}-\x{2FA1D}]
)
The output converted to UTF-16
(?:
[\x{3005}-\x{3007}\x{3021}-\x{3029}\x{3031}-\x{3035}\x{3038}-\x{303C}\x{3220}-\x{3229}\x{3248}-\x{324F}\x{3251}-\x{325F}\x{3280}-\x{3289}\x{32B1}-\x{32BF}\x{3400}-\x{4DB5}\x{4E00}-\x{9FEA}\x{F900}-\x{FA6D}\x{FA70}-\x{FAD9}]
|
(?:
[\x{D840}-\x{D868}] [\x{DC00}-\x{DFFF}]
| \x{D869} [\x{DC00}-\x{DED6}\x{DF00}-\x{DFFF}]
| [\x{D86A}-\x{D86C}] [\x{DC00}-\x{DFFF}]
| \x{D86D} [\x{DC00}-\x{DF34}\x{DF40}-\x{DFFF}]
| \x{D86E} [\x{DC00}-\x{DC1D}\x{DC20}-\x{DFFF}]
| [\x{D86F}-\x{D872}] [\x{DC00}-\x{DFFF}]
| \x{D873} [\x{DC00}-\x{DEA1}\x{DEB0}-\x{DFFF}]
| [\x{D874}-\x{D879}] [\x{DC00}-\x{DFFF}]
| \x{D87A} [\x{DC00}-\x{DFE0}]
| \x{D87E} [\x{DC00}-\x{DE1D}]
)
)
https://stackoverflow.com/questions/48672568
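As a cross-check, the code-point ranges from the UTF-8/32 form can be loaded into Python, whose `re` module matches by code point and accepts `\UXXXXXXXX` escapes instead of `\x{...}`. The sketch below is illustrative only: the names `CJK_RANGES`, `build_class`, `is_cjk_or_digits`, and `to_surrogates` are hypothetical, and adding `0-9` to the class is an assumption based on the question, since the ranges themselves do not include ASCII digits.

```python
import re

# Code-point ranges copied from the UTF-8/32 character class above.
CJK_RANGES = [
    (0x3005, 0x3007), (0x3021, 0x3029), (0x3031, 0x3035), (0x3038, 0x303C),
    (0x3220, 0x3229), (0x3248, 0x324F), (0x3251, 0x325F), (0x3280, 0x3289),
    (0x32B1, 0x32BF), (0x3400, 0x4DB5), (0x4E00, 0x9FEA), (0xF900, 0xFA6D),
    (0xFA70, 0xFAD9), (0x20000, 0x2A6D6), (0x2A700, 0x2B734),
    (0x2B740, 0x2B81D), (0x2B820, 0x2CEA1), (0x2CEB0, 0x2EBE0),
    (0x2F800, 0x2FA1D),
]


def count_code_points(ranges):
    """Total number of code points covered by inclusive (lo, hi) ranges."""
    return sum(hi - lo + 1 for lo, hi in ranges)


def build_class(ranges, extra=""):
    """Build a regex character class; `extra` is placed verbatim inside it."""
    body = "".join(f"\\U{lo:08X}-\\U{hi:08X}" for lo, hi in ranges)
    return f"[{body}{extra}]"


# Assumption: ASCII digits are added because the question asks for 0-9.
CJK_OR_DIGITS = re.compile(build_class(CJK_RANGES, extra="0-9") + "+")


def is_cjk_or_digits(text):
    """True if text is non-empty and every character is in the class."""
    return CJK_OR_DIGITS.fullmatch(text) is not None


def to_surrogates(cp):
    """Split a supplementary code point (>= 0x10000) into a UTF-16 pair."""
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


if __name__ == "__main__":
    print(count_code_points(CJK_RANGES))                  # 88964
    print(is_cjk_or_digits("漢字123"))                     # True
    print(is_cjk_or_digits("漢字!"))                       # False
    print([hex(u) for u in to_surrogates(0x2A6D6)])       # ['0xd869', '0xded6']
```

The total agrees with the 88,964 characters stated in the answer, and the surrogate pair for U+2A6D6 lines up with the `\x{D869} [\x{DC00}-\x{DED6}...]` alternative in the UTF-16 form. Because the ranges come only from the blocks listed in the property search, kana and Hangul syllables fall outside them; those blocks would need to be added separately if they are required.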