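What is the best way in PHP to get a complete list (an array of strings) of all Unicode whitespace characters, encoded in UTF-8?
I need it to generate test data.
Posted on 2010-02-09 17:34:55
This email (archived here) contains a list of all Unicode whitespace characters, encoded in UTF-8, UTF-16, and HTML.
Look for the 'utf8_whitespace_table' function in the archived link.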
static $whitespace = array(
"SPACE" => "\x20",
"NO-BREAK SPACE" => "\xc2\xa0",
"OGHAM SPACE MARK" => "\xe1\x9a\x80",
"EN QUAD" => "\xe2\x80\x80",
"EM QUAD" => "\xe2\x80\x81",
"EN SPACE" => "\xe2\x80\x82",
"EM SPACE" => "\xe2\x80\x83",
"THREE-PER-EM SPACE" => "\xe2\x80\x84",
"FOUR-PER-EM SPACE" => "\xe2\x80\x85",
"SIX-PER-EM SPACE" => "\xe2\x80\x86",
"FIGURE SPACE" => "\xe2\x80\x87",
"PUNCTUATION SPACE" => "\xe2\x80\x88",
"THIN SPACE" => "\xe2\x80\x89",
"HAIR SPACE" => "\xe2\x80\x8a",
"ZERO WIDTH SPACE" => "\xe2\x80\x8b",
"NARROW NO-BREAK SPACE" => "\xe2\x80\xaf",
"MEDIUM MATHEMATICAL SPACE" => "\xe2\x81\x9f",
"IDEOGRAPHIC SPACE" => "\xe3\x80\x80",
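);
Posted on 2017-10-09 08:24:43
Years later, this question is still the top Google result when searching for Unicode whitespace characters. devio's answer is good, but it is not complete. At the time of writing (October 2017), Wikipedia has a list of whitespace characters: https://en.wikipedia.org/wiki/Whitespace_character
That list specifies 25 code points, whereas the currently accepted answer lists 18. With the missing code points included (31 entries as reproduced here), the list is: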
U+0009 character tabulation
U+000A line feed
U+000B line tabulation
U+000C form feed
U+000D carriage return
U+0020 space
U+0085 next line
U+00A0 no-break space
U+1680 ogham space mark
U+180E mongolian vowel separator
U+2000 en quad
U+2001 em quad
U+2002 en space
U+2003 em space
U+2004 three-per-em space
U+2005 four-per-em space
U+2006 six-per-em space
U+2007 figure space
U+2008 punctuation space
U+2009 thin space
U+200A hair space
U+200B zero width space
U+200C zero width non-joiner
U+200D zero width joiner
U+2028 line separator
U+2029 paragraph separator
U+202F narrow no-break space
U+205F medium mathematical space
U+2060 word joiner
U+3000 ideographic space
U+FEFF zero width non-breaking space
Posted on 2013-12-18 16:32:11
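http://en.wikipedia.org/wiki/Space_%28punctuation%29#Spaces_in_Unicode
Unfortunately, it does not give the UTF-8 encodings, but the page does contain the characters themselves, so you can cut and paste them into your editor (provided it saves as UTF-8). Alternatively, http://www.fileformat.info/info/unicode/char/180E/index.htm provides the UTF-8 bytes (replace "180E" with the hexadecimal UTF-16 value you are looking up).
It also lists a few extra characters that @devio's excellent answer omits.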
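If you would rather not copy the bytes by hand, the UTF-8 encoding can be computed from the code point. The following is a minimal sketch, not taken from any of the answers: the helper name codepoint_to_utf8 and the variable names are made up for illustration, the code points are copied from the U+XXXX list in the 2017 answer, and PHP 7 or later is assumed because of the type declarations.

```php
<?php
// Hypothetical helper (not from the answers above): encode a single
// Unicode code point as a UTF-8 byte string, without any extensions.
function codepoint_to_utf8(int $cp): string
{
    // Reject values outside the Unicode range and UTF-16 surrogates.
    if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        throw new InvalidArgumentException(sprintf('Invalid code point: U+%04X', $cp));
    }
    if ($cp < 0x80) {            // 1 byte:  0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {           // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {         // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// Code points copied from the U+XXXX list in the 2017 answer (31 entries).
$codepoints = array(
    0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0020, 0x0085, 0x00A0,
    0x1680, 0x180E, 0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005,
    0x2006, 0x2007, 0x2008, 0x2009, 0x200A, 0x200B, 0x200C, 0x200D,
    0x2028, 0x2029, 0x202F, 0x205F, 0x2060, 0x3000, 0xFEFF,
);

// Build an array keyed by "U+XXXX" whose values are the UTF-8 strings.
$whitespace_utf8 = array();
foreach ($codepoints as $cp) {
    $whitespace_utf8[sprintf('U+%04X', $cp)] = codepoint_to_utf8($cp);
}

// Print each entry as hex so the invisible characters can be inspected.
foreach ($whitespace_utf8 as $label => $bytes) {
    echo $label, ' => ', bin2hex($bytes), "\n";
}
```

The values in $whitespace_utf8 can be used directly as test data. On PHP 7.0 and later, a double-quoted escape such as "\u{180E}" produces the same bytes for a single known character, and mb_chr() (PHP 7.2+, mbstring extension) is another option if that extension is available.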
https://stackoverflow.com/questions/2227921