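I have a set of data in a database that was entered with Unicode characters, but the characters were stored as literal escape strings. That is, where there should be an apostrophe ’, I actually get \u2019
So I now need to convert this to its character representation, i.e. ’. First, it is easy enough to change the string to its HTML entity version (for example &#x2019;), which then needs to be converted to the correct UTF-8 multibyte string.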
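As an illustration of the conversion described above, here is a minimal sketch that skips the entity step and decodes the escapes directly. It assumes the stored text contains literal \uXXXX sequences holding UTF-16 code units and that the mbstring extension is available; the function name decode_unicode_escapes is hypothetical and not taken from the question.

```php
<?php
// Hypothetical helper: turns literal "\uXXXX" escapes into UTF-8 characters.
function decode_unicode_escapes(string $text): string
{
    // Match whole runs of escapes so that surrogate pairs stay together.
    $result = preg_replace_callback(
        '/(?:\\\\u[0-9a-fA-F]{4})+/',
        function (array $match): string {
            // Collect the hex code units that make up this run.
            preg_match_all('/\\\\u([0-9a-fA-F]{4})/', $match[0], $units);
            $utf16 = '';
            foreach ($units[1] as $hex) {
                $utf16 .= pack('n', hexdec($hex)); // one big-endian 16-bit unit
            }
            // Let mbstring convert the UTF-16BE bytes to UTF-8.
            return mb_convert_encoding($utf16, 'UTF-8', 'UTF-16BE');
        },
        $text
    );

    // preg_replace_callback() returns null on failure; fall back to the input.
    return $result ?? $text;
}

$fixed = decode_unicode_escapes('It\u2019s here');
echo $fixed, "\n";
```

U+2019 is encoded in UTF-8 as the three bytes E2 80 99, so a result of any other length for that character suggests the conversion went wrong somewhere.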
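I have tried several methods. On my local server I can use preg_match to extract the characters and then pass each one to the following function:
mb_convert_encoding($string, "UTF-8", "HTML-ENTITIES");
That sounds sensible and works without any problems. Turning off the UTF-8 charset in the browser shows that the character has in fact been converted to ’ when read using the browser's default encoding.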
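For comparison, the same entity-to-UTF-8 step can be sketched without mbstring at all. This swaps in html_entity_decode(), which is not what the question used, and the input string is an assumed example. Newer PHP releases (8.2 and later, as far as I know) also deprecate "HTML-ENTITIES" as an mbstring encoding, which is another reason to consider it.

```php
<?php
// Assumed input: the numeric entity built from the "\u2019" escape.
$entity = '&#x2019;';

// html_entity_decode() resolves numeric character references and emits
// the result in the encoding named by the third argument.
$char = html_entity_decode($entity, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Inspect the raw bytes rather than trusting how a browser renders them.
$hex = bin2hex($char);
echo $hex, "\n"; // e28099
```

Checking the hex dump on both servers removes the browser's charset handling from the picture.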
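However, exactly the same code, when run in my production environment, produces the dreaded "missing glyph" box when rendered as UTF-8. With UTF-8 turned off, it produces a byte stream that renders as ò°‘£. It appears to output 4 bytes instead of 3; I don't know whether that is relevant, as I don't have a good understanding of character encodings.
I assume the problem lies in my mbstring settings. Here are the mbstring settings on my local server: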
Multibyte Support                      enabled
Multibyte string engine                libmbfl
HTTP input encoding translation        disabled
Multibyte (japanese) regex support     enabled
Multibyte regex (oniguruma) version    4.7.1

Directive                              Local Value    Master Value
mbstring.detect_order                  no value       no value
mbstring.encoding_translation          Off            Off
mbstring.func_overload                 0              0
mbstring.http_input                    auto           auto
mbstring.http_output                   UTF-8          UTF-8
mbstring.http_output_conv_mimetypes    ^(text/|application/xhtml\+xml)    ^(text/|application/xhtml\+xml)
mbstring.internal_encoding             UTF-8          UTF-8
mbstring.language                      neutral        neutral
mbstring.strict_detection              Off            Off
mbstring.substitute_character          no value       no value

My production environment has a few differences:
Multibyte Support                      enabled
Multibyte string engine                libmbfl
Multibyte (japanese) regex support     enabled
Multibyte regex (oniguruma) version    3.7.1

Directive                              Local Value    Master Value
mbstring.detect_order                  no value       no value
mbstring.encoding_translation          Off            Off
mbstring.func_overload                 0              0
mbstring.http_input                    auto           auto
mbstring.http_output                   UTF-8          UTF-8
mbstring.internal_encoding             UTF-8          UTF-8
mbstring.language                      neutral        neutral
mbstring.strict_detection              Off            Off
mbstring.substitute_character          no value       no value

Can anyone see what I am doing wrong?
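When two servers disagree like this, one way to narrow things down is to run the same small diagnostic on both and compare the output. The sketch below is only a suggestion: it prints the PHP version and the runtime mbstring settings, then converts U+2019 from a fixed-width encoding and dumps the resulting bytes. The choice of UCS-4BE as the source encoding is mine, not the question's; on the PHP versions in use at the time, the original "HTML-ENTITIES" call could be substituted to test that exact path.

```php
<?php
// Print the environment details that most often differ between servers.
echo 'PHP version: ', PHP_VERSION, "\n";
echo 'Internal encoding: ', mb_internal_encoding(), "\n";
print_r(mb_get_info());

// Convert U+2019 from a 4-byte big-endian code point and dump the bytes.
$char = mb_convert_encoding(pack('N', 0x2019), 'UTF-8', 'UCS-4BE');
$hex = bin2hex($char);
echo 'U+2019 as UTF-8: ', $hex, ' (', strlen($char), " bytes)\n";
```

A healthy setup should report e28099 and 3 bytes. If production reports 4 bytes here as well, the difference is more likely in the mbstring build than in the surrounding code.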
Posted on 2011-12-28 23:10:36
See if this helps you: hex2ascii and ascii2hex
Added 2012-09-19:
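// Return the bytes of $ascii as space-separated, two-digit uppercase hex.
function ascii2hex($ascii)
{
    $hex = '';
    for ($i = 0; $i < strlen($ascii); $i++)
    {
        // $ascii[$i] replaces the $ascii{$i} offset syntax, which PHP 8 removed.
        $byte = strtoupper(dechex(ord($ascii[$i])));
        // Left-pad single-digit values with a zero.
        $byte = str_repeat('0', 2 - strlen($byte)).$byte;
        $hex .= $byte." ";
    }
    return $hex;
}

// Inverse of ascii2hex(): turn a hex dump back into the original bytes.
function hex2ascii($hex)
{
    $ascii = '';
    $hex = str_replace(" ", "", $hex);
    for ($i = 0; $i < strlen($hex); $i = $i + 2)
        $ascii .= chr(hexdec(substr($hex, $i, 2)));
    return $ascii;
}

Posted on 2017-02-24 21:13:26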
I guess you are looking for multibyte versions of ord and chr.
To that end, I wrote the following polyfill:
if (!function_exists('mb_internal_encoding')) {
    function mb_internal_encoding($encoding = NULL) {
        // Test $encoding here (the original tested an undefined $from_encoding)
        // and pass the 'internal_encoding' type that the iconv functions expect.
        return ($encoding === NULL)
            ? iconv_get_encoding('internal_encoding')
            : iconv_set_encoding('internal_encoding', $encoding);
    }
}
if (!function_exists('mb_convert_encoding')) {
    function mb_convert_encoding($str, $to_encoding, $from_encoding = NULL) {
        // iconv() takes the source encoding first, then the target encoding.
        return iconv(($from_encoding === NULL) ? mb_internal_encoding() : $from_encoding, $to_encoding, $str);
    }
}
if (!function_exists('mb_chr')) {
    function mb_chr($ord, $encoding = 'UTF-8') {
        if ($encoding === 'UCS-4BE') {
            // A code point is exactly one 32-bit big-endian integer in UCS-4BE.
            return pack("N", $ord);
        } else {
            return mb_convert_encoding(mb_chr($ord, 'UCS-4BE'), $encoding, 'UCS-4BE');
        }
    }
}
if (!function_exists('mb_ord')) {
    function mb_ord($char, $encoding = 'UTF-8') {
        if ($encoding === 'UCS-4BE') {
            // unpack() returns a 1-indexed array, hence the skipped first slot.
            list(, $ord) = (strlen($char) === 4) ? @unpack('N', $char) : @unpack('n', $char);
            return $ord;
        } else {
            return mb_ord(mb_convert_encoding($char, 'UCS-4BE', $encoding), 'UCS-4BE');
        }
    }
}

Demo:
echo "\nGet string from numeric DEC value\n";
var_dump(mb_chr(25105));
var_dump(mb_chr(22909));
echo "\nGet string from numeric HEX value\n";
var_dump(mb_chr(0x6211));
var_dump(mb_chr(0x597D));
echo "\nGet numeric value of character as DEC int\n";
var_dump(mb_ord('我'));
var_dump(mb_ord('好'));
echo "\nGet numeric value of character as HEX string\n";
var_dump(dechex(mb_ord('我')));
var_dump(dechex(mb_ord('好')));

Output:
Get string from numeric DEC value
string(3) "我"
string(3) "好"
Get string from numeric HEX value
string(3) "我"
string(3) "好"
Get numeric value of character as DEC int
int(25105)
int(22909)
Get numeric value of character as HEX string
string(4) "6211"
string(4) "597d"

https://stackoverflow.com/questions/8155401