What is the best way to handle/remove the UTF-8 right-to-left override character?

Stack Overflow user
Asked 2014-01-24 01:24:14
Answers: 1 · Views: 1.8K · Followers: 0 · Votes: 4

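There is a UTF-8 character (hex bytes E2 80 AE, U+202E RIGHT-TO-LEFT OVERRIDE) that, when displayed to the user by a UTF-8-enabled system, causes the characters that follow it to be shown in reverse order. Attackers commonly use it to hide or obfuscate a file extension.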

Here are examples of such file name strings:

an .EXE called: EvilFile‮.EXE

an .scr called: yo.na‮.scr
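
Before deciding how to clean such a name, it can help to confirm that the override character is actually present. The following is a minimal sketch, assuming the file name arrives as a UTF-8 byte string; the constant `RLO_UTF8` and the helper `containsRlo()` are hypothetical names introduced here for illustration only.

```php
<?php
// U+202E RIGHT-TO-LEFT OVERRIDE, encoded as the UTF-8 bytes E2 80 AE.
const RLO_UTF8 = "\xE2\x80\xAE";

// Hypothetical helper: reports whether the name contains the override character.
function containsRlo(string $name): bool
{
    // A plain byte search is enough because the sequence E2 80 AE is fixed.
    return strpos($name, RLO_UTF8) !== false;
}

var_dump(containsRlo("EvilFile" . RLO_UTF8 . ".EXE")); // bool(true)
var_dump(containsRlo("EvilFile.EXE"));                  // bool(false)
```

A check like this only flags the one character described above; other bidirectional control characters would need their own byte sequences.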

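As long as the file name extension is validated, this is not a problem. What does cause trouble is displaying such a string: htmlentities() turns it into: EvilFileâ�®.EXE.

So what is the best way to repair the file name back to EvilFile.EXE?

The tests I ran with iconv produced the same encoding problem in the output.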

<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8"> 
    <title></title>
</head>

<body>
<?php
$evilString = "EvilFile‮.EXE";
$ret = '';

$ret .= '<h1>htmlentities/ENT_QUOTES | ENT_IGNORE</h1>';
$ret .= htmlentities($evilString, ENT_QUOTES | ENT_IGNORE, "UTF-8").'<br>';

//enc options
$enc = array(
    "UTF-8", 
    "ASCII", 
    "Windows-1252", 
    "ISO-8859-15", 
    "ISO-8859-1", 
    "ISO-8859-6", 
    "CP1256",
    "US-ASCII//TRANSLIT", 
    "UTF-8//IGNORE",
    "UTF-8//TRANSLIT"
 );

//iconv
foreach ($enc as $i) {
    $ret .= '<h1>iconv/'.$i.'</h1>';
    foreach ($enc as $j) {
        $ret .= " $i - $j: ".@iconv($i, $j, $evilString).'<br>';
    }
}

//mb_convert_encoding
$ret .= '<h1>mb_convert_encoding</h1>';
foreach (mb_list_encodings() as $chr) {
    $ret .= $chr.' - '.mb_convert_encoding($evilString, 'UTF-8', $chr)."<br>";   
} 

echo $ret;
?> 
</body>
</html>

Results

iconv/US-ASCII//TRANSLIT
------------------------
US-ASCII//TRANSLIT - UTF-8: EvilFile
US-ASCII//TRANSLIT - ASCII: EvilFile
US-ASCII//TRANSLIT - Windows-1252: EvilFile
US-ASCII//TRANSLIT - ISO-8859-15: EvilFile
US-ASCII//TRANSLIT - ISO-8859-1: EvilFile
US-ASCII//TRANSLIT - ISO-8859-6: EvilFile
US-ASCII//TRANSLIT - CP1256: EvilFile
US-ASCII//TRANSLIT - US-ASCII//TRANSLIT: EvilFile
US-ASCII//TRANSLIT - UTF-8//IGNORE: EvilFile.EXE <<< - See answer below
US-ASCII//TRANSLIT - UTF-8//TRANSLIT: EvilFile

iconv/UTF-8//IGNORE
-------------------
UTF-8//IGNORE - UTF-8: EvilFile‮.EXE
UTF-8//IGNORE - ASCII: EvilFile
UTF-8//IGNORE - Windows-1252: EvilFile
UTF-8//IGNORE - ISO-8859-15: EvilFile
UTF-8//IGNORE - ISO-8859-1: EvilFile
UTF-8//IGNORE - ISO-8859-6: EvilFile
UTF-8//IGNORE - CP1256: EvilFile
UTF-8//IGNORE - US-ASCII//TRANSLIT: EvilFile
UTF-8//IGNORE - UTF-8//IGNORE: EvilFile‮.EXE
UTF-8//IGNORE - UTF-8//TRANSLIT: EvilFile‮.EXE

iconv/UTF-8//TRANSLIT
---------------------
UTF-8//TRANSLIT - UTF-8: EvilFile‮.EXE
UTF-8//TRANSLIT - ASCII: EvilFile
UTF-8//TRANSLIT - Windows-1252: EvilFile
UTF-8//TRANSLIT - ISO-8859-15: EvilFile
UTF-8//TRANSLIT - ISO-8859-1: EvilFile
UTF-8//TRANSLIT - ISO-8859-6: EvilFile
UTF-8//TRANSLIT - CP1256: EvilFile
UTF-8//TRANSLIT - US-ASCII//TRANSLIT: EvilFile
UTF-8//TRANSLIT - UTF-8//IGNORE: EvilFile‮.EXE
UTF-8//TRANSLIT - UTF-8//TRANSLIT: EvilFile‮.EXE

mb_convert_encoding
-------------------
pass - EvilFileâ®.EXE
auto - EvilFile‮.EXE
wchar - EvilFileâ®.EXE
byte2be - 䕶楬䙩汥긮䕘
byte2le - 癅汩楆敬胢⺮塅
byte4be - ������������?
byte4le - ������������������
BASE64 - ��)^q
UUENCODE -
HTML-ENTITIES - EvilFileâ®.EXE
Quoted-Printable - EvilFile‮.EXE
7bit - EvilFileâ®.EXE
8bit - EvilFileâ®.EXE
UCS-4 - ������������?
UCS-4BE - ������������?
UCS-4LE - ������������������
UCS-2 - 䕶楬䙩汥긮䕘
UCS-2BE - 䕶楬䙩汥긮䕘
UCS-2LE - 癅汩楆敬胢⺮塅
UTF-32 - ?
UTF-32BE - ?
UTF-32LE -
UTF-16 - 䕶楬䙩汥긮䕘
UTF-16BE - 䕶楬䙩汥긮䕘
UTF-16LE - 癅汩楆敬胢⺮塅
UTF-8 - EvilFile‮.EXE
UTF-7 - EvilFile???.EXE
UTF7-IMAP - EvilFile???.EXE
ASCII - EvilFileâ®.EXE
EUC-JP - EvilFile??EXE
SJIS - EvilFile窶ョ.EXE
eucJP-win - EvilFile??EXE
SJIS-win - EvilFile窶ョ.EXE
CP932 - EvilFile窶ョ.EXE
CP51932 - EvilFile??EXE
JIS - EvilFile??ョ.EXE
ISO-2022-JP - EvilFile??ョ.EXE
ISO-2022-JP-MS - EvilFile??ョ.EXE
Windows-1252 - EvilFile‮.EXE
Windows-1254 - EvilFile‮.EXE
ISO-8859-1 - EvilFileâ®.EXE
ISO-8859-2 - EvilFileâŽ.EXE
ISO-8859-3 - EvilFileâ?.EXE
ISO-8859-4 - EvilFileâŽ.EXE
ISO-8859-5 - EvilFileтЎ.EXE
ISO-8859-6 - EvilFileق?.EXE
ISO-8859-7 - EvilFileβ?.EXE
ISO-8859-8 - EvilFileג®.EXE
ISO-8859-9 - EvilFileâ®.EXE
ISO-8859-10 - EvilFileâŪ.EXE
ISO-8859-13 - EvilFileā®.EXE
ISO-8859-14 - EvilFileâ®.EXE
ISO-8859-15 - EvilFileâ®.EXE
ISO-8859-16 - EvilFileâ®.EXE
EUC-CN - EvilFile??EXE
CP936 - EvilFile鈥?EXE
HZ - EvilFile???.EXE
EUC-TW - EvilFile??EXE
BIG-5 - EvilFile??EXE
EUC-KR - EvilFile??EXE
UHC - EvilFile巽?EXE
ISO-2022-KR - EvilFile???.EXE
Windows-1251 - EvilFile‮.EXE
CP866 - EvilFileтАо.EXE
KOI8-R - EvilFileБ─╝.EXE
KOI8-U - EvilFileБ─╝.EXE
ArmSCII-8 - EvilFileՉ….EXE
CP850 - EvilFileÔÇ«.EXE
JIS-ms - EvilFile??ョ.EXE
CP50220 - EvilFile??ョ.EXE
CP50220raw - EvilFile??ョ.EXE
CP50221 - EvilFile??ョ.EXE
CP50222 - EvilFile??ョ.EXE

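I suppose one option (which I dislike) is to pass the string through utf8_encode() and then strip the odd characters with preg_replace(). But there has to be a better/cleaner way.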

echo preg_replace('/[^a-z0-9_ \[\]\.\(\)#%&-]/si', '', utf8_encode($evilString)).'<br>';

1 Answer

Stack Overflow user

Accepted answer

Answered 2014-01-24 17:47:51

In further testing I added the US-ASCII//TRANSLIT to UTF-8//IGNORE conversion. To repair these kinds of strings without using a regex, you can use:

echo iconv('US-ASCII//TRANSLIT', 'UTF-8//IGNORE', $evilString); //EvilFile.EXE
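
Judging by the results listed in the question, the conversion above seems to work by discarding bytes outside ASCII, so it would presumably also drop legitimate non-ASCII characters from a file name, and its behavior may differ between iconv implementations. As an alternative, here is a hedged sketch that removes only the Unicode bidirectional control characters. It does use a regex, and the helper name `stripBidiControls()` is hypothetical, not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): remove only the Unicode
// bidirectional control characters and leave every other character untouched.
function stripBidiControls(string $name): string
{
    // U+200E/U+200F (marks), U+202A-U+202E (embeddings and overrides),
    // U+2066-U+2069 (isolates). The /u modifier makes PCRE treat the
    // pattern and the subject as UTF-8.
    $clean = preg_replace('/[\x{200E}\x{200F}\x{202A}-\x{202E}\x{2066}-\x{2069}]/u', '', $name);
    if ($clean === null) {
        // preg_replace() returns null on error, e.g. when $name is not valid UTF-8.
        throw new InvalidArgumentException('File name is not valid UTF-8');
    }
    return $clean;
}

$evilString = "EvilFile\xE2\x80\xAE.EXE";
echo stripBidiControls($evilString), "\n"; // EvilFile.EXE
```

The character range is an assumption about which controls are worth stripping; it can be narrowed to `\x{202E}` alone if only the override character matters.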

Hopefully this helps anyone who runs into this unusual problem in the future.

Votes: 2
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/21322702
