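I'm using XDomainRequest in IE8 to fetch the contents of a web page. The responseText contains escape characters and Unicode escape sequences that prevent the markup from being inserted into a div. Here is a sample of the returned data: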
<!DOCTYPE html PUBLIC \"-\/\/W3C\/\/DTD XHTML 1.0 Transitional\/\/EN\" \"http:\/\/www.w3.org\/TR\/xhtml1\/DTD\/xhtml1-transitional.dtd\">\u000d\u000a<html xmlns=\"http:\/\/www.w3.org\/1999\/xhtml\">\u000d\u000a<head>\u000d\u000a <title>...<\/title>\u000d\u000a <script src=\"\/\/ajax.googleapis.com\/ajax\/libs\/jquery\/1.7.1\/jquery.min.js\" type=\"text\/javascript\"><\/script>\u000d\u000a<\/head>\u000d\u000a<body>\u000d\u000a\u000d\u000a<div style=\"font-size:24px;font-weight:bold\">\u000d\u000aText Headline: \u000d\u000a<\/div>\u000d\u000a\u000d\u000a<div style=\"float:left;width:50%;margin:0;padding:0;\">\u000d\u000a<p>Lorem ipsum dolor sit amet<\/p>\u000d\u000a\u000d\u000a<p>In nec imperdiet lectus.
When I use decodeURI or decodeURIComponent, I get a "The URI to be decoded is not a valid encoding" error.
Can anyone recommend a method or a regular expression for cleaning up the HTML?
Posted on 2012-04-11 22:22:53
Based on my quick test:
var regex = /\\([^u])/g;
// JSON.stringify re-escapes the string so the regex has backslashes to match;
// the JS parser has already de-escaped the literal itself
string = JSON.stringify({response: '<!DOCTYPE html PUBLIC \"-\/\/W3C\/\/DTD XHTML 1.0 Transitional\/\/EN\" \"http:\/\/www.w3.org\/TR\/xhtml1\/DTD\/xhtml1-transitional.dtd\">\u000d\u000a<html xmlns=\"http:\/\/www.w3.org\/1999\/xhtml\">\u000d\u000a<head>\u000d\u000a <title>...<\/title>\u000d\u000a <script src=\"\/\/ajax.googleapis.com\/ajax\/libs\/jquery\/1.7.1\/jquery.min.js\" type=\"text\/javascript\"><\/script>\u000d\u000a<\/head>\u000d\u000a<body>\u000d\u000a\u000d\u000a<div style=\"font-size:24px;font-weight:bold\">\u000d\u000aText Headline: \u000d\u000a<\/div>\u000d\u000a\u000d\u000a<div style=\"float:left;width:50%;margin:0;padding:0;\">\u000d\u000a<p>Lorem ipsum dolor sit amet<\/p>\u000d\u000a\u000d\u000a<p>In nec imperdiet lectus.'});
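var cleaned = string.replace(regex, '$1');
This removes every escaping backslash except the ones that start a \uXXXX Unicode escape, which are left in place. Note that replace returns a new string rather than modifying the original, so the result has to be assigned. I don't think a backslash has many other uses in JS besides escaping.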
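Since the regex above leaves the \uXXXX sequences in the text, here is a rough sketch of how the same idea could be extended to decode them as well. It assumes the responseText uses JSON-style escaping (as the sample suggests); the names unescapeResponse and SIMPLE_ESCAPES are made up for this example, and it has not been tested in IE8 itself, although it only uses ES3 features.

```javascript
// Hypothetical helper, not part of the original answer.
// Assumption: the text uses JSON-style escapes (\", \/, \n, \uXXXX, ...).
var SIMPLE_ESCAPES = { n: '\n', r: '\r', t: '\t', b: '\b', f: '\f' };

function unescapeResponse(text) {
  // Match a backslash followed by either uXXXX or any single character.
  return text.replace(/\\(u([0-9a-fA-F]{4})|[\s\S])/g, function (match, escaped, hex) {
    if (hex) {
      // \uXXXX becomes the character with that code unit.
      return String.fromCharCode(parseInt(hex, 16));
    }
    if (Object.prototype.hasOwnProperty.call(SIMPLE_ESCAPES, escaped)) {
      // \n, \r, \t, \b, \f become the control characters they stand for.
      return SIMPLE_ESCAPES[escaped];
    }
    // Anything else (\" and \/ in the sample) becomes the character itself.
    return escaped;
  });
}

// Usage: the doubled backslashes put literal backslashes into the test string.
var sample = '<div style=\\"font-size:24px\\">\\u000d\\u000aText<\\/div>';
var html = unescapeResponse(sample);
// html now holds <div style="font-size:24px">, a CR LF pair, then Text</div>
```

If the response really is the body of a JSON string, wrapping it in double quotes and passing it to JSON.parse may be a simpler alternative, but that depends on every quote in the text being escaped and on native JSON being available (IE8 provides it only in IE8 standards mode).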
https://stackoverflow.com/questions/10098575