I have a very long string made up of several HTML documents mashed together, like this:
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
some head info
</head>
<body>
<div > some content with other HTML tags that I want to preserve </div>
<body>
</html>
<html>
<div> another content with other HTML tags that I want to preserve </div>
</html>
<html xmlns="http://www.w3.org/TR/REC-html40">
<head>
some head info
</head>
<body>
<div> some other content with other HTML tags that I want to preserve </div>
<body>
</html>
I want to turn them into this:
<div > some content with other HTML tags that I want to preserve </div>
<div> another content with other HTML tags that I want to preserve </div>
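<div> some other content with other HTML tags that I want to preserve </div>
Basically, I'm looking for a regex that removes the <html> </html> tags (but not the other, inner HTML elements) from a huge HTML string. Note that I need to keep the HTML content and strip only the parent tags.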
Thanks in advance.
(Note that I have done a lot of searching to make sure this isn't a duplicate question.)
Posted on 2020-05-06 10:49:22
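An important caveat first, on why regex is a poor tool for parsing HTML: https://stackoverflow.com/a/1732454/3498950
But if you really must do this, I would probably use something like /<\/?html.*?>/g: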
const html = `<html xmlns:v="urn:schemas-microsoft-com:vml">
<head>head info</head>
<div>other content</div>
</html>`;
// Strip every opening or closing <html ...> tag; other tags such as <head> are left in place.
console.log(html.replace(/<\/?html.*?>/g, '').trim());
Play with the regex here: https://regex101.com/r/EeTv68/1
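The desired output in the question also drops the `<head>` blocks and the `<body>` tags, which `/<\/?html.*?>/g` on its own leaves in place (the snippet above still prints `<head>head info</head>`). Below is a minimal sketch that extends the same regex approach to those wrappers. It assumes every `<head>` is closed by `</head>` and that no attribute value contains a literal `>`; `stripWrappers` is a hypothetical name, and the caveat about parsing HTML with regex still applies.

```javascript
// stripWrappers is a hypothetical helper, not part of the original answer.
// Like the answer's regex, it is a text hack, not an HTML parser: it assumes
// no attribute value contains a literal ">" and every <head> has a </head>.
function stripWrappers(input) {
  return input
    // Remove each <head>...</head> block together with its contents.
    // [\s\S]*? spans newlines and stops lazily at the first </head>;
    // \b keeps tags such as <header> from matching.
    .replace(/<head\b[^>]*>[\s\S]*?<\/head>/gi, '')
    // Remove opening and closing <html ...> and <body ...> tags only,
    // keeping whatever they wrap (this also catches the stray "<body>"
    // that the question's sample uses where "</body>" was meant).
    .replace(/<\/?(?:html|body)\b[^>]*>/gi, '')
    // Drop the blank lines left behind; note this also trims each line.
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .join('\n');
}

const input = `<html xmlns="http://www.w3.org/TR/REC-html40">
<head>
some head info
</head>
<body>
<div> some content </div>
<body>
</html>
<html>
<div> another content </div>
</html>`;

console.log(stripWrappers(input));
```

With this sample input it prints only the two `<div>` lines. Removing `<head>` blocks first matters: stripping the wrapper tags alone would leave "some head info" behind as loose text.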
https://stackoverflow.com/questions/61626450