This regex will do what you want: <code>&lt;(BR|br)[^&gt;]*&gt;</code>

Here is a working example: <a href="https://regex101.com/r/47n1lY/1" rel="nofollow noreferrer">Regex101</a>
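As a minimal sketch of how this pattern behaves in Java (the class and method names are hypothetical, chosen only for illustration), note that the alternation covers only the all-lowercase and all-uppercase spellings, so a mixed-case tag such as <code>&lt;Br&gt;</code> is left alone:

```java
final class BrAlternationDemo {
    // Pattern from the answer above: "<br" or "<BR", then any run of
    // characters other than ">", then the closing ">".
    static String normalize(String html) {
        return html.replaceAll("<(BR|br)[^>]*>", "<br/>");
    }

    public static void main(String[] args) {
        String input = "a<br>b<BR style=\"PAGE-BREAK-BEFORE: always\" clear=all>c";
        // Both tags are rewritten, with or without attributes.
        System.out.println(normalize(input));
        // Mixed case is not covered by (BR|br), so this is printed unchanged.
        System.out.println(normalize("<Br>"));
    }
}
```

Adding the <code>(?i)</code> flag used in the question would make the match case-insensitive and remove the need for the alternation.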

You probably want <code>&lt;br\b[^&gt;]*&gt;</code>, which matches all tags that:

<ul>
<li>Start with <code>&lt;br</code></li>
<li>Have a word boundary after the <code>&lt;br</code> (so you wouldn't match a <code>&lt;brown&gt;</code> tag, for example)</li>
<li>Contain any number of non-<code>&gt;</code> characters, including 0</li>
<li>End with a <code>&gt;</code></li>
</ul>
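The four rules above can be sketched in Java roughly as follows. The class name is hypothetical, and the <code>(?i)</code> flag is an assumption carried over from the question's code rather than part of this answer's pattern:

```java
import java.util.regex.Pattern;

final class BrBoundaryDemo {
    // (?i)   case-insensitive, so <br>, <BR> and <Br> all match (assumption from the question)
    // <br\b  "<br" followed by a word boundary, so <brown> does not match
    // [^>]*  any number of non-">" characters, including zero
    // >      the closing bracket
    private static final Pattern BR_TAG = Pattern.compile("(?i)<br\\b[^>]*>");

    static String normalize(String html) {
        return BR_TAG.matcher(html).replaceAll("<br/>");
    }

    public static void main(String[] args) {
        // Only the first and last tags are rewritten; <brown> is left alone.
        System.out.println(normalize("<BR clear=all><brown><Br>"));
    }
}
```

Because <code>/</code> is not a word character, an already self-closed <code>&lt;br/&gt;</code> also matches and is rewritten to itself, so running the replacement twice is harmless.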

You have to use <code>.*</code> instead of <code>*</code>:

<pre><code>htmlString.replaceAll("(?i)&lt;br .*&gt;", "&lt;br/&gt;")
//-----------------------------^^
</code></pre>

because:

<blockquote>
 <code>*</code> Match the preceding character or subexpression 0 or more times.
</blockquote>

and 

<blockquote>
 <code>.*</code> Matches any character zero or many times
</blockquote>

So for your case:

<pre><code>String htmlString = "&lt;BR style=\"PAGE-BREAK-BEFORE: always\" clear=all&gt;";
System.out.println(htmlString.replaceAll("(?i)&lt;br .*&gt;", "&lt;br/&gt;"));
</code></pre>

Output

<pre><code>&lt;br/&gt;
</code></pre>
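One caveat is worth checking before relying on this pattern: <code>.*</code> is greedy, and the literal space after <code>br</code> is mandatory. The sketch below (hypothetical class and method names) compares it with a <code>[^&gt;]*</code> variant on a line that holds more than one tag:

```java
final class BrGreedyDemo {
    // Pattern from the answer above: requires a space after "br",
    // then greedily consumes up to the LAST ">" on the line.
    static String greedy(String html) {
        return html.replaceAll("(?i)<br .*>", "<br/>");
    }

    // Alternative for comparison: stops at the first ">" after "<br".
    static String bounded(String html) {
        return html.replaceAll("(?i)<br\\b[^>]*>", "<br/>");
    }

    public static void main(String[] args) {
        String input = "<BR clear=all><p>text</p><br>";
        // The greedy version swallows the paragraph along with the tag.
        System.out.println(greedy(input));
        // The bounded version rewrites each <br> tag separately.
        System.out.println(bounded(input));
        // A plain <br> has no space after "br", so greedy() leaves it as is.
        System.out.println(greedy("<br>"));
    }
}
```

On the single-tag input used in the answer both versions give the same result; the difference only shows up when other markup follows the tag on the same line.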

Using regular expressions to parse HTML is not a good idea because HTML is not a regular language. You should use a proper parsing library like <a href="http://nekohtml.sourceforge.net/" rel="nofollow noreferrer">NekoHTML</a>.

<blockquote>
 NekoHTML is a simple HTML scanner and tag balancer that enables
 application programmers to parse HTML documents and access the
 information using standard XML interfaces. The parser can scan HTML
 files and "fix up" many common mistakes that human (and computer)
 authors make in writing HTML documents. NekoHTML adds missing parent
 elements; automatically closes elements with optional end tags; and
 can handle mismatched inline element tags.
</blockquote>

I am attempting to convert a bunch of HTML documents to be XML-compliant (via a Java method), and there are a lot of <code>&lt;br&gt;</code> tags that either (1) are unclosed or (2) contain attributes. For some reason the regex I'm using does not handle the tags that contain attributes. Here is the code:

<pre><code>htmlString = htmlString.replaceAll("(?i)&lt;br *&gt;", "&lt;br/&gt;");
</code></pre>

This code works fine for all the <code>&lt;br&gt;</code> tags in the documents; it replaces them with <code>&lt;br/&gt;</code>. However, for tags like

<pre><code>&lt;BR style="PAGE-BREAK-BEFORE: always" clear=all&gt;
</code></pre>

it doesn't do anything. I'd like all br tags to just be <code>&lt;br/&gt;</code>, regardless of any attributes in the tag prior to conversion. 

What do I need to add to my regex in order to achieve this?

Trying to replace &lt;br&gt;, &lt;BR&gt;, &lt;br +attribute&gt; tags with &lt;br/&gt;
