
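<pre><code>// Singleline lets '.' match newlines, so the pattern can span several lines of HTML.
Match m = Regex.Match(strHTMLSource, "^.*?&lt;/h[123]&gt;.*?&lt;p&gt;(.*?)&lt;/p&gt;",
    RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Singleline);

// Group 1 holds the text of the first paragraph after the first heading.
string para = m.Success ? m.Groups[1].Value.Trim() : string.Empty;
</code></pre>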


Personally, I would use XPath queries to do what you're trying to achieve. In my opinion that is much easier than fiddling with regexes.
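To make the XPath suggestion concrete, here is a minimal sketch. It assumes the page is well-formed XHTML with no default namespace, which real pages often are not, so treat it as an illustration rather than a drop-in solution. The class and method names (`XPathExcerpt`, `FirstParagraphAfterHeading`) are made up for this example.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathExcerpt
{
    // Returns the trimmed text of the first <p> that follows an <h1>, <h2>,
    // or <h3> in document order, or an empty string when there is none.
    // Assumption: the input parses as XML and uses no default namespace.
    public static string FirstParagraphAfterHeading(string xhtml)
    {
        XDocument doc = XDocument.Parse(xhtml);
        XElement p = doc.XPathSelectElement(
            "(//*[self::h1 or self::h2 or self::h3]/following::p)[1]");
        return p == null ? string.Empty : p.Value.Trim();
    }

    public static void Main()
    {
        string page =
            "<html><body><p>Intro</p><h2>Heading</h2>" +
            "<div><p> First after heading </p></div><p>Later</p></body></html>";
        Console.WriteLine(FirstParagraphAfterHeading(page));
    }
}
```

The `following` axis is used instead of `following-sibling` so that a paragraph wrapped in a `<div>` after the heading is still found.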


This regex will find every first paragraph that follows an h1, h2, or h3. If you want only the very first such paragraph on the page, just keep the first match.

<pre><code>(?&lt;=&lt;/h[1-3]&gt;\s*?&lt;p&gt;)([\s\S]*?)(?=&lt;/p&gt;)
</code></pre>

You will probably need to adjust the matches for the <code>&lt;p&gt;</code> tags to account for attributes.
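As a rough sketch of that adjustment, the variant below swaps the literal opening tag for `<p\b[^>]*>` so that attributes are tolerated, and collects every match. The class and method names (`HeadingParagraphs`, `Find`) are invented for the example, and the pattern still assumes reasonably tidy markup.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HeadingParagraphs
{
    // Same lookbehind/lookahead idea as above, but the opening <p> tag may
    // carry attributes. \b keeps tags such as <pre> from matching.
    private static readonly Regex FirstParagraph = new Regex(
        @"(?<=</h[1-3]>\s*<p\b[^>]*>)([\s\S]*?)(?=</p>)",
        RegexOptions.IgnoreCase);

    // Returns the text of each paragraph that directly follows a heading.
    public static List<string> Find(string html)
    {
        var results = new List<string>();
        foreach (Match m in FirstParagraph.Matches(html))
        {
            results.Add(m.Value.Trim());
        }
        return results;
    }

    public static void Main()
    {
        string html =
            "<h1>Title</h1>\n<p class=\"lead\">First</p><p>Second</p>" +
            "<h2>Sub</h2><p>Third</p>";
        foreach (string para in Find(html))
        {
            Console.WriteLine(para);
        }
    }
}
```

Taking only the first element of the returned list gives the very first paragraph on the page.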


There are a lot of cases that a regular expression won't handle properly. For instance:

<pre><code>&lt;p&gt;foo&lt;p&gt;bar&lt;/p&gt;baz&lt;/p&gt;

&lt;p&gt;This paragraph is valid &lt;!-- &lt;p&gt;This one isn't&lt;/p&gt; --&gt; &lt;/p&gt;
</code></pre>

A regular expression that captures the text between the <code>&lt;p&gt;</code> and <code>&lt;/p&gt;</code> will capture (respectively):

<pre><code>foo&lt;p&gt;bar

This paragraph is valid &lt;!-- &lt;p&gt;This one isn't
</code></pre>
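The behaviour is easy to reproduce. The sketch below assumes the simplest lazy pattern, `<p>(.*?)</p>`, as a stand-in for "a regular expression that captures the text between the tags"; the class and method names (`NaiveParagraph`, `Capture`) are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NaiveParagraph
{
    // Captures whatever sits between the first <p> and the nearest </p>.
    // The regex has no notion of nesting or comments, so it stops at the
    // first closing tag it sees.
    public static string Capture(string html)
    {
        Match m = Regex.Match(html, "<p>(.*?)</p>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(Capture("<p>foo<p>bar</p>baz</p>"));
        Console.WriteLine(Capture(
            "<p>This paragraph is valid <!-- <p>This one isn't</p> --> </p>"));
    }
}
```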

If I had to process HTML found in the wild, I'd use MSHTML to parse the HTML, and then search through the DOM to find the objects. 

Using MSHTML is not anywhere near as lightweight as using a regular expression, to be sure. But MSHTML is designed to make sense out of the sloppiest of web pages. I'd much rather rely on all the knowledge of messy real-world cases that it's built to handle than discover them all for myself.

See the answer to <a href="https://stackoverflow.com/questions/56107/what-is-the-best-way-to-parse-html-in-c">this</a> question for a bit of sample code.

I'm nearly done with a trackback system for my website, but have one last niggling regular expression I just can't get right.

What I'm after is an excerpt of the referring page, where I'm defining the most relevant excerpt as:

The first paragraph (marked by <code>&lt;p&gt;&lt;/p&gt;</code> tags) that follows either an <code>&lt;h1&gt;&lt;/h1&gt;</code>, <code>&lt;h2&gt;&lt;/h2&gt;</code> or <code>&lt;h3&gt;&lt;/h3&gt;</code> in the HTML Source of the page.

For instance, I can successfully fetch the <code>&lt;title&gt;&lt;/title&gt;</code> tag for the HTML as follows:

<pre><code>Regex reTITLE = new Regex(@"(?&lt;=&lt;title.*&gt;)([\s\S]*)(?=&lt;/title&gt;)",
    RegexOptions.IgnoreCase);

Match match = reTITLE.Match(strHTMLSource);
if (match.Success)
{
    strReferringPageTitle = match.Value.Trim();
}
</code></pre>

My question -- what Regular Expression can I use to fetch the string described in the first part of my post? 

PS: I love StackOverflow and this community -- great job, Joel &amp; Co.!

Regular Expression (C# flavor) to fetch first &lt;p&gt;&lt;/p&gt; after heading tag
