
Regex won't return the string between HTML tags

Stack Overflow user

Asked 2011-12-09 11:07:12

3 answers, 107 views, 0 followers, score 0

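I've been struggling with this: I'm trying to extract the text contained in the Location block of the HTML at the bottom. I want to extract the following: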

<h3 class="blue">Location</h3><p class="desc">This elegant luxurious hotel is located in the middle of stunning greenery on a hill, overlooking the sand/ pebble beach of Ixia, which is accessed just over the promenade (around 200 m away). The glamorous building, which is based on architecture from the Middle Ages is stylish and designed in classical, elegant decor. The island's capital of Rhodes Town is located around 4 km from the hotel and Rhodes' airport is roughly 9 km away whilst public transport departs from a stop located just 200 m away.</p>

using this pattern:

<h3 class="blue">Location<\/h3><p\s(.*)\s.<\/p>

But it doesn't work. Can anyone help? Regards

 ...In addition, there is also playground for younger guests in the hotel grounds.</p><h3 class="blue">Location</h3><p class="desc">This elegant luxurious hotel is located in the middle of stunning greenery on a hill, overlooking the sand/ pebble beach of Ixia, which is accessed just over the promenade (around 200 m away). The glamorous building, which is based on architecture from the Middle Ages is stylish and designed in classical, elegant decor. The island's capital of Rhodes Town is located around 4 km from the hotel and Rhodes' airport is roughly 9 km away whilst public transport departs from a stop located just 200 m away.</p><h3 class="blue">Rooms</h3><p class="desc">The comfortable rooms include an en suite bathroom with hairdryer, bathrobe, slippers, a direct dial telephone, satellite/ cable TV, a minibar, air conditioning (centrally regulated), a hire safe as well as a terrace or balcony.</p><h3 class="blue">Sports</h3><p class="desc">In the outdoor complex are 2 swimming pools with children's pools, a...

Tags: html, regex

3 answers

Stack Overflow user

Accepted answer

Posted 2011-12-09 11:21:57

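If your language of choice has a library for parsing HTML, you should use it. Regex isn't always the best tool for the job, but if you know your input well, it can be done.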
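As a rough illustration of the parser route, here is a sketch using Python's standard-library html.parser.HTMLParser (Python is chosen only for illustration; the question doesn't confirm a language). LocationExtractor and extract_location are hypothetical names, and the sketch assumes the heading text is exactly "Location" and that the wanted text is in the next <p> element.

```python
from html.parser import HTMLParser


class LocationExtractor(HTMLParser):
    """Hypothetical helper: collects the text of the first <p> after <h3>Location</h3>."""

    def __init__(self):
        super().__init__()
        self.in_h3 = False           # currently inside an <h3> element
        self.h3_text = ""            # text collected from the current <h3>
        self.after_location = False  # the most recently closed <h3> said "Location"
        self.in_target_p = False     # currently inside the <p> we want
        self.done = False            # set once that <p> has closed
        self.parts = []              # text chunks of the target <p>

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "h3":
            self.in_h3, self.h3_text = True, ""
        elif tag == "p" and self.after_location:
            self.in_target_p = True

    def handle_endtag(self, tag):
        if self.done:
            return
        if tag == "h3":
            self.in_h3 = False
            self.after_location = self.h3_text.strip() == "Location"
        elif tag == "p" and self.in_target_p:
            self.in_target_p = False
            self.done = True

    def handle_data(self, data):
        if self.in_h3:
            self.h3_text += data
        elif self.in_target_p:
            self.parts.append(data)


def extract_location(html):
    parser = LocationExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


# Abridged copy of the question's input: the Location paragraph is cut to its
# first sentence and only the following Rooms section is kept.
HTML = (
    '<h3 class="blue">Location</h3>'
    '<p class="desc">This elegant luxurious hotel is located in the middle of '
    'stunning greenery on a hill, overlooking the sand/ pebble beach of Ixia, '
    'which is accessed just over the promenade (around 200 m away).</p>'
    '<h3 class="blue">Rooms</h3>'
    '<p class="desc">The comfortable rooms include an en suite bathroom with '
    'hairdryer, bathrobe, slippers, a direct dial telephone, satellite/ cable '
    'TV, a minibar, air conditioning (centrally regulated), a hire safe as '
    'well as a terrace or balcony.</p>'
)

location_text = extract_location(HTML)
print(location_text)
```

The parser never has to decide where a paragraph ends by pattern matching, so greediness and trailing-character assumptions don't come up.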

That said, your pattern is greedy, so it will match past the first closing paragraph tag. To make it non-greedy, use .*? (note the added ?).
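A small sketch of the greedy/non-greedy difference, using Python's re module on a hypothetical two-paragraph string (not the question's input):

```python
import re

# Hypothetical toy input: two paragraphs back to back.
html = '<p class="desc">First.</p><p class="desc">Second.</p>'

# Greedy .* runs to the LAST "</p>" in the string.
greedy = re.search(r'<p\b.*</p>', html).group(0)
# Non-greedy .*? stops at the FIRST "</p>" it can reach.
lazy = re.search(r'<p\b.*?</p>', html).group(0)

print(greedy)  # <p class="desc">First.</p><p class="desc">Second.</p>
print(lazy)    # <p class="desc">First.</p>
```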

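Also, you don't usually need to escape forward slashes (though judging from your history I'm guessing you're using PHP), and \s. makes the match fail, because the text doesn't end with a whitespace character followed by one more character. The . is a metacharacter that matches any character. If you want to match a period, you need to escape it (\.) to make it literal.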
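A quick check of the "." versus "\." point, sketched with Python's re (an assumption; the question is probably PHP). In Python a / is an ordinary character, so </p> and <\/p> match the same text; in PHP's preg_* functions a / only needs escaping when it is also the pattern delimiter.

```python
import re

# "." is a metacharacter: it matches any one character (except "\n" by default).
dot_any = re.fullmatch(r'away.', 'awayX') is not None          # True
# "\." matches only a literal period.
dot_literal_x = re.fullmatch(r'away\.', 'awayX') is not None   # False
dot_literal_p = re.fullmatch(r'away\.', 'away.') is not None   # True

# "/" has no special meaning in Python's re: both forms match "</p>".
plain_slash = re.search(r'</p>', 'away.</p>') is not None      # True
escaped_slash = re.search(r'<\/p>', 'away.</p>') is not None   # True

print(dot_any, dot_literal_x, dot_literal_p, plain_slash, escaped_slash)
```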

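Rather than \s after the p tag, I prefer \b, which denotes a word boundary. Finally, unless you want to capture the paragraph text, you don't need the capturing group (.*?). Fixing all of these issues leaves you with: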

<h3 class=\"blue\">Location<\/h3><p\b.*?<\/p>

If you want to capture the paragraph text, you could use this:

<h3 class=\"blue\">Location<\/h3><p[^>]*>(.*?)<\/p>

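[^>]* matches any character that is not a greater-than symbol, zero or more times. A nice property of this part of the pattern is that it is effectively non-greedy too: it stops matching as soon as it reaches the greater-than symbol that closes the opening <p> tag. The (.*?) capturing group then captures the inner paragraph text.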
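A sketch of the capturing version in Python's re, on an abridged copy of the question's input. The re.DOTALL flag is an addition that is not in the answer: it lets . also match newlines, in case a paragraph spans several lines ([^>] already matches newlines on its own).

```python
import re

# Abridged copy of the question's input: the Location paragraph is cut to its
# first sentence and only the following Rooms section is kept.
HTML = (
    '<h3 class="blue">Location</h3>'
    '<p class="desc">This elegant luxurious hotel is located in the middle of '
    'stunning greenery on a hill, overlooking the sand/ pebble beach of Ixia, '
    'which is accessed just over the promenade (around 200 m away).</p>'
    '<h3 class="blue">Rooms</h3>'
    '<p class="desc">The comfortable rooms include an en suite bathroom with '
    'hairdryer, bathrobe, slippers, a direct dial telephone, satellite/ cable '
    'TV, a minibar, air conditioning (centrally regulated), a hire safe as '
    'well as a terrace or balcony.</p>'
)

# [^>]* skips the attributes of the opening <p ...> tag;
# (.*?) captures everything up to the first "</p>".
pattern = re.compile(r'<h3 class="blue">Location</h3><p[^>]*>(.*?)</p>', re.DOTALL)

m = pattern.search(HTML)
location_text = m.group(1) if m else None
print(location_text)
```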

Score: 2

Stack Overflow user

Posted 2011-12-09 11:15:12

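Your regex ends with \s.<\/p>, but your paragraph ends with ay.</p>. \s matches a whitespace character, but your input has a y in that position, so the match fails.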
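This diagnosis can be reproduced in Python's re (an assumption; the question's language isn't confirmed) on an abridged copy of the input. The question's pattern is used unchanged, since Python accepts \/ as a literal /.

```python
import re

# Abridged copy of the question's input: the Location paragraph is cut to its
# first sentence and only the following Rooms section is kept.
HTML = (
    '<h3 class="blue">Location</h3>'
    '<p class="desc">This elegant luxurious hotel is located in the middle of '
    'stunning greenery on a hill, overlooking the sand/ pebble beach of Ixia, '
    'which is accessed just over the promenade (around 200 m away).</p>'
    '<h3 class="blue">Rooms</h3>'
    '<p class="desc">The comfortable rooms include an en suite bathroom with '
    'hairdryer, bathrobe, slippers, a direct dial telephone, satellite/ cable '
    'TV, a minibar, air conditioning (centrally regulated), a hire safe as '
    'well as a terrace or balcony.</p>'
)

# The question's pattern: no "</p>" in the input is preceded by whitespace + one char.
original = re.compile(r'<h3 class="blue">Location<\/h3><p\s(.*)\s.<\/p>')
original_match = original.search(HTML)
print(original_match)  # None

# The failing piece in isolation, on the ending quoted in this answer.
tail_with_s = re.search(r'\s.</p>', 'away.</p>')    # "y" is not whitespace, so no match
tail_without_s = re.search(r'.</p>', 'away.</p>')   # "." matches the period
```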

Score: 0

Stack Overflow user

Posted 2011-12-09 11:15:41

Just remove the \s after the first group. None of the periods in your string is preceded by a space.

<h3 class="blue">Location<\/h3><p\s(.*).<\/p>
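A quick Python check of this pattern on an abridged copy of the input suggests it does match, but because (.*) is greedy (the point made in the accepted answer), the capture runs on to the last .</p> in the input and swallows the Rooms section. Making the group lazy, (.*?), keeps it to the Location paragraph. Note that the captured text still begins with the attribute text class="desc">.

```python
import re

# Abridged copy of the question's input: the Location paragraph is cut to its
# first sentence and only the following Rooms section is kept.
HTML = (
    '<h3 class="blue">Location</h3>'
    '<p class="desc">This elegant luxurious hotel is located in the middle of '
    'stunning greenery on a hill, overlooking the sand/ pebble beach of Ixia, '
    'which is accessed just over the promenade (around 200 m away).</p>'
    '<h3 class="blue">Rooms</h3>'
    '<p class="desc">The comfortable rooms include an en suite bathroom with '
    'hairdryer, bathrobe, slippers, a direct dial telephone, satellite/ cable '
    'TV, a minibar, air conditioning (centrally regulated), a hire safe as '
    'well as a terrace or balcony.</p>'
)

# This answer's pattern: greedy (.*) backtracks only as far as the LAST ".</p>".
greedy_text = re.search(r'<h3 class="blue">Location<\/h3><p\s(.*).<\/p>', HTML).group(1)

# Same pattern with a lazy group: stops at the FIRST ".</p>".
lazy_text = re.search(r'<h3 class="blue">Location<\/h3><p\s(.*?).<\/p>', HTML).group(1)

print(greedy_text)
print(lazy_text)
```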

Score: 0

Page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/8440556
