Q: How do I remove everything between two outer characters?

Stack Overflow user

Asked 2014-03-03 10:48:57

Answers: 4 · Views: 112 · Followers: 0 · Votes: 1

I have the following piece of a string:

{{Infobox musical artist
|honorific-prefix  = [[The Honourable]]
| name = Bob Marley
| image = Bob-Marley.jpg
| alt = Black and white image of Bob Marley on stage with a guitar
| caption = Bob Marley in concert, 1980.
| background = solo_singer
| birth_name = Robert Nesta Marley
| alias = Tuff Gong
| birth_date = {{birth date|df=yes|1945|2|6}}
| birth_place = [[Nine Mile, Jamaica|Nine Mile]], [[Jamaica]]
| death_date = {{death date and age|df=yes|1981|5|11|1945|2|6}}
| death_place = [[Miami]], [[Florida]]
| instrument = Vocals, guitar, percussion
| genre = [[Reggae]], [[ska]], [[rocksteady]]
| occupation = [[Singer-songwriter]], [[musician]], [[guitarist]] 
| years_active = 1962–1981
| label = [[Beverley's]], [[Studio One (record label)|Studio One]],
| associated_acts = [[Bob Marley and the Wailers]]
| website = {{URL|bobmarley.com}}
}}

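I want to remove all of it. If I try the regex \{\{(.*?)\}\}, it captures {{birth date|df=yes|1945|2|6}}, which makes sense. So I tried \{\{([^\}]*?)\}\}, which starts matching at the beginning but ends on that same line, which also makes sense because it runs into the enclosed }}. I also tried it without the ? (greedy) and still got the same result. My question is: how can I remove everything inside the outer {{ }}, no matter how many of the same characters appear in between?

Edit: if you want my full input, it is: https://en.wikipedia.org/w/index.php?maxlag=5&title=Bob+Marley&action=raw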

java

regex

wikipedia

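4 Answers

Stack Overflow user

Accepted answer

Posted 2014-03-03 10:59:29

Here is a solution using a DOTALL Pattern and a greedy quantifier, for an input that contains only one instance of the fragment you want to remove (i.e. replace with an empty String):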

String input = "Foo {{Infobox musical artist\n"
                + "|honorific-prefix  = [[The Honourable]]\n"
                + "| name = Bob Marley\n"
                + "| image = Bob-Marley.jpg\n"
                + "| alt = Black and white image of Bob Marley on stage with a guitar\n"
                + "| caption = Bob Marley in concert, 1980.\n"
                + "| background = solo_singer\n"
                + "| birth_name = Robert Nesta Marley\n"
                + "| alias = Tuff Gong\n"
                + "| birth_date = {{birth date|df=yes|1945|2|6}}\n"
                + "| birth_place = [[Nine Mile, Jamaica|Nine Mile]], [[Jamaica]]\n"
                + "| death_date = {{death date and age|df=yes|1981|5|11|1945|2|6}}\n"
                + "| death_place = [[Miami]], [[Florida]]\n"
                + "| instrument = Vocals, guitar, percussion\n"
                + "| genre = [[Reggae]], [[ska]], [[rocksteady]]\n"
                + "| occupation = [[Singer-songwriter]], [[musician]], [[guitarist]] \n"
                + "| years_active = 1962–1981\n"
                + "| label = [[Beverley's]], [[Studio One (record label)|Studio One]],\n"
                + "| associated_acts = [[Bob Marley and the Wailers]]\n"
                + "| website = {{URL|bobmarley.com}}\n" + "}} Bar";
//                                    |DOTALL flag
//                                    |  |first two curly brackets
//                                    |  |     |multi-line dot
//                                    |  |     | |last two curly brackets
//                                    |  |     | |        | replace with empty
System.out.println(input.replaceAll("(?s)\\{\\{.+\\}\\}", ""));

Output

Foo  Bar

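Note following the comments

This case amounts to using regular expressions to manipulate a markup language.

Regular expressions are not made to parse hierarchical markup entities and do not really work in this case, so this answer is only a stub for what would be, at best, an ugly solution.

For a famous SO thread on parsing markup with regex, see here.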
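To illustrate the point about hierarchical markup, here is a minimal sketch of a non-regex alternative that tracks nesting depth while scanning the input. The class name TemplateStripper and the method stripTemplates are made up for this example and are not from the thread. It assumes balanced {{ }} pairs and does not handle wikitext's triple-brace parameters; with an unclosed {{ it drops everything after that point.

```java
public final class TemplateStripper {
    private TemplateStripper() {}

    /**
     * Removes every top-level {{...}} block together with anything nested in it.
     * Hypothetical helper for illustration; assumes the braces are balanced.
     */
    public static String stripTemplates(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int depth = 0; // how many {{ are currently open
        int i = 0;
        while (i < input.length()) {
            if (input.startsWith("{{", i)) {
                depth++;
                i += 2;
            } else if (depth > 0 && input.startsWith("}}", i)) {
                depth--;
                i += 2;
            } else {
                // Keep characters only when we are outside every template.
                if (depth == 0) {
                    out.append(input.charAt(i));
                }
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "Foo {{Infobox\n| birth_date = {{birth date|1945|2|6}}\n}} Bar "
                + "{{URL|bobmarley.com}} Baz";
        System.out.println(stripTemplates(input)); // prints "Foo  Bar  Baz"
    }
}
```

Unlike the greedy regex, this keeps the text between two separate top-level blocks (" Bar " above) instead of swallowing it.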

Votes: 1

Stack Overflow user

Posted 2014-03-03 10:56:07

Use a greedy quantifier instead of the reluctant one you are using.

http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

Edit: spoon-feeding: "{{.*}"
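As a rough illustration of the greedy versus reluctant difference, the sketch below runs both quantifiers on small made-up inputs. The class name QuantifierDemo and the sample strings are assumptions for this example; the patterns escape the braces and add the (?s) DOTALL flag, as in the accepted answer.

```java
public final class QuantifierDemo {
    private QuantifierDemo() {}

    /** Reluctant: each match stops at the first "}}" after the opening "{{". */
    public static String reluctant(String s) {
        return s.replaceAll("(?s)\\{\\{.*?\\}\\}", "");
    }

    /** Greedy: one match runs from the first "{{" to the last "}}" in the input. */
    public static String greedy(String s) {
        return s.replaceAll("(?s)\\{\\{.*\\}\\}", "");
    }

    public static void main(String[] args) {
        String nested = "A {{outer {{inner}} tail}} B";
        System.out.println(reluctant(nested)); // prints "A  tail}} B"
        System.out.println(greedy(nested));    // prints "A  B"

        String twoBlocks = "A {{x}} keep {{y}} B";
        System.out.println(reluctant(twoBlocks)); // prints "A  keep  B"
        System.out.println(greedy(twoBlocks));    // prints "A  B" (" keep " is lost)
    }
}
```

The greedy form handles the nested case but also removes any text that sits between two separate blocks, so it only fits inputs with a single outer {{...}}.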

Votes: 0

Stack Overflow user

Posted 2014-03-03 11:14:04

Try this pattern; it should handle everything:

"\\D\\{\\{I.+[\\P{M}\\p{M}*+].+\\}\\}\\D"

Specify: DOTALL

Code:

String result = searchText.replaceAll("\\D\\{\\{I.+[\\P{M}\\p{M}*+].+\\}\\}\\D", "");

Example: http://fiddle.re/5n4zg

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei engine for the IT domain.

Original link:

https://stackoverflow.com/questions/22144701
