Q: Python regex on a wikitext template

Stack Overflow user

Asked 2013-12-26 12:33:17

Answers: 1 · Views: 127 · Followers: 0 · Votes: 0

I am trying to use Python to remove the line breaks from wikitext templates of the form:

{{cite web
|title=Testing
|url=Testing
|editor=Testing
}}

After re.sub, the result should be:

{{cite web|title=Testing|url=Testing|editor=Testing}}

I have been trying Python regexes for hours without success. For example, I tried:

while(re.search(r'\{cite web(.*?)([\r\n]+)(.*?)\}\}')):
     textmodif=re.sub(r'\{cite web(.*?)([\r\n]+)(.*?)\}\}', r'{cite web\1\3}}', textmodif,re.DOTALL)

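But it does not work as expected (even without the while loop, it does not work for the first line break).

I found a similar question, but it did not help: Regex for MediaWiki wikitext templates. I am new to Python, so please don't be too hard on me :-)

Thank you in advance.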

wikitext

python

regex

1 Answer

Stack Overflow user

Accepted answer

Answered 2013-12-26 12:37:58

You need to switch on newline matching for .; it does not match a newline otherwise:

re.search(r'\{cite web(.*?)([\r\n]+)(.*?)\}\}', inputtext, flags=re.DOTALL)

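The text you want to match contains multiple line breaks, so matching just one run of consecutive newlines is not enough.

From the documentation:

"Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline."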
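To make the effect of the flag concrete, here is a small sketch. The sample text and variable names are only illustrative. It also shows a likely reason the attempt in the question had no effect: the fourth positional argument of re.sub() is count, not flags, so re.DOTALL passed positionally is read as a replacement count and the pattern runs without the flag.

```python
import re

text = "{{cite web\n|title=Testing\n}}"
pattern = r'\{\{cite web.*?\}\}'

# Without DOTALL, '.' stops at the first line break, so nothing matches.
no_flag = re.search(pattern, text)

# With DOTALL, '.' also matches '\n', so the whole template is found.
with_flag = re.search(pattern, text, flags=re.DOTALL)

# Pitfall: the fourth positional argument of re.sub is count, not flags.
# Here re.DOTALL is taken as count, so the pattern runs without DOTALL
# and the text comes back unchanged.
positional = re.sub(pattern, 'X', text, re.DOTALL)

# Passing the flag by keyword gives the intended behaviour.
keyword = re.sub(pattern, 'X', text, flags=re.DOTALL)
```

Passing flags by keyword, as in the calls in this answer, avoids the mix-up.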

You can remove all the line breaks in the cite stanza in one go with a single re.sub() call, with no need to loop:

re.sub(r'\{cite web.*?[\r\n]+.*?\}\}', lambda m: re.sub(r'\s*[\r\n]\s*', '', m.group(0)), inputtext, flags=re.DOTALL)

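This uses a nested regular expression to remove, from the matched text, every run of whitespace that contains at least one line break.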
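The inner pattern can be tried on its own to see what it removes and what it leaves alone. This is a minimal sketch with a made-up template body: whitespace around a line break disappears, while an ordinary space inside a value stays.

```python
import re

# A run of whitespace is removed only if it contains a line break.
line_break = re.compile(r'\s*[\r\n]\s*')

# Illustrative input: trailing spaces, indentation, and a Windows-style
# '\r\n' line ending, plus a plain space inside "Two words".
body = "{{cite web  \n  |title=Two words\r\n|url=Testing\n}}"

joined = line_break.sub('', body)
# joined == '{{cite web|title=Two words|url=Testing}}'
```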

Demo:

>>> import re
>>> inputtext = '''\
... {{cite web
... |title=Testing
... |url=Testing
... |editor=Testing
... }}
... '''
>>> re.search(r'\{cite web(.*?)([\r\n]+)(.*?)\}\}', inputtext, flags=re.DOTALL)
<_sre.SRE_Match object at 0x10f335458>
>>> re.sub(r'\{cite web.*?[\r\n]+.*?\}\}', lambda m: re.sub(r'\s*[\r\n]\s*', '', m.group(0)), inputtext, flags=re.DOTALL)
'{{cite web|title=Testing|url=Testing|editor=Testing}}\n'
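One thing to watch when a page holds several templates: with re.DOTALL, the lazy .*? can still run past the closing }} of a template that is already on one line and pick up line breaks further down the page. The sketch below is a variation, not the pattern above. It assumes the template body contains no nested braces (no templates inside templates) and uses [^{}]* so a match cannot leave the template it started in. The function and constant names are hypothetical.

```python
import re

# Assumption: the template body has no nested braces, so [^{}]* cannot
# run past the closing "}}" of the template it started in.
CITE_WEB = re.compile(r'\{\{cite web[^{}]*\}\}')
# Any run of whitespace that contains at least one line break.
LINE_BREAK = re.compile(r'\s*[\r\n]\s*')


def collapse_cite_web(text):
    """Join each {{cite web ...}} template onto a single line."""
    return CITE_WEB.sub(lambda m: LINE_BREAK.sub('', m.group(0)), text)


sample = (
    "{{cite web|title=One}}\n"
    "Some prose between templates.\n"
    "{{cite web\n"
    "|title=Two\n"
    "|url=Testing\n"
    "}}\n"
)
result = collapse_cite_web(sample)
```

Here result keeps the line breaks around the prose line and only collapses the second template to {{cite web|title=Two|url=Testing}}. Templates with nested braces are not matched by this pattern and are left untouched, so they would need a real wikitext parser or a different approach.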

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/20784920
