Question: Python regex, replacing a pattern in a string

Stack Overflow user

Asked 2012-12-05 22:39:40

3 answers, 547 views, 0 followers, 0 votes

I want to replace some substrings in a string that contains wiki markup. For example, I have this string:

some other string before
; Methods
{{columns-list|3|
* [[Anomaly detection|Anomaly/outlier/change detection]]
* [[Association rule learning]]
* [[Statistical classification|Classification]]
* [[Cluster analysis]]
* [[Decision trees]]
* [[Factor analysis]]
* [[Neural Networks]]
* [[Regression analysis]]
* [[Structured data analysis (statistics)|Structured data analysis]]
* [[Sequence mining]]
* [[Text mining]]
}}

; Application domains
{{columns-list|3|
* [[Analytics]]
* [[Bioinformatics]]
* [[Business intelligence]]
* [[Data analysis]]
* [[Data warehouse]]
* [[Decision support system]]
* [[Drug Discovery]]
* [[Exploratory data analysis]]
* [[Predictive analytics]]
* [[Web mining]]
}}
some other string after

I would like the original substring to be replaced with:

[[Anomaly detection|Anomaly/outlier/change detection]]
[[Association rule learning]]
[[Statistical classification|Classification]]
[[Cluster analysis]]
[[Decision trees]]
[[Factor analysis]]
[[Neural Networks]]
[[Regression analysis]]
[[Structured data analysis (statistics)|Structured data analysis]]
[[Sequence mining]]
[[Text mining]]
[[Analytics]]
[[Bioinformatics]]
[[Business intelligence]]
[[Data analysis]]
[[Data warehouse]]
[[Decision support system]]
[[Drug Discovery]]
[[Exploratory data analysis]]
[[Predictive analytics]]
[[Web mining]]

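I first tried some regular expressions to extract the content inside {{ }}, but I always got None.

Added: the catch is that I am only interested in the [[ ]] items that are themselves inside {{ }}. There are other [[ ]] occurrences elsewhere in the string.

So how can I use re.sub to achieve this? Thanks.

Added: current solution (ugly)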

import re

def regt(matchobj):
    # Store matchobj.group(0) somewhere else; add it back to the string later.
    # Next, another function will remove all {{ }} blocks.
    return ''

# Note: re.sub returns the rewritten string, not a list of matches.
matches = re.sub(r'\[\[.*?\]\](?=[^{]*\}\})', regt, wiki_string2)

python

regex

3 Answers

Stack Overflow user

Accepted answer

Posted 2012-12-05 22:46:02

Match it instead of replacing it:

\[\[.*?\]\](?=[^{]*\}\})

.*? matches lazily, so it stops at the first occurrence of ]]

.* matches greedily, so it stops at the last occurrence of ]]
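
The difference between the two quantifiers can be seen on a single line that holds two links. This is a minimal sketch; the sample line is made up for illustration and is not taken from the asker's data.

```python
import re

# Hypothetical one-line sample with two wiki links.
line = "[[Decision trees]] and [[Factor analysis]]"

# Lazy: each match ends at the first "]]" that follows its "[[".
lazy = re.findall(r"\[\[.*?\]\]", line)

# Greedy: the single match runs on to the last "]]" on the line.
greedy = re.findall(r"\[\[.*\]\]", line)

print(lazy)    # ['[[Decision trees]]', '[[Factor analysis]]']
print(greedy)  # ['[[Decision trees]] and [[Factor analysis]]']
```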

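(?=[^{]*\}\}) is a lookahead: the [[ ]] item matches only if it is followed by zero or more characters other than {, and then by }}.

This is needed because a [[ ]] item should match only when it sits inside {{ }}.

So the characters between ]] and the closing }} can be anything except {.

This avoids matching in cases like the first line below: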

[[xyz]]<-this would not match since { after it
{{
[[xyz]]<-this would match since it is not followed by { and it reaches }}
[[xyz]]<-this would match since it is not followed by { and it reaches }}
}}

Votes: 0

Stack Overflow user

Posted 2012-12-05 22:43:27

Try using a non-greedy regular expression, like this: r"{{.*?}}"
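
For the multi-line blocks in the question, the quantifier is only part of the story: by default . does not match a newline, which may be why an attempt to extract a whole {{ }} block finds nothing. A minimal sketch, assuming the re.DOTALL flag is acceptable and the templates are not nested; the sample text is made up.

```python
import re

# Hypothetical sample with two multi-line template blocks.
text = (
    "before\n"
    "{{columns-list|3|\n* [[Analytics]]\n}}\n"
    "middle\n"
    "{{columns-list|3|\n* [[Web mining]]\n}}\n"
    "after"
)

# Without DOTALL, "." cannot cross the newlines inside a block.
no_flag = re.findall(r"\{\{.*?\}\}", text)

# With DOTALL, the lazy quantifier stops at the first "}}" after each "{{".
blocks = re.findall(r"\{\{.*?\}\}", text, flags=re.DOTALL)

# A greedy ".*" with DOTALL runs from the first "{{" to the last "}}".
greedy = re.findall(r"\{\{.*\}\}", text, flags=re.DOTALL)

print(len(no_flag), len(blocks), len(greedy))  # 0 2 1
```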

Votes: 0

Stack Overflow user

Posted 2012-12-05 22:52:05

You can try the following:

In [10]: p = r"\[\[.*?\]\]"
In [11]: s1 = '\n'.join(re.findall(p, s))

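Update, with the additional constraint (only text inside {{ }} should match): you can reach your goal in two steps. First select the text inside the curly braces, then select the text inside the square brackets.

You can do it like this (I used a source string that also contains square brackets that should not be matched):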

In [157]: print(s)
some [[other string before]]
; Methods
{{columns-list|3|
* [[Cluster analysis]]
* [[Decision trees]]
* [[Factor analysis]]
}}
; Application domains
{{columns-list|3|
* [[Analytics]]
* [[Bioinformatics]]
* [[Web mining]]
}}
some [[other string after]]

In [158]: p = r"(?:\{\{)[\s\S]*?(?:\}\})"

In [159]: s1 = '\n'.join(re.findall(p, s))

In [160]: print(s1)
{{columns-list|3|
* [[Cluster analysis]]
* [[Decision trees]]
* [[Factor analysis]]
}}
{{columns-list|3|
* [[Analytics]]
* [[Bioinformatics]]
* [[Web mining]]
}}

In [161]: p1 = r"\[\[.*\]\]"

In [162]: s2 = '\n'.join(re.findall(p1, s1))

In [163]: print(s2)
[[Cluster analysis]]
[[Decision trees]]
[[Factor analysis]]
[[Analytics]]
[[Bioinformatics]]
[[Web mining]]
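
Since the question asks about re.sub, the two steps above could also be folded into one substitution with a replacement function. This is only a sketch, not the answerer's code: the names BLOCK, LINK, keep_links and flatten_blocks are hypothetical, and it assumes the {{ }} blocks are not nested.

```python
import re

BLOCK = re.compile(r"\{\{.*?\}\}", re.DOTALL)  # one {{ ... }} block, matched lazily
LINK = re.compile(r"\[\[.*?\]\]")              # one [[ ... ]] link

def keep_links(match):
    """Return the links found inside one {{ ... }} block, one per line."""
    return "\n".join(LINK.findall(match.group(0)))

def flatten_blocks(text):
    # Text outside the blocks, including other [[ ]] links, is left untouched.
    return BLOCK.sub(keep_links, text)

sample = (
    "some [[other string before]]\n"
    "{{columns-list|3|\n"
    "* [[Cluster analysis]]\n"
    "* [[Decision trees]]\n"
    "}}\n"
    "some [[other string after]]"
)
print(flatten_blocks(sample))
```

Each block is replaced by its own links, so the surrounding text keeps its position. Heading lines such as "; Methods" are not touched by this sketch and would need a separate rule if they should go as well.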

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/13725634
