Q: How can I fix this RegEx pattern so that it extracts all possible occurrences of the substrings in a string that match it?

Stack Overflow user

Asked on 2022-08-19 08:10:42

Answers: 2 · Views: 76 · Followers: 0 · Votes: 2

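My goal with this code is to replace only those occurrences of a substring that are preceded by one specific pattern and followed by another (I decided to use a RegEx to express that condition).

I have already tried many approaches without good results. Here I use re.compile() to build a regex pattern object and findall() to extract, one by one, the substrings of the input string that satisfy the pattern.

Then I simply call replace() to substitute each extracted substring with the substring I want.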

import re

input_text = "y creo que hay 55 y 6 casas, y quizas alguna mas... yo creo que empezaria entre la 1 ,y las 27"

#the string with which I will replace the desired substrings in the original input string
content_fix = " "

##This is the regex pattern that tries to establish the condition in which the substring should be replaced by the other
#pat = re.compile(r"\b(?:previous string)\s*string that i need\s*(?:string below)?", flags=re.I, )
#pat = re.compile(r"\d\s*(?:y)\s*\d", flags=re.I, )
pat = re.compile(r"\d\s*(?:, y |,y |y )\s*(?:las \d|la \d|\d)", flags=re.I, )

x = pat.findall(input_text)
print(*map(str.strip, x), sep="\n") #it will print the substrings, which it will try to replace in the for cycle
content_list = []
content_list.append(list(map(str.strip, x)))
for content in content_list[0]:
    input_text = input_text.replace(content, content_fix) # "\d y \d"  ---> "\d \d"

print(repr(input_text))

This is the output I get:

'y creo que hay 5  casas, y quizas alguna mas... yo creo que empezaria entre la  7'

And this is the correct output, the one I need:

'y creo que hay 55 6 casas, y quizas alguna mas... yo creo que empezaria entre la 1 27'

What changes should I make to my RegEx so that it extracts the right substrings and meets the goal of this code?
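
As a quick diagnostic (a sketch, not part of the original post), printing what findall() returns shows why digits go missing: each `\d` in the pattern matches exactly one digit, so the match includes only the digits adjacent to the connector, and replace() then overwrites the whole match, digits included. The variable name `matches` is illustrative.

```python
import re

input_text = "y creo que hay 55 y 6 casas, y quizas alguna mas... yo creo que empezaria entre la 1 ,y las 27"

# Same pattern as in the question: every \d consumes a single digit only.
pat = re.compile(r"\d\s*(?:, y |,y |y )\s*(?:las \d|la \d|\d)", flags=re.I)

# The matches contain the neighbouring digits, so replacing them removes those digits too.
matches = pat.findall(input_text)
print(matches)  # ['5 y 6', '1 ,y las 2']
```

Because "5 y 6" and "1 ,y las 2" are replaced wholesale by a single space, "55 y 6" collapses to "5" followed by two spaces, and "1 ,y las 27" collapses to two spaces followed by "7".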

regexp-replace

python

python-3.x

regex

string

2 Answers

Stack Overflow user

Accepted answer

Posted on 2022-08-19 18:04:56

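import re

input_text = "y creo que hay 55 y 6 casas, y quizas alguna mas... \
yo creo que empezaria entre la 1 ,y las 27"

# First alternative: "<number> y <number>". Second alternative: " <number> ,y <3-letter word> <number>".
# Groups that do not take part in a match expand to an empty string (Python 3.5+).
result = re.sub(r'((\d+\s+)y\s+(\d+))| ((\d+\s+),y\s+\w{3}\s+(\d+))', r'\2\3 \5\6', input_text)
print(result)

Output:

y creo que hay 55 6  casas, y quizas alguna mas... yo creo que empezaria entre la 1 27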
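
A possible simplification, offered as a sketch rather than as the answerer's method: a single pattern that uses `\d+` and two capture groups keeps both whole numbers and drops only the connector between them. The optional comma and the optional "la"/"las" are assumptions inferred from the one sample sentence in the question, and the function name `join_numbers` is hypothetical.

```python
import re

def join_numbers(text):
    # (\d+)         the whole first number
    # \s*,?\s*y\s+  the connector "y", optionally preceded by a comma
    # (?:las?\s+)?  an optional article "la" or "las"
    # (\d+)         the whole second number
    pattern = re.compile(r"(\d+)\s*,?\s*y\s+(?:las?\s+)?(\d+)", flags=re.I)
    # Keep both numbers and put a single space between them.
    return pattern.sub(r"\1 \2", text)

input_text = "y creo que hay 55 y 6 casas, y quizas alguna mas... yo creo que empezaria entre la 1 ,y las 27"

fixed = join_numbers(input_text)
print(repr(fixed))
# 'y creo que hay 55 6 casas, y quizas alguna mas... yo creo que empezaria entre la 1 27'
```

With this pattern the replacement has no stray space after "55 6", so the result equals the output requested in the question character for character.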

Votes: 2

Stack Overflow user

Posted on 2022-08-19 12:46:02

I came up with something; this is the best I could get :). You may find a way to improve it.

import re

input_text = "y creo que hay 55 y 6 casas, y quizas alguna mas... yo creo que empezaria entre la 1 ,y las 27"

print(re.sub(r"(?<=\d).+?(?=\d)", " ", input_text))

The output will look like this:

y creo que hay 5 6 1 27

Maybe you will find a way to improve the expression, or someone else will...
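
One way the lookaround idea might be tightened, as a sketch under assumptions: `.+?` accepts any characters, so it also consumes a digit and all the text lying between two numbers. Restricting what may sit between the lookbehind and the lookahead to the connector itself avoids that. The optional comma and optional "la"/"las" are assumptions based on the sample sentence; `broad` and `narrow` are illustrative names.

```python
import re

input_text = "y creo que hay 55 y 6 casas, y quizas alguna mas... yo creo que empezaria entre la 1 ,y las 27"

# Pattern from the answer: ".+?" matches anything between two digits.
broad = re.sub(r"(?<=\d).+?(?=\d)", " ", input_text)

# Narrower sketch: only "y" (with an optional comma and optional "la"/"las")
# is allowed between the digit behind and the digit ahead.
narrow = re.sub(r"(?<=\d)\s*,?\s*y\s+(?:las?\s+)?(?=\d)", " ", input_text, flags=re.I)

print(repr(broad))   # 'y creo que hay 5 6 1 27'
print(repr(narrow))  # 'y creo que hay 55 6 casas, y quizas alguna mas... yo creo que empezaria entre la 1 27'
```

Since the lookarounds are zero-width, the digits themselves are never part of the match, so no capture groups are needed in the replacement.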

Votes: 2

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/73413692
