Q: Regex: different results in Python than in RegExr

Stack Overflow user

Asked 2017-05-08 14:11:06

Answers: 1 · Views: 91 · Followers: 0 · Votes: 1

I have this (simplified) regular expression:

((\s(python|java)\s)?((\S+\s+and\s))?(\S+\s+(love|hate)))

I built it in RegExr and tested it against the following sentence:

python and java love python love python and java java

which matches:

python and java love python love python and java

This is exactly what I want, so I implemented it in Python:

import re
regex = re.compile("((\s(python|java)\s)?((\S+\s+and\s))?(\S+\s+(love|hate)))")
string = "python and java love python love python and java java"
print(str(re.findall(regex,string)))

However, this gives:

[('python and java love', '', '', 'python and ', 'python and ', 'java love', 'love'), ('python love', '', '', '', '', 'python love', 'love')]

What causes this difference, and how can I fix it?

Update 1

Using a raw string does not help either:

import re
regex = re.compile(r'((\s(python|java)\s)?((\S+\s+and\s))?(\S+\s+(love|hate)))')
string = "python and java love python love python and java java"
print(str(re.findall(regex,string)))

This still gives:

[('python and java love', '', '', 'python and ', 'python and ', 'java love', 'love'), ('python love', '', '', '', '', 'python love', 'love')]

Update 2

I will use my other regex (with different terms), because for that one I can state exactly what I want to match and what I do not:

"(?:\s(?:low|high)\s)?(?:\S+\s+and\s)?(\S+\s+stress|deficiency|limiting)"

What should match:

low|high ANY_WORD stress|deficiency|limiting
ANY_WORD stress|deficiency|limiting
ANY_WORD and ANY_WORD stress|deficiency|limiting
ANY_WORD and ANY_WORD ANY_WORD stress|deficiency|limiting
(for the last one, only allow two words after "and" if stress, deficiency or limiting follows them)

What should not match:

stress|deficiency|limiting (so don't match these if nothing is in front of them)
    low
    high
    ANY_WORD
    ANY_WORD and ANY_WORD

Example list:

Match:

salt and water stress
photo-oxidative stress
salinity and high light stress
low-temperature stress
Cd stress
Cu deficiency
N deficiency
IMI stress

No match:

stress
deficiency
limiting
temperature and water
low
high
water and salt

python

regex

1 Answer

Stack Overflow user

Accepted answer

Answered 2017-05-08 14:23:10

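Your regex has many unnecessary capturing groups, and these affect the output of findall: when a pattern contains more than one group, findall returns a tuple of group contents for each match instead of the matched text.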
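To make the effect of the groups concrete, here is a minimal sketch using the pattern and sentence from the question. The variable names (`pattern`, `text`, `as_tuples`, `whole_matches`) are illustrative and not part of the original post. It compares `findall`, which reports the groups, with `finditer` plus `group(0)`, which reports the whole match the way RegExr highlights it.

```python
import re

# The question's original pattern, capturing groups left untouched.
pattern = re.compile(r"((\s(python|java)\s)?((\S+\s+and\s))?(\S+\s+(love|hate)))")
text = "python and java love python love python and java java"

# With more than one capturing group, findall returns one tuple per match,
# holding the content of every group (empty string for groups that did not take part).
as_tuples = pattern.findall(text)

# finditer yields match objects; group(0) is always the entire matched text.
whole_matches = [m.group(0) for m in pattern.finditer(text)]

print(as_tuples)
print(whole_matches)  # ['python and java love', 'python love']
```

The first element of each tuple is the same text as `group(0)` here only because the pattern happens to wrap everything in an outer group.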

You can convert your regex to the following, which uses only non-capturing groups, and it will work:

>>> import re
>>> regex = re.compile(r"(?:\s(?:low|high)\s)?(?:\S+\s+and\s)?\S+[ \t]+(?:stress|deficiency|limiting)")
>>> print(re.findall(regex, string))
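As a rough check, the pattern above can be run against the example phrases listed in the question. This is only a sketch: the list names `should_match` and `should_not_match` are made up for illustration, and `search` is used on each phrase separately, which is an assumption about how the phrases would be tested.

```python
import re

# The answer's pattern: every group is non-capturing.
pattern = re.compile(
    r"(?:\s(?:low|high)\s)?(?:\S+\s+and\s)?\S+[ \t]+(?:stress|deficiency|limiting)"
)

# Phrases copied from the question's example list.
should_match = [
    "salt and water stress", "photo-oxidative stress",
    "salinity and high light stress", "low-temperature stress",
    "Cd stress", "Cu deficiency", "N deficiency", "IMI stress",
]
should_not_match = [
    "stress", "deficiency", "limiting", "temperature and water",
    "low", "high", "water and salt",
]

# Report, for each phrase, whether the pattern matches anywhere inside it.
for phrase in should_match + should_not_match:
    found = pattern.search(phrase) is not None
    print(f"{phrase!r}: {'match' if found else 'no match'}")
```

Note that `search` only reports that a match exists somewhere in the phrase; the matched span can be shorter than the whole phrase, so inspect `group(0)` if the exact extent matters.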

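By the way, this pattern also works without a raw string, because \s and \S are not escape sequences that Python string literals interpret, but using r"..." for your regexes is recommended.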
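A small sketch of why raw strings are still the safer habit: some regex tokens, such as `\b`, do collide with Python string escapes. The sample text and variable names below are illustrative only and do not come from the original answer.

```python
import re

text = "python and java"

# In a normal string literal, "\b" is a single backspace character (0x08).
plain = "\bjava\b"
# In a raw string literal the backslash is kept, so the regex engine
# receives \b and treats it as a word boundary.
raw = r"\bjava\b"

print(len(plain), len(raw))           # 6 8
print(re.search(plain, text))         # None: the text contains no backspace characters
print(re.search(raw, text).group(0))  # java
```

Patterns built only from sequences like `\s` and `\S` are passed through unchanged by Python, although newer Python versions emit a warning for such unrecognized escapes in non-raw literals.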

RegEx Demo

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/43849904
