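I have this (simplified) regular expression:
((\s(python|java)\s)?((\S+\s+and\s))?(\S+\s+(love|hate)))
I created it in the regexr environment and tested it on the following sentence:
python and java love python love python and java java
It matches:
python and java love python love python and java
This is exactly what I want. So I implemented it in Python: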
import re
regex = re.compile("((\s(python|java)\s)?((\S+\s+and\s))?(\S+\s+(love|hate)))")
string = "python and java love python love python and java java"
print(str(re.findall(regex,string)))
However, this gives:
[('python and java love', '', '', 'python and ', 'python and ', 'java love', 'love'), ('python love', '', '', '', '', 'python love', 'love')]
What causes this difference, and how can I fix it?
Update 1
Using a raw string does not work either:
import re
regex = re.compile(r'((\s(python|java)\s)?((\S+\s+and\s))?(\S+\s+(love|hate)))')
string = "python and java love python love python and java java"
print(str(re.findall(regex,string)))
This still gives:
[('python and java love', '', '', 'python and ', 'python and ', 'java love', 'love'), ('python love', '', '', '', '', 'python love', 'love')]
Update 2
I will use my other regex (with different terms), because for that one I can say exactly what I want to match and what I don't:
"(?:\s(?:low|high)\s)?(?:\S+\s+and\s)?(\S+\s+stress|deficiency|limiting)"
What should match:
low|high ANY_WORD stress|deficiency|limiting
ANY_WORD stress|deficiency|limiting
ANY_WORD and ANY_WORD stress|deficiency|limiting
ANY_WORD and ANY_WORD ANY_WORD stress|deficiency|limiting
(for the last one, only allow two words after "and" if stress, deficiency or limiting comes right after them)
What should not match:
stress|deficiency|limiting (so don't match these if nothing is in front of them)
low
high
ANY_WORD
ANY_WORD and ANY_WORD
Example list:
Match:
salt and water stress
photo-oxidative stress
salinity and high light stress
low-temperature stress
Cd stress
Cu deficiency
N deficiency
IMI stress
No match:
stress
deficiency
limiting
temperature and water
low
high
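water and salt
Answered 2017-05-08 14:23:10
Your regex has many unnecessary capturing groups, and these affect the output of findall: when a pattern contains capturing groups, findall returns the text of the groups instead of the whole match.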
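To make the effect of the groups concrete, here is a minimal sketch that runs the question's pattern next to a variant with every group made non-capturing. The variable names and the non-capturing rewrite of the python/java pattern are illustrative assumptions, not taken from the original post.

```python
import re

text = "python and java love python love python and java java"

# The question's pattern: seven capturing groups, so findall returns 7-tuples.
capturing = re.compile(r"((\s(python|java)\s)?((\S+\s+and\s))?(\S+\s+(love|hate)))")
# Same pattern with every group made non-capturing: findall returns whole matches.
non_capturing = re.compile(r"(?:\s(?:python|java)\s)?(?:\S+\s+and\s)?\S+\s+(?:love|hate)")

tuples = capturing.findall(text)
whole = non_capturing.findall(text)
# finditer sidesteps the issue: group(0) is always the full match.
via_finditer = [m.group(0) for m in capturing.finditer(text)]

print(tuples)        # one tuple of group texts per match
print(whole)         # one string per match
print(via_finditer)  # one string per match
```

The first element of each tuple is the outermost group, which here happens to equal the whole match, so the regex itself finds the same two spans as in the online tester. Only the way findall reports them differs.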
You can change your regex to this to make it work:
>>> regex = re.compile(r"(?:\s(?:low|high)\s)?(?:\S+\s+and\s)?\S+[ \t]+(?:stress|deficiency|limiting)")
>>> print(re.findall(regex, string))
By the way, this pattern would also work without a raw string, although using r"..." for regexes is recommended.
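As a quick check, the pattern above can be run against the example list from Update 2 of the question. This is only a sketch: the helper `matched_text` and the list names are hypothetical, and `re.search` is used on each phrase in isolation, which is an assumption about how the phrases would be tested.

```python
import re

# Pattern from the answer: non-capturing groups, keyword alternatives grouped.
pattern = re.compile(
    r"(?:\s(?:low|high)\s)?(?:\S+\s+and\s)?\S+[ \t]+(?:stress|deficiency|limiting)"
)
# Pattern from Update 2 of the question, kept for comparison.
question_pattern = re.compile(
    r"(?:\s(?:low|high)\s)?(?:\S+\s+and\s)?(\S+\s+stress|deficiency|limiting)"
)

should_match = [
    "salt and water stress", "photo-oxidative stress",
    "salinity and high light stress", "low-temperature stress",
    "Cd stress", "Cu deficiency", "N deficiency", "IMI stress",
]
should_not_match = [
    "stress", "deficiency", "limiting", "temperature and water",
    "low", "high", "water and salt",
]


def matched_text(regex, phrase):
    """Return the text found by regex.search, or None if there is no match."""
    m = regex.search(phrase)
    return m.group(0) if m else None


for phrase in should_match + should_not_match:
    print(repr(phrase), "->", repr(matched_text(pattern, phrase)))
```

Two details are worth noting. In the question's pattern the `|` separates three whole branches (`\S+\s+stress`, `deficiency`, `limiting`), so a bare "deficiency" still matches there, whereas the grouped `(?:stress|deficiency|limiting)` requires a preceding word. Also, for "salinity and high light stress" the match found covers only " high light stress", because the optional `and` part allows a single word after `and`. Handling two words after `and` would need a further change to the pattern.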
https://stackoverflow.com/questions/43849904