
Q: Match all identifiers in a string

Stack Overflow user

Asked 2019-11-15 15:41:07

Answers: 3, Views: 373, Followers: 0, Votes: 3

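Problem:

I am looking for a way to match specific identifiers on a given line that starts with specific words. An ID consists of characters, possibly followed by digits, followed by a dash, followed by more digits. An ID should only be matched on lines whose starting word is one of the following: Closes, Fixes, Resolves. If a line contains multiple IDs, they are separated by the string " and ". Any number of IDs can appear on a line.

Example test strings: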

Closes PD-1                                           # Match: PD-1

Related to PD-2                                       # No match, line doesn't start with an allowed word

Closes                                                
NPD-1                                                 # No match, as the identifier is in a new line

Fixes PD-21 and PD-22                                 # Match: PD-21, PD-22

Closes PD-31, also PD-32 and PD-33                    # Match: PD-31 - the rest is not captured because of ", also"
Resolves PD4-41 and PD4-42 and PD4-43 and PD4-44      # Match: PD4-41, PD4-42, PD4-43, PD4-44

Resolves something related to N-2                     # No match, the identifier is not directly after 'Resolves'

What I tried:

Whenever I used a regular expression to get all the matches, it fell short in some respect. For example, one regexp I tried is this:

^(?:Closes|Fixes|Resolves) (\w+-\d+)(?:(?: and )(\w+-\d+))*

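I intended to have a non-capturing group that requires the line to start with one of the allowed words, followed by a space: ^(?:Closes|Fixes|Resolves)
Then at least one ID must follow the starting word, and I intended to capture it: (\w+-\d+)
Finally, zero or more IDs can follow the first ID, separated by the string " and ", but here I only want to capture the IDs, not the separator: (?:(?: and )(\w+-\d+))*

The result of this regexp in Python: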

import re

test_string = """
Closes PD-1                                           # Match: PD-1
Related to PD-2                                       # No match, line doesn't start with an allowed word
Closes                                                
NPD-1                                                 # No match, as the identifier is in a new line
Fixes PD-21 and PD-22                                 # Match: PD-21, PD-22
Closes PD-31, also PD-32 and PD-33                    # Match: PD-31 - the rest is not captured because of ", also"
Resolves PD4-41 and PD4-42 and PD4-43 and PD4-44      # Match: PD4-41, PD4-42, PD4-43, PD4-44
Resolves something related to N-2                     # No match, the identifier is not directly after 'Resolves'
"""

ids = []

# Raw string so that \w and \d reach the regex engine unchanged.
for match in re.findall(r"^(?:Closes|Fixes|Resolves) (\w+-\d+)(?:(?: and )(\w+-\d+))*", test_string, re.M):
    for group in match:
        if group:
            ids.append(group)

print(ids)
# => ['PD-1', 'PD-21', 'PD-22', 'PD-31', 'PD4-41', 'PD4-44']

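The results are also available on regex101.com. If more than one ID follows the initial ID, it unfortunately captures only the last one instead of all of them. I read that a repeated capturing group only captures its last iteration, and that I should put a capturing group around the repeated group to capture all iterations, but I could not get that to work.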
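The behaviour described here can be reproduced with a short sketch that uses only the standard `re` module. The sample line is taken from the test strings; the variable names are illustrative. Group 2 sits inside a repeated group, so it ends up holding only the last ID.

```python
import re

# Pattern from the question: group 2 sits inside a repeated (...)* group.
pattern = re.compile(r"^(?:Closes|Fixes|Resolves) (\w+-\d+)(?:(?: and )(\w+-\d+))*")

line = "Resolves PD4-41 and PD4-42 and PD4-43 and PD4-44"
match = pattern.match(line)

# Each pass through the repetition overwrites group 2, so only the
# last iteration survives; PD4-42 and PD4-43 are matched but not kept.
print(match.group(0))  # Resolves PD4-41 and PD4-42 and PD4-43 and PD4-44
print(match.groups())  # ('PD4-41', 'PD4-44')
```

The whole line is matched, so the regex itself is sound; only the way the captures are reported loses the intermediate IDs.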

Summary

Is there a regex solution, similar to what I tried, that captures all occurrences of the IDs? Or is there a better way to parse the string for IDs using Python?

python

regex

python-3.x

3 Answers

Stack Overflow user

Accepted answer

Posted 2019-11-15 16:10:20

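You could use a single capturing group that matches the first ID and then repeats the same pattern zero or more times, each repetition preceded by " and " (a space, the word and, and another space).

The values are in group 1.

To get the separate values, split on " and ".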

^(?:Closes|Fixes|Resolves) (\w+-\d+(?: and \w+-\d+)*)

Regex demo
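Applied in Python, the pattern might be used as follows. This is a minimal sketch, assuming the input is a multi-line string like the one in the question; `extract_ids`, `PATTERN` and `sample` are hypothetical names, not part of the answer.

```python
import re

# Group 1 captures the whole run "ID and ID and ..." after the keyword.
PATTERN = re.compile(
    r"^(?:Closes|Fixes|Resolves) (\w+-\d+(?: and \w+-\d+)*)",
    re.MULTILINE,
)

def extract_ids(text):
    """Return all IDs from lines that start with an allowed keyword."""
    ids = []
    # With a single capturing group, findall returns the group 1 strings.
    for run in PATTERN.findall(text):
        ids.extend(run.split(" and "))
    return ids

sample = """Closes PD-1
Related to PD-2
Fixes PD-21 and PD-22
Closes PD-31, also PD-32 and PD-33
Resolves PD4-41 and PD4-42 and PD4-43 and PD4-44
Resolves something related to N-2
"""

print(extract_ids(sample))
# ['PD-1', 'PD-21', 'PD-22', 'PD-31', 'PD4-41', 'PD4-42', 'PD4-43', 'PD4-44']
```

Because the repetition lives inside the capturing group rather than around it, nothing is overwritten, and the split recovers the individual IDs.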

Votes: 2

Stack Overflow user

Posted 2019-11-15 16:34:44

A two-stage approach may be a bit easier, for example:

import re

def get_matches(test):  # assume test is a list of strings
    regex1 = re.compile(r'^(?:Closes|Fixes|Resolves) \w+-\d+')
    regex2 = re.compile(r'\w+-\d+')
    results = []
    for line in test:
        if regex1.search(line):
            results.extend(regex2.findall(line))
    return results

This gives:

['PD-1', 'PD-21', 'PD-22', 'PD-31', 'PD-32',
 'PD-33', 'PD4-41', 'PD4-42', 'PD4-43', 'PD4-44']
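Because the second stage scans the whole line, this output also contains PD-32 and PD-33, which the question wanted excluded. If that matters, one possible variation is to run the second stage only on the leading run of IDs. The following is a sketch under that assumption; `get_strict_matches`, `LEADING_RUN` and `SINGLE_ID` are hypothetical names, not part of the answer.

```python
import re

# Stage 1: capture only the leading "ID and ID ..." run after the keyword.
LEADING_RUN = re.compile(r"^(?:Closes|Fixes|Resolves) (\w+-\d+(?: and \w+-\d+)*)")
# Stage 2: pull the individual IDs out of that run.
SINGLE_ID = re.compile(r"\w+-\d+")

def get_strict_matches(lines):
    """Collect IDs from a list of lines, ignoring anything after the run."""
    results = []
    for line in lines:
        found = LEADING_RUN.match(line)
        if found:
            results.extend(SINGLE_ID.findall(found.group(1)))
    return results

lines = [
    "Closes PD-1",
    "Fixes PD-21 and PD-22",
    "Closes PD-31, also PD-32 and PD-33",
    "Related to PD-2",
]
print(get_strict_matches(lines))
# ['PD-1', 'PD-21', 'PD-22', 'PD-31']
```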

Votes: 1

Stack Overflow user

Posted 2019-11-15 17:46:01

If you need to use the repeated capturing group, install the PyPI regex module with pip install regex and use

import regex

test_string = "your string here"
ids = []
for match in regex.finditer(r"^(?:Closes|Fixes|Resolves) (?P<id>\w+-\d+)(?:(?: and )(?P<id>\w+-\d+))*", test_string, regex.M):
    ids.extend(match.captures("id"))
print(ids)
# => ['PD-1', 'PD-21', 'PD-22', 'PD-31', 'PD4-41', 'PD4-42', 'PD4-43', 'PD4-44']

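See the Python demo

The capture stack of each group can be accessed with match.captures(X).

The regex you have can be used as is, but a named capturing group is more user-friendly here.

Votes: 1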

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/58880341
