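I have searched for a long time and cannot find a regular expression that meets my needs.
I have multiple lines of text, such as the following: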
male positive average
average negative female
good negative female
female bad
male average
male
female
...
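...
In the example above there are three groups of words: (male, female), (good, average, bad) and (positive, negative).
I want to capture each group of words in a named group: gender, quality and feedback.
The closest I have come is: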
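(?=.*(?P<gender>\b(fe)?male\b))(?=.*(?P<quality>(green|amber|red)))(?=.*(?P<feedback>(positive|negative))).*
This matches when all three groups (gender, quality and feedback) are present, in any order, but it does not match, or create the named groups for, lines such as the following: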
female green
positive male
positive female
female bad
male average
male
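female
Note: gender (male, female) is common to all lines; every line has one. Also, for simplicity only three different groups are mentioned here. Depending on requirements, the number of groups may grow.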
Any help would be greatly appreciated.
Posted on 2022-02-22 03:06:14
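You need to anchor the regular expression at the beginning of the line (^) and make each positive lookahead that contains a named capture group optional.
In addition, you have some numbered capture groups that could be non-capturing groups, which would be less confusing since you are only interested in the named capture groups. Finally, you are missing some word boundaries.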
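The two smaller points above can be checked in isolation. The following is a minimal sketch using Python's re module (the language is an assumption, chosen because the (?P<name>...) syntax is accepted there); the helper captured_groups and the test word "nonnegative" are made up for illustration.

```python
import re

# The asker's style: the inner (fe) is a numbered capture group.
numbered = re.compile(r"(?P<gender>\b(fe)?male\b)")
# Same match, but (?:fe) groups without capturing.
non_capturing = re.compile(r"(?P<gender>\b(?:fe)?male\b)")

# Without word boundaries the alternation can match inside a longer word.
loose = re.compile(r"(?:positive|negative)")
strict = re.compile(r"\b(?:positive|negative)\b")


def captured_groups(pattern, text):
    """Return all captured groups of the first match in text, or None."""
    m = pattern.search(text)
    return m.groups() if m else None


if __name__ == "__main__":
    print(captured_groups(numbered, "female bad"))       # ('female', 'fe')
    print(captured_groups(non_capturing, "female bad"))  # ('female',)
    print(bool(loose.search("nonnegative")))             # True
    print(bool(strict.search("nonnegative")))            # False
```

With the non-capturing form, only the named group shows up in the result, which is the point of the suggestion.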
I suggest you change the expression to the following.
^(?=.*(?P<gender>\b(?:fe)?male\b))?(?=.*(?P<quality>\b(?:green|amber|red)\b))?(?=.*(?P<feedback>\b(?:positive|negative)\b))?.*
The regular expression can be broken down as follows.
^ # match beginning of line
(?= # begin positive lookahead
.* # match zero or more characters
(?P<gender> # begin named capture group 'gender'
\b # match a word boundary
(?:fe)?male # match 'male', optionally preceded by 'fe'
\b # match a word boundary
) # end capture group 'gender'
)? # end positive lookahead and make it optional
(?= # begin positive lookahead
.* # match zero or more characters
(?P<quality> # begin named capture group 'quality'
\b # match a word boundary
(?:green|amber|red) # match one of the three words
\b # match a word boundary
) # end named capture group 'quality'
)? # end positive lookahead and make it optional
(?= # begin positive lookahead
.* # match zero or more characters
(?P<feedback> # begin named capture group 'feedback'
\b # match a word boundary
(?:positive|negative) # match one of the two words
\b # match a word boundary
) # end named capture group 'feedback'
)? # end positive lookahead and make it optional
.* # match zero or more characters (the line)
https://stackoverflow.com/questions/71214482
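As a rough sketch of how this idea might be applied from Python's re module, the snippet below builds the pattern from a mapping, so that further groups can be added without editing the expression by hand. It does not use the suggested expression verbatim: each optional lookahead (?=...)? is written as a lookahead around an optional group, (?=(?:...)?), which should capture the same text while avoiding a quantifier placed directly on a lookahead (an assumption made here because regex engines differ in how they treat that construct). The names GROUPS, build_pattern and classify are hypothetical, and the word lists are copied from the expression above.

```python
import re

# Hypothetical mapping: group name -> words that belong to that group.
GROUPS = {
    "gender": ["female", "male"],
    "quality": ["green", "amber", "red"],
    "feedback": ["positive", "negative"],
}


def build_pattern(groups):
    """Build one anchored pattern with an optional named capture per group."""
    parts = []
    for name, words in groups.items():
        alternatives = "|".join(re.escape(word) for word in words)
        # The lookahead always succeeds: it captures the word if one is
        # present anywhere in the line, otherwise it matches nothing.
        parts.append(rf"(?=(?:.*(?P<{name}>\b(?:{alternatives})\b))?)")
    return re.compile("^" + "".join(parts) + ".*")


PATTERN = build_pattern(GROUPS)


def classify(line):
    """Return a dict of group name -> captured word (None if absent)."""
    return PATTERN.match(line).groupdict()


if __name__ == "__main__":
    for line in ["female green", "positive male", "male"]:
        print(line, "->", classify(line))
```

For the lines "female green", "positive male" and "male" from the question, classify returns the gender in every case and None for whichever of quality and feedback is missing. Adding a fourth group would only require a new entry in GROUPS.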