
Python regex: instantly replace groups with group names

Stack Overflow user
Asked 2016-04-30 01:44:51
2 answers, 518 views, 0 followers, score 1

Here is the regex:

import re
s = "the blue dog and blue cat wore 7 blue hats 9 days ago"
p = re.compile(r'blue (?P<animal>dog|cat)')
p.sub(r'\1',s)

The result is:

'the dog and cat wore 7 blue hats 9 days ago'

Is it possible to write a re.sub call like this:

import re
s = "the blue dog and blue cat wore 7 blue hats 9 days ago"
p = re.compile(r'blue (?P<animal>dog|cat)|(?P<numberBelowSeven>[0-7])|(?P<numberNotSeven>[8-9])')

so that the result is:

'the animal and animal wore numberBelowSeven blue hats numberNotSeven days ago'

Oddly, there is plenty of documentation on replacing strings and on getting group names, but no well-documented way to do both at once.
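
For reference, the two separately documented pieces look roughly like this. This is a minimal sketch reusing the pattern from the first snippet; the variable names `by_name`, `by_number`, and `group_names` are only illustrative.

```python
import re

s = "the blue dog and blue cat wore 7 blue hats 9 days ago"
p = re.compile(r'blue (?P<animal>dog|cat)')

# Replacement side: \g<animal> refers to the named group, just as \1 refers
# to the same group by number.
by_name = p.sub(r'\g<animal>', s)
by_number = p.sub(r'\1', s)

# Name side: groupindex maps each group name to its group number.
group_names = dict(p.groupindex)

print(by_name)
print(group_names)
```

Neither piece on its own inserts the group's name into the output, which is what the question is asking for.
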

2 Answers

Stack Overflow user

Answered 2016-04-30 02:01:38

You can use re.sub with a callback that returns matchobj.lastgroup:

import re

s = "the blue dog and blue cat wore 7 blue hats 9 days ago"
p = re.compile(r'blue (?P<animal>dog|cat)|(?P<numberBelowSeven>[0-7])|(?P<numberNotSeven>[8-9])')

def callback(matchobj):
    return matchobj.lastgroup

result = p.sub(callback, s)
print(result)

yields

the animal and animal wore numberBelowSeven blue hats numberNotSeven days ago
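
One case the callback above does not cover is a match in which no named group took part: `lastgroup` is then `None`. A minimal sketch of a more defensive variant follows. The helper name `replace_with_group_names` and the extra unnamed `wore` alternative are assumptions added for illustration and are not part of the original answer.

```python
import re

def replace_with_group_names(pattern, text):
    """Replace each match with the name of the named group that matched."""
    compiled = re.compile(pattern)

    def callback(match):
        # lastgroup is None when no named group took part in the match;
        # in that case the matched text is left unchanged.
        if match.lastgroup is None:
            return match.group(0)
        return match.lastgroup

    return compiled.sub(callback, text)

# The trailing "wore" alternative is unnamed, so it exercises the fallback.
pattern = (r'blue (?P<animal>dog|cat)'
           r'|(?P<numberBelowSeven>[0-7])'
           r'|(?P<numberNotSeven>[8-9])'
           r'|wore')
text = "the blue dog and blue cat wore 7 blue hats 9 days ago"
result = replace_with_group_names(pattern, text)
print(result)
```
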

Note that if you are using Pandas, you can use Series.str.replace:

import pandas as pd

def callback(matchobj):
    return matchobj.lastgroup

df = pd.DataFrame({'foo':["the blue dog", "and blue cat wore 7 blue", "hats 9", 
                          "days ago"]})
pat = r'blue (?P<animal>dog|cat)|(?P<numberBelowSeven>[0-7])|(?P<numberNotSeven>[8-9])'
# regex=True is required for a callable replacement in recent pandas versions
df['result'] = df['foo'].str.replace(pat, callback, regex=True)
print(df)

yields

                        foo                                 result
0              the blue dog                             the animal
1  and blue cat wore 7 blue  and animal wore numberBelowSeven blue
2                    hats 9                    hats numberNotSeven
3                  days ago                               days ago

If you have nested named groups, you may need a more sophisticated callback that iterates over matchobj.groupdict().items() to collect all the relevant group names:

import pandas as pd

def callback(matchobj):
    names = [groupname for groupname, matchstr in matchobj.groupdict().items()
             if matchstr is not None]
    names = sorted(names, key=lambda name: matchobj.span(name))
    result = ' '.join(names)
    return result

df = pd.DataFrame({'foo':["the blue dog", "and blue cat wore 7 blue", "hats 9", 
                          "days ago"]})

pat=r'blue (?P<animal>dog|cat)|(?P<numberItem>(?P<numberBelowSeven>[0-7])|(?P<numberNotSeven>[8-9]))'

# pat=r'(?P<someItem>blue (?P<animal>dog|cat)|(?P<numberBelowSeven>[0-7])|(?P<numberNotSeven>[8-9]))'

# regex=True is required for a callable replacement in recent pandas versions
df['result'] = df['foo'].str.replace(pat, callback, regex=True)
print(df)

yields

                        foo                                            result
0              the blue dog                                        the animal
1  and blue cat wore 7 blue  and animal wore numberItem numberBelowSeven blue
2                    hats 9                    hats numberItem numberNotSeven
3                  days ago                                          days ago
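
To see why `lastgroup` alone is not enough for nested groups, the sketch below inspects a single match outside pandas. It uses only the number part of the nested pattern above; the names `m` and `matched` are illustrative. With nesting, `lastgroup` reports the group that closed last (the outer one), while `groupdict()` exposes every named group that took part.

```python
import re

pat = re.compile(
    r'(?P<numberItem>(?P<numberBelowSeven>[0-7])|(?P<numberNotSeven>[8-9]))'
)
m = pat.search("hats 9")

# The outer group closes after the inner one, so it is what lastgroup reports.
print(m.lastgroup)

# groupdict() lists every named group; groups that did not match are None.
matched = {name: value for name, value in m.groupdict().items()
           if value is not None}
print(matched)
```
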
Score: 1

Stack Overflow user

Answered 2016-04-30 01:59:49

Why not call re.sub() multiple times:

>>> import re
>>> s = "the blue dog and blue cat wore 7 blue hats 9 days ago"
>>> s = re.sub(r"blue (dog|cat)", "animal", s)
>>> s = re.sub(r"\b[0-7]\b", "numberBelowSeven", s)
>>> s = re.sub(r"\b[8-9]\b", "numberNotSeven", s)
>>> s
'the animal and animal wore numberBelowSeven blue hats numberNotSeven days ago'

You can then put this into a "list of changes" and apply them one by one:

>>> changes = [
...     (re.compile(r"blue (dog|cat)"), "animal"),
...     (re.compile(r"\b[0-7]\b"), "numberBelowSeven"),
...     (re.compile(r"\b[8-9]\b"), "numberNotSeven")
... ]
>>> s = "the blue dog and blue cat wore 7 blue hats 9 days ago"
>>> for pattern, replacement in changes:
...     s = pattern.sub(replacement, s)
... 
>>> s
'the animal and animal wore numberBelowSeven blue hats numberNotSeven days ago'
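
One trade-off of applying changes sequentially is that a later pattern also sees the output of earlier replacements. The sketch below uses a deliberately contrived swap (the text and the `first`/`second` group names are made up for illustration) to contrast that with a single pass using named groups and a callback.

```python
import re

text = "cat chases dog"

# Sequential passes: the second pass also rewrites the output of the first.
sequential = text
for pattern, replacement in [(r"cat", "dog"), (r"dog", "cat")]:
    sequential = re.sub(pattern, replacement, sequential)

# Single pass: each position in the input is matched once, so replacement
# text is never rescanned.
swap = {"first": "dog", "second": "cat"}
single = re.sub(r"(?P<first>cat)|(?P<second>dog)",
                lambda m: swap[m.lastgroup], text)

print(sequential)
print(single)
```

For the example in the question the two approaches give the same result, because none of the replacement strings can be matched by a later pattern.
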

Note that I've additionally added word boundary checks (\b).
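
The effect of the word boundary checks can be seen on a multi-digit number. This is a small sketch with a made-up input string: without \b each digit of "17" is replaced separately, while with \b only a standalone digit matches.

```python
import re

text = "17 blue hats and 7 days"

# Without \b, every digit in the range is replaced, even inside "17".
without_boundary = re.sub(r"[0-7]", "numberBelowSeven", text)

# With \b, only a digit that stands alone as a word is replaced.
with_boundary = re.sub(r"\b[0-7]\b", "numberBelowSeven", text)

print(without_boundary)
print(with_boundary)
```
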

Score: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/36944513
