Here is a regex:
import re
s = "the blue dog and blue cat wore 7 blue hats 9 days ago"
p = re.compile(r'blue (?P<animal>dog|cat)')
p.sub(r'\1', s)
which results in
'the dog and cat wore 7 blue hats 9 days ago'
Is it possible to write a re.sub call such that:
import re
s = "the blue dog and blue cat wore 7 blue hats 9 days ago"
p = re.compile(r'blue (?P<animal>dog|cat)|(?P<numberBelowSeven>[0-7])|(?P<numberNotSeven>[8-9])')
results in
'the animal and animal wore numberBelowSeven blue hats numberNotSeven days ago'
Oddly, there is plenty of documentation on replacing strings and on getting group names, but no well-documented way to do both at once.
Answered on 2016-04-30 02:01:38
You can use re.sub with a callback that returns matchobj.lastgroup:
import re
s = "the blue dog and blue cat wore 7 blue hats 9 days ago"
p = re.compile(r'blue (?P<animal>dog|cat)|(?P<numberBelowSeven>[0-7])|(?P<numberNotSeven>[8-9])')
def callback(matchobj):
    # lastgroup is the name of the named group that produced this match
    return matchobj.lastgroup
result = p.sub(callback, s)
print(result)
yields
the animal and animal wore numberBelowSeven blue hats numberNotSeven days ago
Note that if you are using Pandas, you can use Series.str.replace:
import pandas as pd
def callback(matchobj):
    return matchobj.lastgroup
df = pd.DataFrame({'foo': ["the blue dog", "and blue cat wore 7 blue", "hats 9",
                           "days ago"]})
pat = r'blue (?P<animal>dog|cat)|(?P<numberBelowSeven>[0-7])|(?P<numberNotSeven>[8-9])'
# regex=True is needed on newer pandas versions, where regex defaults to False
# and a callable replacement is rejected
df['result'] = df['foo'].str.replace(pat, callback, regex=True)
print(df)
yields
                        foo                                 result
0              the blue dog                             the animal
1  and blue cat wore 7 blue  and animal wore numberBelowSeven blue
2                    hats 9                    hats numberNotSeven
3                  days ago                               days ago
If you have nested named groups, you may need a more complicated callback which iterates through matchobj.groupdict().items() to collect all the relevant group names.
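As a minimal sketch of why lastgroup alone falls short with nested groups (the search string below is made up for illustration): lastgroup reports only the group that closed last, which is the outer one, whereas groupdict() exposes every named group that took part in the match.

```python
import re

# Nested pattern: numberItem wraps the two digit groups.
pat = re.compile(
    r'blue (?P<animal>dog|cat)'
    r'|(?P<numberItem>(?P<numberBelowSeven>[0-7])|(?P<numberNotSeven>[8-9]))'
)

m = pat.search("wore 7 blue hats")

# lastgroup names the capturing group that closed last, i.e. the outer group.
outer_only = m.lastgroup

# groupdict() maps every named group to its text; unmatched groups are None.
matched = [name for name, value in m.groupdict().items() if value is not None]

print(outer_only)  # numberItem
print(matched)     # contains both numberItem and numberBelowSeven
```

The fuller callback that follows builds on the groupdict() approach and additionally orders the names by where each group matched.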
import pandas as pd
def callback(matchobj):
    # keep only the named groups that actually took part in this match
    names = [groupname for groupname, matchstr in matchobj.groupdict().items()
             if matchstr is not None]
    # order the names by where each group matched
    names = sorted(names, key=lambda name: matchobj.span(name))
    result = ' '.join(names)
    return result
df = pd.DataFrame({'foo': ["the blue dog", "and blue cat wore 7 blue", "hats 9",
                           "days ago"]})
pat = r'blue (?P<animal>dog|cat)|(?P<numberItem>(?P<numberBelowSeven>[0-7])|(?P<numberNotSeven>[8-9]))'
# pat = r'(?P<someItem>blue (?P<animal>dog|cat)|(?P<numberBelowSeven>[0-7])|(?P<numberNotSeven>[8-9]))'
# regex=True is needed on newer pandas versions for a callable replacement
df['result'] = df['foo'].str.replace(pat, callback, regex=True)
print(df)
yields
                        foo                                            result
0              the blue dog                                        the animal
1  and blue cat wore 7 blue  and animal wore numberItem numberBelowSeven blue
2                    hats 9                    hats numberItem numberNotSeven
3                  days ago                                          days ago
Answered on 2016-04-30 01:59:49
Why not call re.sub() multiple times:
>>> import re
>>> s = "the blue dog and blue cat wore 7 blue hats 9 days ago"
>>> s = re.sub(r"blue (dog|cat)", "animal", s)
>>> s = re.sub(r"\b[0-7]\b", "numberBelowSeven", s)
>>> s = re.sub(r"\b[8-9]\b", "numberNotSeven", s)
>>> s
'the animal and animal wore numberBelowSeven blue hats numberNotSeven days ago'
You can then put these into a "list of changes" and apply them one by one:
>>> changes = [
...     (re.compile(r"blue (dog|cat)"), "animal"),
...     (re.compile(r"\b[0-7]\b"), "numberBelowSeven"),
...     (re.compile(r"\b[8-9]\b"), "numberNotSeven")
... ]
>>> s = "the blue dog and blue cat wore 7 blue hats 9 days ago"
>>> for pattern, replacement in changes:
...     s = pattern.sub(replacement, s)
...
>>> s
'the animal and animal wore numberBelowSeven blue hats numberNotSeven days ago'
Note that I have additionally added word boundary checks (\b).
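To illustrate what the word boundary checks buy you, here is a small sketch using a made-up input that contains a multi-digit number (the string and the variable names loose and strict are illustrative, not from the original post). Without \b each digit of "17" is replaced separately; with \b only a digit standing alone as a whole word is touched.

```python
import re

s = "the blue dog wore 17 hats 9 days ago"

# Without \b, each digit inside "17" is treated as a separate match.
loose = re.sub(r"[0-7]", "numberBelowSeven", s)

# With \b, only a digit that forms a whole word on its own is replaced,
# so "17" is left alone (and 9 is outside the [0-7] class anyway).
strict = re.sub(r"\b[0-7]\b", "numberBelowSeven", s)

print(loose)   # the blue dog wore numberBelowSevennumberBelowSeven hats 9 days ago
print(strict)  # the blue dog wore 17 hats 9 days ago
```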
https://stackoverflow.com/questions/36944513