I have a sample DataFrame with a Text column whose strings contain both the word 'eng' and the word 'engine'.
ID Text
1 eng is here
2 engine needs washing
3 eng is overheating
I want to change 'eng' to 'engine'. I am using the following code:
df['Text'] = df['Text'].str.replace('eng', 'engine')
But this messes up the text in my second row. The second row becomes:
ID Text
2 engineine needs washing
Is there a way to do the replacement so that it only happens when the whole word is 'eng'?
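For reference, the behaviour described above can be reproduced with a short sketch. The DataFrame construction is an assumption based on the sample rows shown, and `regex=False` is passed only to make the plain substring matching explicit.

```python
import pandas as pd

# Assumed reconstruction of the sample data from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Text": ["eng is here", "engine needs washing", "eng is overheating"],
})

# Plain substring replacement: every occurrence of "eng" is rewritten,
# including the one inside "engine".
result = df["Text"].str.replace("eng", "engine", regex=False)
print(result.tolist())
# ['engine is here', 'engineine needs washing', 'engine is overheating']
```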
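Posted on 2019-01-02 15:53:56
Starting from your own code, add a space after the keyword to fix the problem (this only matches 'eng' when it is followed by a space):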
df['Text'].str.replace('eng ', 'engine ')
Out[736]:
0 engine is here
1 engine needs washing
2 engine is overheating
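Name: Text, dtype: object
Update: split each string into words, replace only whole-word matches, then join the words back together:
df.Text.str.split(' ', expand=True).replace('eng', 'engine').fillna('').apply(' '.join, axis=1)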
Out[752]:
0 engine is here
1 engine needs washing
2 engine is overheating
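dtype: object
Posted on 2019-01-02 15:50:35
Wrap the keyword in the word-boundary metacharacter \b (pass regex=True explicitly, since recent pandas versions treat the pattern as a literal string by default):
df['Text'].str.replace(r'\beng\b', 'engine', regex=True)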
0 engine is here
1 engine needs washing
2 engine is overheating
Name: Text, dtype: object
If you want to replace multiple keywords this way, pass a dictionary to replace with regex=True:
repl = {'eng' : 'engine'}
repl = {rf'\b{k}\b': v for k, v in repl.items()}
df['Text'].replace(repl, regex=True)
0 engine is here
1 engine needs washing
2 engine is overheating
Name: Text, dtype: object
Posted on 2019-01-02 15:51:04
You can try a regular expression like this:
import re
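df['Text'] = df['Text'].map(lambda x: re.sub(r'\beng\b', 'engine', x))
The \b tokens in this regex match word boundaries, so 'eng' is replaced only when it stands as a whole word, that is, when it is bounded by non-word characters (such as spaces or punctuation) or by the start or end of the string.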
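To see how the word-boundary approach behaves beyond the three sample rows, here is a minimal sketch. The last two rows are hypothetical additions (not from the question) that place 'eng' before punctuation and at the end of a string, and the variable names are illustrative only.

```python
import re

import pandas as pd

# The first three rows come from the question; the last two are hypothetical
# rows added here to exercise punctuation and end-of-string cases.
df = pd.DataFrame({
    "Text": [
        "eng is here",
        "engine needs washing",
        "eng is overheating",
        "check the eng.",
        "replace eng",
    ]
})

# Vectorised pandas version; regex=True is passed explicitly because the
# default for Series.str.replace differs between pandas versions.
with_pandas = df["Text"].str.replace(r"\beng\b", "engine", regex=True)

# Equivalent per-row version using a precompiled pattern from the re module.
pattern = re.compile(r"\beng\b")
with_re = df["Text"].map(lambda s: pattern.sub("engine", s))

print(with_pandas.tolist())
# ['engine is here', 'engine needs washing', 'engine is overheating',
#  'check the engine.', 'replace engine']
```

Both versions leave 'engine' untouched and also handle 'eng' before a full stop or at the end of the string, which a replacement keyed on a trailing space would miss.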
https://stackoverflow.com/questions/54009249