I have a pandas DataFrame:
df
id Description
1 2694 A&W #5530 MONTREAL QC
2 ahi DOLLARAMA # 45 MONTREAL QC
3 PC - PAYMENT FROM - *****11*22
I want to clean up this DataFrame so that the df["Description"] column contains no #, -, * or digits, like this:
id Description
1 A&W MONTREAL QC
2 ahi DOLLARAMA MONTREAL QC
3 PC PAYMENT FROM
I tried using Python's re module, but I couldn't get it right.
Thanks
Posted on 2018-05-13 19:27:43
Try a regular expression like this:
df.Description = df.Description.str.replace(r'[\d#\-\*]', '', regex=True)
This gives us:
0 A&W MONTREAL QC
1 ahi DOLLARAMA MONTREAL QC
2 PC PAYMENT FROM
Name: foo, dtype: object
Posted on 2018-05-13 19:37:56
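The one-liner above has two caveats. Since pandas 2.0, Series.str.replace treats the pattern as a literal string unless regex=True is passed. Also, deleting characters in place leaves doubled, leading and trailing spaces behind (for example "2694 A&W #5530 MONTREAL QC" becomes " A&W  MONTREAL QC"). A minimal sketch, assuming the sample rows from the question, that chains two more vectorized string methods to tidy the whitespace:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 2, 3],
    "Description": [
        "2694 A&W #5530 MONTREAL QC",
        "ahi DOLLARAMA # 45 MONTREAL QC",
        "PC - PAYMENT FROM - *****11*22",
    ],
})

df["Description"] = (
    df["Description"]
    .str.replace(r"[\d#*-]", "", regex=True)  # drop digits, '#', '*' and '-'
    .str.replace(r"\s+", " ", regex=True)     # collapse runs of whitespace to one space
    .str.strip()                              # trim leading/trailing spaces
)
print(df)
```

Every step stays inside the pandas .str accessor, so no Python-level loop over the rows is needed.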
You can use pandas .apply together with re.sub to remove everything matching [^A-Z ]+, i.e.:
import pandas as pd
import re
test = ['2694 A&W #5530 MONTREAL QC', 'ahi DOLLARAMA # 45 MONTREAL QC', 'PC - PAYMENT FROM - *****11*22']
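def change_me(content):
    # Remove every run of characters that is not a letter or a space
    # (note: this also removes '&', so "A&W" becomes "AW")
    content = re.sub(r"[^A-Z ]+", "", content, flags=re.IGNORECASE)
    # Collapse two or more consecutive spaces into one and trim the ends
    return re.sub(r"[ ]{2,}", " ", content).strip()
df = pd.DataFrame({'Desc':test})
df.Desc = df.Desc.apply(change_me)
print(df)
Desc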
0 AW MONTREAL QC
1 ahi DOLLARAMA MONTREAL QC
2 PC PAYMENT FROM
PS:
Please read @ami's comment: .str.replace() is the right function for this kind of task.
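Following that advice, the whitelist idea from this answer can also be written with the vectorized .str.replace instead of .apply. A minimal sketch, assuming the question's sample strings; adding '&' to the allowed set is my own tweak so that "A&W" survives as in the desired output:

```python
import pandas as pd

test = [
    "2694 A&W #5530 MONTREAL QC",
    "ahi DOLLARAMA # 45 MONTREAL QC",
    "PC - PAYMENT FROM - *****11*22",
]
df = pd.DataFrame({"Desc": test})

df["Desc"] = (
    df["Desc"]
    # Whitelist: keep only letters, spaces and '&'; everything else is removed
    .str.replace(r"[^A-Za-z& ]+", "", regex=True)
    # Collapse two or more consecutive spaces into one
    .str.replace(r" {2,}", " ", regex=True)
    # Trim leading/trailing spaces left where characters were removed
    .str.strip()
)
print(df)
```

A whitelist also strips punctuation that the question did not list (such as '.' or '/'), whereas the blacklist [\d#*-] removes only the characters it names; which behaviour is preferable depends on the data.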
https://stackoverflow.com/questions/50319824