I am removing accents and special characters from a DataFrame, but my current approach does not seem ideal to me. How can I improve it?
Thanks.
Code:
import pandas as pd
m = pd.read_excel('file.xlsx')
print(m)
# regex=True is required in pandas 2.0+, where str.replace matches literally by default
m['hola']=m['hola'].str.replace(r"\W","",regex=True)
m['hola']=m['hola'].str.replace(r"á","a")
m['hola']=m['hola'].str.replace(r"é","e")
m['hola']=m['hola'].str.replace(r"í","i")
m['hola']=m['hola'].str.replace(r"ó","o")
m['hola']=m['hola'].str.replace(r"ú","u")
m['hola']=m['hola'].str.replace(r"Á","A")
m['hola']=m['hola'].str.replace(r"É","E")
m['hola']=m['hola'].str.replace(r"Í","I")
m['hola']=m['hola'].str.replace(r"Ó","O")
m['hola']=m['hola'].str.replace(r"Ú","U")
print(m)
Posted on 2022-08-08 18:43:52
You can create a dictionary that uses the special characters as keys and their replacements as values:
d = {}
d["á"] = "a"  # .... etc.
x = "árwwwe"
# Replace every character that has an entry in the dictionary
for character in x:
    if character in d:
        x = x.replace(character, d[character])
print(x)
Output:
arwwwe
https://stackoverflow.com/questions/73282596
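The dictionary idea above can also be expressed as a single translation table, which avoids both the explicit loop and the ten separate replace calls. This is only a minimal sketch: `ACCENT_TABLE` and `clean_text` are hypothetical names chosen for illustration, and the table covers only the ten accented vowels listed in the question.

```python
import re

# Translation table for the accented vowels handled in the question.
# (ACCENT_TABLE and clean_text are illustrative names, not from the original post.)
ACCENT_TABLE = str.maketrans("áéíóúÁÉÍÓÚ", "aeiouAEIOU")

def clean_text(text):
    """Replace accented vowels, then drop non-word characters."""
    text = text.translate(ACCENT_TABLE)  # all ten replacements in one pass
    return re.sub(r"\W", "", text)       # same pattern as the question's first step

print(clean_text("árbol, MÁS!"))
```

Assuming the column contains only strings, the function could be applied to it with `m['hola'] = m['hola'].map(clean_text)`. Characters outside the table, such as ñ, are left unchanged, which matches the behavior of the original code.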