I'm not sure what I'm doing wrong here, but I want to replace the original column names with the corresponding names from the column_names list.
import difflib
import re

column_names = ['FIPS', 'Admin2', 'Province_State', 'Country_Region', 'Last_Update', 'Lat', 'Long_', 'Confirmed', 'Deaths', 'Recovered', 'Active', 'Combined_Key']
df.columns = ['Province/State', 'Country/Region', 'Last Update', 'Confirmed',
              'Deaths', 'Recovered', 'Latitude', 'Longitude']

def replace_cols(df, new_columns):
    k = 0
    for i in df.columns:
        for j in column_names:
            seq = difflib.SequenceMatcher(None, i, j).ratio() * 100
            if seq >= 50:
                newcol = re.sub(i, j, i)
                df.columns.values[k] = newcol
                print(newcol)
                k += 1
Posted on 2020-05-18 06:04:17
Raising the threshold from 50 to 54 makes it work:
import difflib
import re

import pandas as pd

column_names = ['FIPS', 'Admin2', 'Province_State', 'Country_Region', 'Last_Update', 'Lat', 'Long_', 'Confirmed', 'Deaths', 'Recovered', 'Active', 'Combined_Key']
# df.columns = ['Province/State', 'Country/Region', 'Last Update', 'Confirmed', 'Deaths', 'Recovered', 'Latitude', 'Longitude']
cols = ['Province/State', 'Country/Region', 'Last Update', 'Confirmed', 'Deaths', 'Recovered', 'Latitude', 'Longitude']
df = pd.DataFrame([], columns=cols)

def replace_columns(df, new_columns):
    k = 0
    for i in df.columns:
        print('Old col', i, k)
        # Use the list passed in rather than the global column_names
        for j in new_columns:
            seq = difflib.SequenceMatcher(None, i, j).ratio() * 100
            # At 54, each old column matches exactly one new name
            if seq >= 54:
                newcol = re.sub(i, j, i)
                print('Newcol ', newcol)
                df.columns.values[k] = newcol
                k += 1
    return df

df = replace_columns(df, column_names)
Posted on 2020-05-18 06:04:46
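This happens because both Lat and Last_Update are more than 50% similar to Latitude. On the Latitude pass, your code therefore writes Last_Update and then Lat, advancing k after each one, so by the time it reaches the last column k has run past the end of df.columns.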
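To see why 54 separates the two candidates, the scores can be checked directly. The sketch below is only an illustration: similarity, matches_above and best_match_mapping are hypothetical helpers that appear in neither answer, and the two lists are copied from the question. It reproduces the question's 0-100 score, lists every candidate that clears a given threshold, and then, as an alternative to tuning the threshold, keeps only the single best-scoring candidate per column so that no column can be renamed twice.

```python
import difflib

column_names = ['FIPS', 'Admin2', 'Province_State', 'Country_Region', 'Last_Update',
                'Lat', 'Long_', 'Confirmed', 'Deaths', 'Recovered', 'Active', 'Combined_Key']
old_columns = ['Province/State', 'Country/Region', 'Last Update', 'Confirmed',
               'Deaths', 'Recovered', 'Latitude', 'Longitude']


def similarity(a, b):
    # Hypothetical helper: same score as the question, SequenceMatcher ratio scaled to 0-100
    return difflib.SequenceMatcher(None, a, b).ratio() * 100


def matches_above(column, candidates, threshold):
    # Hypothetical helper: every candidate that clears the threshold, in list order.
    # More than one hit means the question's loop renames the column (and advances k) repeatedly.
    return [c for c in candidates if similarity(column, c) >= threshold]


def best_match_mapping(old, candidates, threshold=50):
    # Hypothetical helper: map each old name to its single highest-scoring candidate,
    # skipping columns whose best score is still below the threshold
    mapping = {}
    for col in old:
        best = max(candidates, key=lambda c: similarity(col, c))
        if similarity(col, best) >= threshold:
            mapping[col] = best
    return mapping


print(round(similarity('Latitude', 'Last_Update'), 1))  # 52.6
print(round(similarity('Latitude', 'Lat'), 1))          # 54.5
print(matches_above('Latitude', column_names, 50))      # ['Last_Update', 'Lat']
print(matches_above('Latitude', column_names, 54))      # ['Lat']
print(best_match_mapping(old_columns, column_names))
```

With pandas, such a mapping could be applied as df = df.rename(columns=best_match_mapping(df.columns, column_names)), which builds new labels instead of writing into df.columns.values in place. A best-match rule can still send two old columns to the same new name, so it is worth checking the mapping's values for duplicates before renaming.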
https://stackoverflow.com/questions/61859276