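I have a pandas DataFrame like the one below, and I'm trying to fill the missing values in the zipcode column by picking a random zipcode from rows with the same neighbourhood_group_cleansed. My attempt is below, but it doesn't work very well. Please help.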
import numpy as np
zipcodes = a_df[['neighbourhood_group_cleansed', 'zipcode']].drop_duplicates().reset_index()
a_df['zipcode'] = a_df.apply(lambda row: np.random.choice(zipcodes[zipcodes['neighbourhood_group_cleansed'] ==
                             row['neighbourhood_group_cleansed']]['zipcode']) if len(row.zipcode) == 0 else row.zipcode, axis=1)
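A likely reason the attempt misbehaves, assuming the missing zipcodes are stored as NaN (values such as 10004.0 suggest a float column): NaN is a float, and `len()` is not defined for floats, so `len(row.zipcode) == 0` raises `TypeError` instead of detecting the gap. `pd.isna` is the usual test for a missing value. The toy Series below is a hypothetical stand-in for `a_df['zipcode']`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a_df['zipcode']; NaN marks a missing zipcode
zips = pd.Series([10029.0, np.nan, 11221.0])

missing = zips.iloc[1]
try:
    len(missing)  # the emptiness check used in the attempt
    len_works = True
except TypeError:
    len_works = False  # floats, including NaN, have no length

print(len_works)         # False
print(pd.isna(missing))  # True
```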
state city smart_location neighbourhood_group_cleansed zipcode
0 NY New York New York, NY Manhattan 10029
1 NY Brooklyn Brooklyn, NY Brooklyn 11221
2 NY Brooklyn Brooklyn, NY Brooklyn 11206
3 NY New York New York, NY Manhattan 10001
4 NY New York New York, NY Manhattan 10162
... ... ... ... ... ...
6492 NY New York New York, NY Manhattan 10004.0
6493 NY Brooklyn Brooklyn, NY Brooklyn 11229.0
6494 NY Queens Queens, NY Queens 11691.0
6495 NY New York New York, NY Manhattan 10044.0
6496 NY Brooklyn Brooklyn, NY Brooklyn 11234.0

Posted on 2019-12-06 14:00:23
This should work:
import random
import numpy as np
df['zipcode'] = df.apply(lambda x: random.choice(df[df['neighbourhood_group_cleansed'] == x['neighbourhood_group_cleansed']].zipcode.dropna().values) if np.isnan(x['zipcode']) else x['zipcode'], axis=1)
https://stackoverflow.com/questions/59207505
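A per-group alternative (a sketch, not part of the original answer): instead of filtering the whole frame once per row inside `apply`, group once and fill each group's gaps with a single random draw. It assumes missing zipcodes are NaN and that the index labels are unique; `fill_by_group` and the toy data are hypothetical names for illustration.

```python
import numpy as np
import pandas as pd


def fill_by_group(df, group_col="neighbourhood_group_cleansed",
                  value_col="zipcode", seed=None):
    """Return a copy of df where each missing value_col entry is replaced by a
    random non-missing value drawn from rows with the same group_col."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    # groupby skips rows whose group key is NaN, so those rows stay untouched
    for _, labels in out.groupby(group_col).groups.items():
        values = out.loc[labels, value_col]
        is_missing = values.isna().to_numpy()
        pool = values[~is_missing].to_numpy()
        if is_missing.any() and pool.size > 0:
            # one random draw (with replacement) per missing row in this group
            out.loc[values.index[is_missing], value_col] = rng.choice(
                pool, size=int(is_missing.sum()))
    return out


# Hypothetical toy data shaped like the question's frame, with two gaps
df = pd.DataFrame({
    "neighbourhood_group_cleansed": ["Manhattan", "Brooklyn", "Brooklyn",
                                     "Manhattan", "Queens", "Manhattan"],
    "zipcode": [10029.0, 11221.0, np.nan, np.nan, 11691.0, 10001.0],
})
filled = fill_by_group(df, seed=0)
print(filled)
```

Because `pool` keeps repeated zipcodes, frequent zipcodes are drawn more often, as in the answer above; passing `np.unique(pool)` to `rng.choice` would instead sample uniformly among distinct zipcodes, closer to the `drop_duplicates()` in the question's attempt. Groups with no known zipcode at all are left as NaN rather than raising an error.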