
Filling missing zip codes by randomly choosing a zip code from neighbours

Stack Overflow user
Asked 2019-12-06 13:40:36
Answers: 1 · Views: 60 · Followers: 0 · Votes: 1

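I have a pandas DataFrame like the one below, and I am trying to replace the missing values in the zipcode column with a random value taken from rows that share the same neighbourhood_group_cleansed. My attempt is below, but it does not work well. Please help.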

Code language: python
zipcodes = a_df[['neighbourhood_group_cleansed','zipcode']].drop_duplicates().reset_index()
a_df['zipcode'] = a_df.apply(lambda row: np.random.choice(zipcodes[zipcodes['neighbourhood_group_cleansed'] == 
                row['neighbourhood_group_cleansed']]['zipcode']) if len(row.zipcode) == 0   else row.zipcode, axis = 1)

      state  city      smart_location  neighbourhood_group_cleansed  zipcode
0     NY     New York  New York, NY    Manhattan                     10029
1     NY     Brooklyn  Brooklyn, NY    Brooklyn                      11221
2     NY     Brooklyn  Brooklyn, NY    Brooklyn                      11206
3     NY     New York  New York, NY    Manhattan                     10001
4     NY     New York  New York, NY    Manhattan                     10162
...   ...    ...       ...             ...                           ...
6492  NY     New York  New York, NY    Manhattan                     10004.0
6493  NY     Brooklyn  Brooklyn, NY    Brooklyn                      11229.0
6494  NY     Queens    Queens, NY      Queens                        11691.0
6495  NY     New York  New York, NY    Manhattan                     10044.0
6496  NY     Brooklyn  Brooklyn, NY    Brooklyn                      11234.0

1 Answer

Stack Overflow user

Answered 2019-12-06 14:00:23

This should work:

Code language: python
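import random
import pandas as pd

# For each row with a missing zipcode, pick a random known zipcode
# from the rows in the same neighbourhood group; otherwise keep the value.
df['zipcode'] = df.apply(
    lambda x: random.choice(
        df[df['neighbourhood_group_cleansed'] == x['neighbourhood_group_cleansed']].zipcode.dropna().values
    ) if pd.isna(x['zipcode']) else x['zipcode'],
    axis=1,
)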
Votes: 1
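
The one-liner above re-filters the whole frame for every row and raises an error when a neighbourhood group has no known zip code at all. As a rough alternative, here is a minimal sketch that walks the groups once and leaves such groups untouched. The function name `fill_missing_zipcodes` and its `seed` parameter are hypothetical, introduced only for illustration. Like the answer above, it samples from all known values in the group, so frequent zip codes are picked more often.

```python
import numpy as np
import pandas as pd


def fill_missing_zipcodes(df, group_col="neighbourhood_group_cleansed",
                          zip_col="zipcode", seed=None):
    """Return a copy of df with missing zip codes filled per group.

    Hypothetical helper: each missing value is replaced by a random
    known zip code from the same group. Groups without any known zip
    code, and rows whose group value is missing, are left unchanged.
    """
    rng = np.random.default_rng(seed)
    out = df.copy()
    zip_pos = out.columns.get_loc(zip_col)

    # .indices maps each group key to the integer positions of its rows.
    for positions in out.groupby(group_col).indices.values():
        col = out[zip_col].iloc[positions]
        known = col.dropna().to_numpy()
        missing_pos = positions[col.isna().to_numpy()]
        if len(known) > 0 and len(missing_pos) > 0:
            out.iloc[missing_pos, zip_pos] = rng.choice(known, size=len(missing_pos))
    return out
```

It would be called as `a_df = fill_missing_zipcodes(a_df, seed=42)`; passing a seed only makes the random picks reproducible.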
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/59207505
