I have a pandas DataFrame like this:
import pandas as pd

datadict = {"StationID": [102, 103, 11], "Bikes_Available": [10, 3, 2], "Suggested_Movements": [{11: 3, 103: 5}, {11: 3}, {103: 1}]}
df = pd.DataFrame.from_dict(datadict)
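StationID  Bikes_Available  Suggested_Movements
102        10               {11: 3, 103: 5}
103        3                {11: 3}
11         2                {103: 1}

The last column holds dictionaries. Each key is the StationID that bikes should be moved to, and each value is the number of bikes to move. I want to create a new column showing how many bikes should be added to each StationID. To do that, I need to look through every dictionary in Suggested_Movements and add up all the bikes being moved to each station. The output would look like this: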
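StationID  Bikes_Available  Suggested_Movements  To_Add
102        10               {11: 3, 103: 5}      0
103        3                {11: 2}              6
11         2                {103: 1}             5

Station 102 has 0 bikes added. Station 103 has 6 added: 5 from station 102 and 1 from station 11. Station 11 has 5 added: 3 from station 102 and 2 from station 103. I have tried using apply with lambdas, but I can't get the logic right.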
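One problem is that I can't use the ID column as dictionary keys for the "Suggested_Movements" column, because they are optional. To get around this, I tried converting the IDs to strings first and then back to integers so that they would be recognized as dictionary keys, like this: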
df['to add'] = df['Suggested_Movements'].apply(lambda x: x.get(int(df['StationID'].to_string())))

This gave me the following error:
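ValueError: invalid literal for int() with base 10: '0 102\n1 103\n2 11\n3 12\n4 133\n5 134\n6 145\n7 156\n8 157\n9 161\n10 172\n11 2\n12 217\n13 24\n14 259\n15 270\n16

This shows that extra characters are being added to the station index. I figured I could add .split('\\')[0] to get just the number I need, but that has problems too, and I'm not sure this is the right approach.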
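A minimal sketch of why that error appears, using only the three StationIDs from the sample data (the variable names `rendered` and `ids` are made up for illustration): `Series.to_string()` renders the whole column, index labels included, as one multi-line string, so `int()` has nothing it can parse.

```python
import pandas as pd

df = pd.DataFrame({"StationID": [102, 103, 11]})

# to_string() returns the entire column (with its index labels) as a
# single multi-line string, not the ID of the current row.
rendered = df["StationID"].to_string()
print(repr(rendered))

try:
    int(rendered)
except ValueError as exc:
    print("ValueError:", exc)

# In a row-wise apply, each row's own StationID is already available as
# a number, so no string round-trip is needed.
ids = df.apply(lambda row: int(row["StationID"]), axis=1).tolist()
print(ids)
```

Even with the row's own ID in hand, looking it up in that same row's dictionary would only find bikes a station sends to itself, so the totals still have to be collected across all rows.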
I also tried this:
df['To_Add'] = df[StationID].map(df["Suggested_Movements"]).fillna(df['To_add'])
based on another question, but that just gave me the whole dictionary from the previous column.
Answered 2020-04-13 05:43:49

Here is what I suggest:
import pandas as pd

df = pd.DataFrame({"StationID": [102, 103, 11],
                   "Bikes_available": [10, 3, 2],
                   "Suggested_movement": [{11: 3, 103: 5}, {103: 3, 15: 3}, {17: 1, 11: 1}]})
# create a dataframe from the list of dictionaries and sum the values
# (one column per destination station, so the sum is the total per station)
to_add = pd.DataFrame(list(df["Suggested_movement"])).sum()

Then concat df and to_add on the StationID index, drop the stations that do not exist in df, and fill the NaN values of to_add with 0:
df = (pd.concat([df.set_index("StationID"), to_add], axis=1, join="outer")
        .dropna(subset=["Bikes_available", "Suggested_movement"])
        .fillna(0))

As @j杯 pointed out, this result assumes that every station can send bikes to itself, as in the expected output (although you should get 8 for station 103):
     Bikes_available  Suggested_movement  0
11   2.0              {17: 1, 11: 1}      4.0
102  10.0             {11: 3, 103: 5}     0.0
103  3.0              {103: 3, 15: 3}     8.0

Now, if a station cannot send bikes to itself, you can remove the corresponding key from Suggested_movement like this:
df = pd.DataFrame({"StationID": [102, 103, 11],
                   "Bikes_available": [10, 3, 2],
                   "Suggested_movement": [{11: 3, 103: 5}, {103: 3, 15: 3}, {17: 1, 11: 1}]})
# drop any key that equals the row's own StationID
df["Suggested_movement"] = df.apply(
    lambda x: {k: v for k, v in x["Suggested_movement"].items() if k != x["StationID"]},
    axis=1)
to_add = pd.DataFrame(list(df["Suggested_movement"])).sum()
df = (pd.concat([df.set_index("StationID"), to_add], axis=1, join="outer")
        .dropna(subset=["Bikes_available", "Suggested_movement"])
        .fillna(0))

The result is:
     Bikes_available  Suggested_movement  0
11   2.0              {17: 1}             3.0
102  10.0             {11: 3, 103: 5}     0.0
103  3.0              {15: 3}             5.0

Hope this helps.
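For comparison, here is a rough sketch of the same totals computed with `collections.Counter`, written against the column names from the question. This is a swapped-in alternative, not the method used above, and the name `incoming` is made up for illustration. It keeps the original integer dtypes and writes the result straight into a `To_Add` column.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    "StationID": [102, 103, 11],
    "Bikes_Available": [10, 3, 2],
    "Suggested_Movements": [{11: 3, 103: 5}, {11: 3}, {103: 1}],
})

# Add up the incoming bikes per destination station across all rows.
incoming = Counter()
for movements in df["Suggested_Movements"]:
    incoming.update(movements)

# A Counter returns 0 for missing keys, so stations that receive
# nothing end up with 0.
df["To_Add"] = [incoming[station] for station in df["StationID"]]
print(df)
```

With the input exactly as given in the question, where station 103 sends `{11: 3}`, this gives 0, 6 and 6 for stations 102, 103 and 11. Like the first version above, it counts a station's movements to itself if any are present.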
https://stackoverflow.com/questions/61179153