I am automating a task that needs the person_id column, and I have these two DataFrames:
df_1 = pd.DataFrame({'Code':['100', '101', '102', '103', '104'], 'person_id': ['4000', '4001', '4002', '4003', '4004']})
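df_2 = pd.DataFrame ({'Code':['100', '101', '102', '103', '104'],'owner_1':['False', 'True', 'False', 'False','False'], 'owner_2':['True', 'False', 'False', 'False','False']})
What I need is to identify who the actual owner is: use the "Code" column to match the rows, and use a conditional to create columns like "Owner_n_id" that receive the value from the person_id column. I'm a bit confused about how to write this, and nothing I have tried has worked well.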
I need my new df to look like this:
df_3 = pd.DataFrame ({'Code':['100', '101', '102', '103', '104'], 'owner_1_id':['False', '4001', 'False', 'False', 'False'],'owner_2_id':['4000', 'False', 'False', 'False','False']})
Posted on 2022-03-03 18:59:22
TL;DR
If this is meant to be used with more owners, you can write a loop.
df_3 = df_1.merge(df_2, on = "Code")
NUM_OWNERS = 2
for owner in range(1, NUM_OWNERS+1):
    indexing = df_3[f"owner_{owner}"] == "True"
    df_3.loc[indexing, f"owner_{owner}"] = df_3.loc[indexing, "person_id"]
# finally drop the extra column
df_3.drop("person_id", axis = 1, inplace = True)
I would first merge df_1 and df_2, so that the third DataFrame is more consistent:
>>> df_3 = df_1.merge(df_2, on = "Code")
>>> df_3
  Code person_id owner_1 owner_2
0  100      4000   False    True
1  101      4001    True   False
2  102      4002   False   False
3  103      4003   False   False
4  104      4004   False   False
Once we have this, we want to find the rows where owner_1 == "True":
owner_1_is_true = df_3.owner_1 == "True"
# and now we can update the value based on this indexing
# (.loc avoids chained assignment, which pandas warns about):
df_3.loc[owner_1_is_true, "owner_1"] = df_3.loc[owner_1_is_true, "person_id"]
# repeat for owner 2:
owner_2_is_true = df_3.owner_2 == "True"
df_3.loc[owner_2_is_true, "owner_2"] = df_3.loc[owner_2_is_true, "person_id"]
# drop the `person_id` column that came from the merge
df_3.drop("person_id", axis = 1, inplace = True)
Result:
  Code owner_1 owner_2
0  100   False    4000
1  101    4001   False
2  102   False   False
3  103   False   False
4  104   False   False
If this is meant to be used with more owners, you can write a loop. Run it in place of the per-owner steps above, while person_id is still present:
NUM_OWNERS = 2
for owner in range(1, NUM_OWNERS+1):
    indexing = df_3[f"owner_{owner}"] == "True"
    df_3.loc[indexing, f"owner_{owner}"] = df_3.loc[indexing, "person_id"]
# finally drop the extra column
df_3.drop("person_id", axis = 1, inplace = True)
Posted on 2022-03-03 18:49:19
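This code modifies df_2 directly. You can copy df_2 into df_3 before continuing. Note that using string values for False and True is inefficient compared to using booleans.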
df_2['owner_1_id'] = 'False'
df_2.loc[df_2['owner_1'] == 'True', ['owner_1_id']] = df_2.merge(df_1)['person_id']
df_2['owner_2_id'] = 'False'
df_2.loc[df_2['owner_2'] == 'True', ['owner_2_id']] = df_2.merge(df_1)['person_id']
https://stackoverflow.com/questions/71341907
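For reference, here is a minimal sketch that combines the ideas from both answers and produces the column names asked for in the question (owner_1_id, owner_2_id) without modifying df_1 or df_2. It assumes every ownership flag column is named "owner_<n>" and that the flags are the strings "True"/"False" as in the question; the names merged, owner_cols and is_owner are only illustrative.

```python
import pandas as pd

df_1 = pd.DataFrame({
    "Code": ["100", "101", "102", "103", "104"],
    "person_id": ["4000", "4001", "4002", "4003", "4004"],
})
df_2 = pd.DataFrame({
    "Code": ["100", "101", "102", "103", "104"],
    "owner_1": ["False", "True", "False", "False", "False"],
    "owner_2": ["True", "False", "False", "False", "False"],
})

# Merge on "Code" so each row carries its person_id; df_2 itself is not modified.
merged = df_2.merge(df_1, on="Code", how="left")

# Assumption: every ownership flag column is named "owner_<n>".
owner_cols = [c for c in df_2.columns if c.startswith("owner_")]

df_3 = merged[["Code"]].copy()
for col in owner_cols:
    # Boolean mask built from the string flags.
    is_owner = merged[col] == "True"
    # Keep person_id where the mask is True, otherwise the string "False".
    df_3[f"{col}_id"] = merged["person_id"].where(is_owner, "False")

print(df_3)
```

If the flags were stored as real booleans instead of strings, the comparison with "True" could probably be dropped and merged[col] used as the mask directly.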