我们会感谢你的帮助。
我有2 DataFrames。
第一个数据框架由person的活动计划schedule组成,如下所示:
PersonID Person Origin Destination
3-1 1 A B
3-1 1 B A
13-1 1 C D
13-1 1 D C
13-2 2 A B
13-2 2 B A我还有另一个DataFrame,household,包含个人/代理的详细信息。
PersonID1 Age1 Gender1 PersonID2 Age2 Gender2
3-1 20 M NaN NaN NaN
13-1 45 F 13-2 17 M我想使用pd.merge对这两个人执行一个VLOOKUP。因为查找(合并)将取决于个人的ID,所以我尝试使用一个条件。
def merging(row):
if row['Person'] == 1:
row = pd.merge(row, household, how='left', left_on=['PersonID'], right_on=['Age1', 'Gender1'])
else:
row = pd.merge(row, household, how='left', left_on=['PersonID'], right_on=['Age2','Gender2'])
return row
schedule_merged = schedule.apply(merging, axis=1)然而,由于某种原因,它就是不起作用。错误说是ValueError: len(right_on) must equal len(left_on)。我的目标是在最后做出这样的数据:
PersonID Person Origin Destination Age Gender
3-1 1 A B 20 M
3-1 1 B A 20 M
13-1 1 C D 45 F
13-1 1 D C 45 F
13-2 2 A B 17 M
13-2 2 B A 17 M我想我搞砸了pd.merge的线路。虽然在Excel中使用VLOOKUP可能更有效,但它对我的PC来说太重了,因为我必须将它应用于10万个数据。我怎么能做好这件事?谢谢!
发布于 2021-07-14 15:27:25
如果实际的数据集不比给定的示例复杂,我将这样做。另外,我建议您看看pd.melt()是否有更复杂的无轴操作。
import pandas as pd
import numpy as np
# Create Dummy schedule DataFrame
d = {'PersonID': ['3-1', '3-1', '13-1', '13-1', '13-2', '13-2'], 'Person': ['1', '1', '1', '1', '2', '2'], 'Origin': ['A', 'B', 'C', 'D', 'A', 'B'], 'Destination': ['B', 'A', 'D', 'C', 'B', 'A']}
schedule = pd.DataFrame(data=d)
schedule
# Create Dummy houshold DataFrame
d = {'PersonID1': ['3-1', '13-1'], 'Age1': ['20', '45'], 'Gender1': ['M', 'F'], 'PersonID2': [np.nan, '13-2'], 'Age2': [np.nan, '17'], 'Gender2': [np.nan, 'M']}
household = pd.DataFrame(data=d)
household
# Select columns for PersonID1 and rename columns
household1 = household[['PersonID1', 'Age1', 'Gender1']]
household1.columns = ['PersonID', 'Age', 'Gender']
# Select columns for PersonID1 and rename columns
household2 = household[['PersonID2', 'Age2', 'Gender2']]
household2.columns = ['PersonID', 'Age', 'Gender']
# Concat them together
household_new = pd.concat([household1, household2])
# Merge houshold and schedule df together on PersonID
schedule = schedule.merge(household_new, how='left', left_on='PersonID', right_on='PersonID', validate='many_to_one')输出
PersonID Person Origin Destination Age Gender
3-1 1 A B 20 M
3-1 1 B A 20 M
13-1 1 C D 45 F
13-1 1 D C 45 F
13-2 2 A B 17 M
13-2 2 B A 17 Mhttps://stackoverflow.com/questions/68380239
复制相似问题