I have two DataFrames, df_1 and df_2, as follows:
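import pandas as pd

df_1 = pd.DataFrame({'num': [1, 2, 3], 'time': [100, 200, 300]})
df_2 = pd.DataFrame({'num': [1, 2, 3], 'time': [101, 104, 200]})

A match is a pair of rows where 'num' in df_1 is not equal to 'num' in df_2, and 'time' in df_1 is within ±10 of 'time' in df_2. The result needs to be stored in df_3.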
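If we find several matches, we only take the first one.
My code works, but it is inefficient because it loops over df_1 row by row with iterrows.
My code: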
def find_match(row_df_1, df_2):
    # Rows of df_2 with a different 'num' and a 'time' within +/-10 of this row's time
    time = row_df_1['time']
    match_in_df2 = df_2[(df_2['num'] != row_df_1['num'])
                        & (df_2['time'].between(time - 10, time + 10))]
    if len(match_in_df2) > 0:
        first = match_in_df2.iloc[0]  # keep only the first match
        return [row_df_1['num'], first['num'], row_df_1['time'], first['time']]
    return None  # no match for this row

# DataFrame.append was removed in pandas 2.0, so collect rows in a list instead
rows = []
for _, row_df_1 in df_1.iterrows():
    match = find_match(row_df_1, df_2)
    if match is not None:
        rows.append(match)
df_3 = pd.DataFrame(rows, columns=['num_df1', 'num_df2', 'time_df1', 'time_df2'])
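The row-by-row loop can also be vectorized with NumPy broadcasting while keeping the "first match only" rule. This is a minimal sketch, not part of the original question or answer. It assumes NumPy is available and that an n × m boolean matrix (n = len(df_1), m = len(df_2)) fits in memory. "First" means first in df_2's row order, as in the loop above.

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame({'num': [1, 2, 3], 'time': [100, 200, 300]})
df_2 = pd.DataFrame({'num': [1, 2, 3], 'time': [101, 104, 200]})

# Pairwise condition matrix: rows index df_1, columns index df_2.
num_1 = df_1['num'].to_numpy()[:, None]
time_1 = df_1['time'].to_numpy()[:, None]
num_2 = df_2['num'].to_numpy()[None, :]
time_2 = df_2['time'].to_numpy()[None, :]
mask = (num_1 != num_2) & (np.abs(time_2 - time_1) <= 10)  # inclusive, like between()

has_match = mask.any(axis=1)  # df_1 rows with at least one match
first = mask.argmax(axis=1)   # position of the first True per row (0 if none)

rows_1 = np.flatnonzero(has_match)  # matched df_1 positions
rows_2 = first[has_match]           # their first df_2 match
df_3 = pd.DataFrame({
    'num_df1': df_1['num'].to_numpy()[rows_1],
    'num_df2': df_2['num'].to_numpy()[rows_2],
    'time_df1': df_1['time'].to_numpy()[rows_1],
    'time_df2': df_2['time'].to_numpy()[rows_2],
})
print(df_3)
```

Rows of df_1 without any match (here num 3, time 300) are dropped, which matches the loop's behaviour.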
Result:
   num_df1  num_df2  time_df1  time_df2
0        1        2       100       104
1        2        3       200       200

Answer posted 2021-11-03 07:24:31
Use merge with how='cross' to pair every row of df_1 with every row of df_2, then select the matching rows with boolean masks:
out = pd.merge(df_1, df_2, how='cross', suffixes=('_df1', '_df2'))
m1 = out['num_df1'] != out['num_df2']
m2 = abs(out['time_df2'] - out['time_df1']) <= 10
out = out[m1 & m2]

Output:
>>> out
num_df1 time_df1 num_df2 time_df2
1 1 100 2 104
5 2 200 3 200

https://stackoverflow.com/questions/69820879
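The cross merge keeps every match, so it returns several rows for a df_1 row whenever more than one df_2 row qualifies. The question asks for only the first one. Below is a minimal sketch of adding that rule on top of the answer. It assumes pandas >= 1.2 (where how='cross' was introduced). The row_id column is a hypothetical helper added only to identify each df_1 row. Note that the cross join materialises len(df_1) * len(df_2) rows before filtering.

```python
import pandas as pd

df_1 = pd.DataFrame({'num': [1, 2, 3], 'time': [100, 200, 300]})
df_2 = pd.DataFrame({'num': [1, 2, 3], 'time': [101, 104, 200]})

# Hypothetical helper column identifying each df_1 row (not in the original answer).
left = df_1.assign(row_id=range(len(df_1)))

# Cartesian product: each df_1 row paired with every df_2 row, in df_2 order.
out = pd.merge(left, df_2, how='cross', suffixes=('_df1', '_df2'))
m1 = out['num_df1'] != out['num_df2']
m2 = (out['time_df2'] - out['time_df1']).abs() <= 10
out = out[m1 & m2]

# Keep only the first surviving match for each df_1 row, then drop the helper.
df_3 = (out.drop_duplicates(subset='row_id', keep='first')
           [['num_df1', 'num_df2', 'time_df1', 'time_df2']]
           .reset_index(drop=True))
print(df_3)
```

Using row_id rather than 'num_df1' for de-duplication keeps this correct even if 'num' is not unique in df_1.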