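I have two time series with the same sampling frequency but different end dates. I want to combine them into a single DataFrame that spans the total time range of both (the union, not the intersection), with NaN for the values that fall outside the intersection.
I tried: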
from functools import reduce
import pandas as pd

df_to_merge = [df1, df2]
df_merged = reduce(lambda left, right: pd.merge(left, right, on='timestamp'), df_to_merge)
Data:
df1
timestamp col1
2010-10-10 00:00 10
2010-10-10 00:01 15
...
2010-10-15 00:00 10
df2
timestamp col2
2010-10-07 00:00 20
2010-10-10 00:01 25
...
2010-10-18 00:00 20
Expected result:
timestamp col1 col2
2010-10-07 00:00 NaN 20
2010-10-07 00:01 NaN 25
...
2010-10-10 00:01 10 30
2010-10-15 00:00 10 40
...
2010-10-18 00:00 NaN 20
Posted on 2022-11-13 22:04:31
You can use a join:
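df_merged = df1.join(df2, how='right')
With how='right', every row of the right-hand frame (here the longer df2) is kept. DataFrame.join aligns on the index, so timestamp must be the index of both frames (e.g. via set_index('timestamp')). A right join covers the full range only when df2's time range contains df1's, as it does in the question's data; if each frame has dates the other lacks, use how='outer' to keep the union.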
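A minimal sketch of the difference between the two join types, using hypothetical daily frames a and b whose ranges only partially overlap (neither contains the other). The right join drops the dates that only a has, while the outer join keeps every date from both.

```python
import pandas as pd

# Hypothetical daily series: a covers Oct 1-4, b covers Oct 3-6,
# so each frame has dates the other lacks.
a = pd.DataFrame(
    {"col1": [1.0, 2.0, 3.0, 4.0]},
    index=pd.date_range("2020-10-01", periods=4, freq="D", name="timestamp"),
)
b = pd.DataFrame(
    {"col2": [30.0, 40.0, 50.0, 60.0]},
    index=pd.date_range("2020-10-03", periods=4, freq="D", name="timestamp"),
)

right = a.join(b, how="right")  # only b's dates (Oct 3-6); Oct 1-2 are lost
outer = a.join(b, how="outer")  # union of both indexes (Oct 1-6), NaN where missing

print(len(right), len(outer))  # 4 6
print(outer)
```

With how="outer", rows missing from one frame get NaN in that frame's columns, which matches the expected result in the question.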
For example:
import pandas as pd

df1 = pd.DataFrame({'timestamp': pd.to_datetime(pd.Series(['2020-10-10 23:32',
                                                           '2020-10-13 23:28'])),
                    'col1': [5, 8]})
# Upsample to daily bins and forward-fill (same as .fillna(method='ffill'))
df1 = df1.set_index('timestamp').resample('1d').ffill()
col1
timestamp
2020-10-10 NaN
2020-10-11 5.0
2020-10-12 5.0
2020-10-13 5.0
and
df2 = pd.DataFrame({'timestamp': pd.to_datetime(pd.Series(['2020-10-08 23:32',
                                                           '2020-10-15 23:28'])),
                    'col2': [50, 80]})
df2 = df2.set_index('timestamp').resample('1d').ffill()
col2
timestamp
2020-10-08 NaN
2020-10-09 50.0
2020-10-10 50.0
2020-10-11 50.0
2020-10-12 50.0
2020-10-13 50.0
2020-10-14 50.0
2020-10-15 50.0
Then df1.join(df2, how='right') returns:
col1 col2
timestamp
2020-10-08 NaN NaN
2020-10-09 NaN 50.0
2020-10-10 NaN 50.0
2020-10-11 5.0 50.0
2020-10-12 5.0 50.0
2020-10-13 5.0 50.0
2020-10-14 NaN 50.0
2020-10-15 NaN 50.0
https://stackoverflow.com/questions/74425097
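The reduce/merge attempt in the question keeps only the intersection because pd.merge defaults to how='inner'. A minimal sketch of the same pattern with how='outer', assuming timestamp is an ordinary column as in the question; the sample frames are hypothetical, and the pattern extends to any number of frames in the list.

```python
from functools import reduce

import pandas as pd

# Hypothetical 1-minute frames with partially overlapping ranges:
# df1 covers 00:00-00:03, df2 covers 23:58 (previous day) to 00:01.
df1 = pd.DataFrame({
    "timestamp": pd.date_range("2010-10-10 00:00", periods=4, freq="min"),
    "col1": [10, 15, 10, 12],
})
df2 = pd.DataFrame({
    "timestamp": pd.date_range("2010-10-09 23:58", periods=4, freq="min"),
    "col2": [20, 25, 30, 40],
})

# how="outer" keeps the union of timestamps across all frames;
# each frame's columns are NaN where that frame has no row.
df_merged = reduce(
    lambda left, right: pd.merge(left, right, on="timestamp", how="outer"),
    [df1, df2],
).sort_values("timestamp", ignore_index=True)

print(df_merged)
```

The explicit sort_values keeps the result in time order regardless of how the merge orders the combined keys.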