Suppose I have three DataFrames, as shown below:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Week': ['W1', 'W2', 'W3', 'W4', 'W5', 'W6', 'W7'], 'A': [34, 67, 92, 31, 90, 100, 101]})
df2 = pd.DataFrame({'Week': ['W4', 'W5', 'W6', 'W7', 'W8', 'W9', 'W10'], 'B': [75, np.nan, 53, 21, 94, 47, 88]})
df3 = pd.DataFrame({'Week': ['W12', 'W13', 'W14', 'W15', 'W16', 'W17', 'W18'], 'C': [25, 30, 40, 45, 46, 47, 48]})
Normally, when I build my stock price database, I use pd.merge with the very handy on='Week' argument (in this case) to join the data on the Week column. So my code would look like this:
df = pd.merge(df, df2, on='Week', how='left')
df = pd.merge(df, df3, on='Week', how='left')
This produces the following df:
  Week    A     B   C
0   W1   34   NaN NaN
1   W2   67   NaN NaN
2   W3   92   NaN NaN
3   W4   31  75.0 NaN
4   W5   90   NaN NaN
5   W6  100  53.0 NaN
6   W7  101  21.0 NaN
This is great: I only want to see the data for W1 to W7, and where there is no data I just want NaNs.
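With hundreds of stocks, writing one pd.merge call per frame gets repetitive. The following is a minimal sketch of how the same chain of left merges could be folded over a list of frames with functools.reduce. The helper name left_merge_all is hypothetical (it is not part of pandas), and the sketch assumes every frame has a Week column.

```python
from functools import reduce

import numpy as np
import pandas as pd

# The same three frames as in the question.
df = pd.DataFrame({'Week': ['W1', 'W2', 'W3', 'W4', 'W5', 'W6', 'W7'],
                   'A': [34, 67, 92, 31, 90, 100, 101]})
df2 = pd.DataFrame({'Week': ['W4', 'W5', 'W6', 'W7', 'W8', 'W9', 'W10'],
                    'B': [75, np.nan, 53, 21, 94, 47, 88]})
df3 = pd.DataFrame({'Week': ['W12', 'W13', 'W14', 'W15', 'W16', 'W17', 'W18'],
                    'C': [25, 30, 40, 45, 46, 47, 48]})


def left_merge_all(base, others, key='Week'):
    """Hypothetical helper: left-merge each frame in `others` onto `base`.

    how='left' keeps exactly the rows of `base`, in their original order;
    weeks missing from a right-hand frame show up as NaN.
    """
    return reduce(
        lambda left, right: pd.merge(left, right, on=key, how='left'),
        others,
        base,
    )


merged = left_merge_all(df, [df2, df3])
print(merged)
```

This is equivalent to the two explicit merge calls above, so it does not change how long each merge takes. It only removes the need to write one line per frame.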
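I have been led to believe that pd.concat is much faster than pd.merge, which would really help cut down the time it takes to build the df when I am looking at hundreds of stocks. Of course, it is very important that the dates match up exactly, which is why I have been using the on argument of pd.merge.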
So far I have not been able to work out how to replicate this behaviour with pd.concat. Does anyone have any suggestions? This is what I have tried so far:
df = pd.concat([df, df2], sort=True).groupby('Week').mean()
But this results in the following, which is not even close to what I want:
          A     B
Week
W1     34.0   NaN
W10     NaN  88.0
W2     67.0   NaN
W3     92.0   NaN
W4     31.0  75.0
W5     90.0   NaN
W6    100.0  53.0
W7    101.0  21.0
W8      NaN  94.0
W9      NaN  47.0
Any help would be greatly appreciated, cheers.
Edit:
Sorry, I just want to clarify that my expected output is the first df shown above, this one:
  Week    A     B   C
0   W1   34   NaN NaN
1   W2   67   NaN NaN
2   W3   92   NaN NaN
3   W4   31  75.0 NaN
4   W5   90   NaN NaN
5   W6  100  53.0 NaN
6   W7  101  21.0 NaN

Posted on 2019-10-29 15:36:38
You can do it like this:
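# Stack the frames row-wise, then collapse to one row per Week,
# taking the first non-null value in each column.
concated = pd.concat([df, df2, df3], sort=False).groupby('Week').first()
# Keep only the weeks present in the first frame.
result = concated[concated.index.isin(('W1', 'W2', 'W3', 'W4', 'W5', 'W6', 'W7'))]
print(result)
Output: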
          A     B   C
Week
W1     34.0   NaN NaN
W2     67.0   NaN NaN
W3     92.0   NaN NaN
W4     31.0  75.0 NaN
W5     90.0   NaN NaN
W6    100.0  53.0 NaN
W7    101.0  21.0 NaN

Posted on 2019-10-29 15:49:26
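I would just chain the merge method, since it is more concise, and the speed difference is not noticeable unless you have a large amount of data.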
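# df is the first frame from the question; on='Week' makes the join key explicit.
df = df.merge(df2, on='Week', how='left').merge(df3, on='Week', how='left')
print(df)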
  Week    A     B   C
0   W1   34   NaN NaN
1   W2   67   NaN NaN
2   W3   92   NaN NaN
3   W4   31  75.0 NaN
4   W5   90   NaN NaN
5   W6  100  53.0 NaN
6   W7  101  21.0 NaN

https://stackoverflow.com/questions/58610561
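As a further illustration of the question's goal (left-merge behaviour from pd.concat), here is a minimal sketch that is not taken from either answer. It sets Week as the index of each frame, concatenates along axis=1 so that rows are aligned on the week label, and then reindexes to the weeks of the first frame. It assumes each Week value appears at most once per frame; with duplicate weeks, concat along axis=1 would not align cleanly.

```python
import numpy as np
import pandas as pd

# The same three frames as in the question.
df = pd.DataFrame({'Week': ['W1', 'W2', 'W3', 'W4', 'W5', 'W6', 'W7'],
                   'A': [34, 67, 92, 31, 90, 100, 101]})
df2 = pd.DataFrame({'Week': ['W4', 'W5', 'W6', 'W7', 'W8', 'W9', 'W10'],
                    'B': [75, np.nan, 53, 21, 94, 47, 88]})
df3 = pd.DataFrame({'Week': ['W12', 'W13', 'W14', 'W15', 'W16', 'W17', 'W18'],
                    'C': [25, 30, 40, 45, 46, 47, 48]})

frames = [df, df2, df3]

# Index every frame by Week so that concat aligns on the label, not on row position.
indexed = [frame.set_index('Week') for frame in frames]

# axis=1 places the frames side by side; the default join='outer' keeps every week.
wide = pd.concat(indexed, axis=1)

# Keep only the weeks of the first frame, in their original order,
# which mimics how='left' in pd.merge.
result = wide.reindex(df['Week']).rename_axis('Week').reset_index()
print(result)
```

One difference from the merge result: column A comes back as float rather than int, because the intermediate outer join introduces NaN rows for weeks that df does not contain. Whether this is actually faster than chained merges on hundreds of frames is not shown here and would need to be timed on real data.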