I'd like to ask about merging MultiIndexed data in pandas. Here is a hypothetical scenario:
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])
s1 = pd.DataFrame(np.random.randn(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(np.random.randn(8), index=index2, columns=['s2'])

Then either

s1.merge(s2, how='left', left_index=True, right_index=True)

or

s1.merge(s2, how='left', left_on=['first', 'second'], right_on=['third', 'fourth'])

results in an error.

Do I have to call reset_index() on s1/s2 to make this work?
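For comparison, here is a minimal sketch of the reset_index() route the question asks about. It rebuilds the example data, turns the index levels into ordinary columns, merges on those, and then restores the MultiIndex. The fixed seed and the name `merged` are illustrative additions, not part of the question.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # fixed seed, added only so the example is reproducible
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
s1 = pd.DataFrame(np.random.randn(8),
                  index=pd.MultiIndex.from_tuples(tuples, names=['first', 'second']),
                  columns=['s1'])
s2 = pd.DataFrame(np.random.randn(8),
                  index=pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth']),
                  columns=['s2'])

# Move the index levels into columns, merge on them, drop the now-redundant
# right-hand key columns, and rebuild the original MultiIndex.
merged = (
    s1.reset_index()
      .merge(s2.reset_index(), how='left',
             left_on=['first', 'second'], right_on=['third', 'fourth'])
      .drop(columns=['third', 'fourth'])
      .set_index(['first', 'second'])
)
print(merged)
```

This works, but as the answers below suggest, it is not strictly necessary.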
Posted 2018-10-12 19:01:13
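It looks like you need a combination of the two: the index on one side and the level names on the other.

s1.merge(s2, left_index=True, right_on=['third', 'fourth'])
# or the other way round:
# s1.merge(s2, right_index=True, left_on=['first', 'second'])

Output: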
                s1        s2
bar one   0.765385 -0.365508
    two   1.462860  0.751862
baz one   0.304163  0.761663
    two  -0.816658 -1.810634
foo one   1.891434  1.450081
    two   0.571294  1.116862
qux one   1.056516 -0.052927
    two  -0.574916 -1.197596

Posted 2018-10-12 19:06:13
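The first answer's left_index/right_on combination can be checked end to end with the self-contained sketch below (the seed is an illustrative addition). It relies on `right_on` accepting index level names, which pandas has supported since version 0.23.

```python
import numpy as np
import pandas as pd

np.random.seed(1)  # fixed seed, for reproducibility only
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
s1 = pd.DataFrame(np.random.randn(8),
                  index=pd.MultiIndex.from_tuples(tuples, names=['first', 'second']),
                  columns=['s1'])
s2 = pd.DataFrame(np.random.randn(8),
                  index=pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth']),
                  columns=['s2'])

# Left side: join on its (two-level) index.
# Right side: name its two index levels explicitly as the join keys.
out = s1.merge(s2, left_index=True, right_on=['third', 'fourth'])
print(out)
```

Because both frames list the same keys in the same order, every row of `out` pairs each `s1` value with the `s2` value for the same (level 0, level 1) key.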
Besides using the index names as @ALollz pointed out, you can simply use loc, which aligns on the index automatically:
s1.loc[:, 's2'] = s2 # Or explicitly, s2['s2']
                    s1        s2
first second
bar   one    -0.111384 -2.341803
      two    -1.226569  1.308240
baz   one     1.880835  0.697946
      two    -0.008979 -0.247896
foo   one     0.103864 -1.039990
      two     0.836931  0.000811
qux   one    -0.859005 -1.199615
      two    -0.321341 -1.098691

The general form is
s1.loc[:, s2.columns] = s2

Posted 2018-10-12 19:24:28
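To see that the assignment approach from the second answer aligns by index label rather than by row position, the sketch below reverses the row order of s2 before assigning. It swaps in plain column assignment (`s1['s2'] = ...`) for `.loc`; both align a Series on the frame's index. The seed and the name `s2_reversed` are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(2)  # fixed seed, for reproducibility only
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
s1 = pd.DataFrame(np.random.randn(8),
                  index=pd.MultiIndex.from_tuples(tuples, names=['first', 'second']),
                  columns=['s1'])
s2 = pd.DataFrame(np.random.randn(8),
                  index=pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth']),
                  columns=['s2'])

# Reverse s2's rows so its order no longer matches s1's.
s2_reversed = s2.iloc[::-1]

# Assigning a Series reindexes it to s1's index by key tuples,
# so each value lands on the matching (first, second) row.
s1['s2'] = s2_reversed['s2']
print(s1)
```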
Assign with combine_first:
s1.combine_first(s2)
Out[19]:
                    s1        s2
first second
bar   one     0.039203  0.795963
      two     0.454782 -0.222806
baz   one     3.101120 -0.645474
      two    -1.174929 -0.875561
foo   one    -0.887226  1.078218
      two     1.507546 -1.078564
qux   one     0.028048  0.042462
      two     0.826544 -0.375351
# s2.combine_first(s1)

https://stackoverflow.com/questions/52785579
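The combine_first answer can be reproduced as a self-contained sketch (the seed is an illustrative addition). combine_first keeps the calling frame's values and fills anything it lacks, here the entire s2 column, from the other frame. In this example both indexes hold the same keys in the same order.

```python
import numpy as np
import pandas as pd

np.random.seed(3)  # fixed seed, for reproducibility only
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
s1 = pd.DataFrame(np.random.randn(8),
                  index=pd.MultiIndex.from_tuples(tuples, names=['first', 'second']),
                  columns=['s1'])
s2 = pd.DataFrame(np.random.randn(8),
                  index=pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth']),
                  columns=['s2'])

# Columns are the union of both frames' columns; values come from s1
# where present and from s2 otherwise.
out = s1.combine_first(s2)
print(out)
```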