Python: merging DataFrames with a MultiIndex
I have a question about merging MultiIndex DataFrames in pandas. Here is a hypothetical scenario:
import numpy as np
import pandas as pd

# Two frames with identical index values but different level names.
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])
s1 = pd.DataFrame(np.random.randn(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(np.random.randn(8), index=index2, columns=['s2'])
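Then
s1.merge(s2, how='left', left_index=True, right_index=True)
or a variant of it results in an error.
Do I have to call reset_index() on either s1 or s2 to make this work?
Thanks.
It looks like you need to combine the two, using the index on one side and the level names on the other: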
s1.merge(s2, left_index=True, right_on=['third', 'fourth'])
#s1.merge(s2, right_index=True, left_on=['first', 'second'])
Besides using the index names as @ALollz pointed out, you can simply use loc, which aligns on the index automatically:
s1.loc[:, 's2'] = s2 # Or explicitly, s2['s2']
                    s1        s2
first second
bar   one    -0.111384 -2.341803
      two    -1.226569  1.308240
baz   one     1.880835  0.697946
      two    -0.008979 -0.247896
foo   one     0.103864 -1.039990
      two     0.836931  0.000811
qux   one    -0.859005 -1.199615
      two    -0.321341 -1.098691
The general form is:
s1.loc[:, s2.columns] = s2
rename_axis
You can rename the index levels of one frame and let join do its thing:
s1.join(s2.rename_axis(s1.index.names))
                    s1        s2
first second
bar   one    -0.696420 -1.040463
      two     0.640891  1.483262
baz   one     1.598837  0.097424
      two     0.003994 -0.948419
foo   one    -0.717401  1.190019
      two    -1.201237 -0.000738
qux   one     0.559684 -0.505640
      two     1.979700  0.186013
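For anyone who wants to try the rename_axis approach end to end, here is a minimal, self-contained sketch. It rebuilds the frames from the question; the fixed seed and the names rng, s2_renamed and joined are my own additions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the two frames from the question, with a fixed seed for repeatability.
rng = np.random.default_rng(0)
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])
s1 = pd.DataFrame(rng.standard_normal(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(rng.standard_normal(8), index=index2, columns=['s2'])

# Give s2 the same level names as s1, then join on the now-matching index.
s2_renamed = s2.rename_axis(s1.index.names)
joined = s1.join(s2_renamed)

# rename_axis returns a new object, so s2 itself keeps its original names.
print(list(joined.index.names))
print(list(s2.index.names))
```

Because rename_axis does not modify s2 in place, this should be safe to use even when s2 is needed elsewhere under its original level names.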
Assign it with combine_first:
s1.combine_first(s2)
Out[19]:
                    s1        s2
first second
bar   one     0.039203  0.795963
      two     0.454782 -0.222806
baz   one     3.101120 -0.645474
      two    -1.174929 -0.875561
foo   one    -0.887226  1.078218
      two     1.507546 -1.078564
qux   one     0.028048  0.042462
      two     0.826544 -0.375351
# s2.combine_first(s1)
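A quick way to convince yourself that combine_first lines the rows up correctly is to rebuild the frames with a fixed seed and compare the values afterwards. This is only a sketch: the seed and the names rng and merged are assumptions of mine, and I deliberately do not check the level names of the result here.

```python
import numpy as np
import pandas as pd

# Same setup as in the question, seeded so the run is repeatable.
rng = np.random.default_rng(1)
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])
s1 = pd.DataFrame(rng.standard_normal(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(rng.standard_normal(8), index=index2, columns=['s2'])

# combine_first aligns both frames on index and columns, keeps s1's values
# where they exist, and takes everything else (here the whole 's2' column)
# from s2.
merged = s1.combine_first(s2)

# Each column should carry the values of the frame it came from.
print(np.allclose(merged['s1'].to_numpy(), s1['s1'].to_numpy()))
print(np.allclose(merged['s2'].to_numpy(), s2['s2'].to_numpy()))
```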
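This is one of the things that frustrates many new users and coders: there are just too many different ways to do the same thing. I like it, because depending on the dataset, or on why you are doing this in the first place, you can pick the route that is easy to code and understand, or the one that is optimized for a faster runtime.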
concat
pd.concat([s1, s2], axis=1)
                    s1        s2
first second
bar   one    -0.696420 -1.040463
      two     0.640891  1.483262
baz   one     1.598837  0.097424
      two     0.003994 -0.948419
foo   one    -0.717401  1.190019
      two    -1.201237 -0.000738
qux   one     0.559684 -0.505640
      two     1.979700  0.186013
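If you go the concat route, it may be worth making the level names explicit. The sketch below is a variation on the call above, not the same call: it renames the levels of s2 before concatenating, on the assumption that you want the result to carry s1's level names no matter how your pandas version resolves mismatched names. The seed and the names rng and combined are illustrative.

```python
import numpy as np
import pandas as pd

# Same setup as in the question, seeded so the run is repeatable.
rng = np.random.default_rng(2)
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])
s1 = pd.DataFrame(rng.standard_normal(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(rng.standard_normal(8), index=index2, columns=['s2'])

# Assumption: rename s2's levels first so both inputs agree on level names.
# The answer above calls pd.concat([s1, s2], axis=1) without this step.
combined = pd.concat([s1, s2.rename_axis(s1.index.names)], axis=1)

print(list(combined.index.names))
print(combined.shape)
```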