Python: merging (or joining) two DataFrames on an index that contains duplicates


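I have two DataFrames, A and B, that share index labels. A shared label may occur more than once (duplicates).

I want to merge A and B so that the following three cases are handled: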

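Case 0: if index label i occurs once in A (i1) and once in B (i1), the merged DataFrame should contain the row A(i1), B(i1).

Case 1: if index label i occurs once in A (i1) and twice in B, in this order: (i1, i2), the merged DataFrame should contain the rows A(i1), B(i1) and A(i1), B(i2).

Case 2: if index label i occurs twice in A, in this order: (i1, i2), and twice in B, in this order: (i1, i2), the merged DataFrame should contain the rows A(i1), B(i1) and A(i2), B(i2).

All three cases can occur in my data.

With pandas.merge, case 0 and case 1 work. For case 2, however, the returned DataFrame contains the rows A(i1), B(i1) and A(i1), B(i2) and A(i2), B(i1) and A(i2), B(i2), instead of only A(i1), B(i1) and A(i2), B(i2).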

I could call pandas.merge and then delete the unwanted rows, but is there a way to handle all three cases in one merge?
import pandas as pd

A = pd.DataFrame([[1, 2], [4, 2], [5, 5], [5, 5], [1, 1]], index=['a', 'a', 'b', 'c', 'c'])
B = pd.DataFrame([[1, 5], [4, 8], [7, 7], [5, 5]], index=['b', 'c', 'a', 'a'])
pd.merge(A, B, left_index=True, right_index=True, how='inner')

For example, in the merge result for the frames above, I do not want the second and third rows with index 'a'.
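To make the problem concrete, the following sketch counts how many rows the plain merge produces per index label for the example frames. The variable names merged and rows_per_label are illustrative only and do not come from the question.

```python
import pandas as pd

A = pd.DataFrame([[1, 2], [4, 2], [5, 5], [5, 5], [1, 1]],
                 index=['a', 'a', 'b', 'c', 'c'])
B = pd.DataFrame([[1, 5], [4, 8], [7, 7], [5, 5]],
                 index=['b', 'c', 'a', 'a'])

# A plain index merge pairs every A row with every B row that has the
# same label, so a label occurring m times in A and n times in B yields
# m * n rows.
merged = pd.merge(A, B, left_index=True, right_index=True, how='inner')

# Number of merged rows per index label.
rows_per_label = merged.groupby(level=0).size()
print(rows_per_label)
```

Label 'a' occurs twice in both frames and therefore produces four rows, where only two are wanted.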
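Basically, your 3 cases reduce to 2:
1. Index label i occurs the same number of times in A and B (once or twice): merge the rows in order of appearance.
2. Index label i occurs twice in one frame and once in the other: merge every row with the content of the single row, which is what a plain merge already does.

Preparation code: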
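import pandas as pd

def add_secondary_index(df):
    # Number the occurrences of each label 0, 1, ... in order of appearance
    # and append that counter as a second index level (modifies df in place).
    df.index.name = 'Old'
    df['Order'] = df.groupby(df.index).cumcount()
    df.set_index('Order', append=True, inplace=True)
    return df

A = pd.DataFrame([[1, 2], [4, 2], [5, 5], [5, 5], [1, 1]], index=['a', 'a', 'b', 'c', 'c'])
B = pd.DataFrame([[1, 5], [4, 8], [7, 7], [5, 5]], index=['b', 'c', 'a', 'a'])
# True where a label occurs the same number of times in A and in B.
# This comparison requires A and B to contain the same set of labels.
index_times = A.groupby(A.index).count() == B.groupby(B.index).count()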

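The first case (equal counts) is easy to solve: just add the secondary index and merge on both levels:
# Labels whose counts match in A and B.
same_times_index = index_times[index_times[0].values].index
A_same = A.loc[same_times_index].copy()
B_same = B.loc[same_times_index].copy()
# Add the occurrence counter so that (label, 0) pairs with (label, 0), and so on.
add_secondary_index(A_same)
add_secondary_index(B_same)
result_merge_same = pd.merge(A_same, B_same, left_index=True, right_index=True)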
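The pairing by occurrence number can be seen in isolation on a toy example. This is a minimal sketch: the frames left and right, the helper with_order_level, and the column names are made up for illustration, and unlike add_secondary_index the helper returns a new frame instead of modifying its argument.

```python
import pandas as pd

left = pd.DataFrame({'left_val': [10, 20, 30]}, index=['a', 'a', 'b'])
right = pd.DataFrame({'right_val': [1, 2, 3]}, index=['a', 'a', 'b'])


def with_order_level(df):
    # Hypothetical helper: append a per-label occurrence counter
    # (0, 1, ...) as a second index level named 'Order'.
    order = df.groupby(level=0).cumcount().to_numpy()
    return (df.rename_axis('Old')
              .assign(Order=order)
              .set_index('Order', append=True))


# Merging on both levels pairs the first 'a' with the first 'a' and the
# second 'a' with the second 'a', instead of forming all combinations.
paired = pd.merge(with_order_level(left), with_order_level(right),
                  left_index=True, right_index=True).sort_index()
print(paired)
```

Because each (label, occurrence) pair is unique, the merge produces one row per pair rather than a Cartesian product.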

The second case (unequal counts) has to be handled separately, with a plain merge:
# Labels whose counts differ between A and B.
not_same_times_index = index_times[~index_times.index.isin(same_times_index)].index
A_notsame = A.loc[not_same_times_index].copy()
B_notsame = B.loc[not_same_times_index].copy()
result_merge_notsame = pd.merge(A_notsame, B_notsame, left_index=True, right_index=True)

You can then decide whether to add the secondary index to result_merge_notsame or to drop it from result_merge_same.
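The two partial results can be combined into one frame. The sketch below wraps the steps above in a single function under a few assumptions: merge_in_order and positional are hypothetical names, the columns get the suffixes _A and _B so they can be told apart, the occurrence counter is dropped again after the ordered merge, and the label counts are aligned first so that labels present in only one frame do not raise an error.

```python
import pandas as pd


def merge_in_order(A, B):
    # Hypothetical helper, not part of the original answer.
    # Count occurrences per label and align both count Series on the
    # union of labels; a label missing from one frame becomes NaN and
    # therefore never compares equal.
    a_counts, b_counts = A.groupby(level=0).size().align(
        B.groupby(level=0).size())
    same = a_counts.index[(a_counts == b_counts).to_numpy()]

    def positional(df, suffix):
        # Rows whose label has equal counts, keyed by (label, occurrence).
        part = df[df.index.isin(same)].add_suffix(suffix)
        order = part.groupby(level=0).cumcount().to_numpy()
        return (part.rename_axis('Old')
                    .assign(Order=order)
                    .set_index('Order', append=True))

    # Equal counts: pair rows in order of appearance.
    ordered = pd.merge(positional(A, '_A'), positional(B, '_B'),
                       left_index=True, right_index=True)
    ordered = ordered.droplevel('Order').rename_axis(None)

    # Unequal counts: a plain merge pairs every matching row.
    rest_a = A[~A.index.isin(same)].add_suffix('_A')
    rest_b = B[~B.index.isin(same)].add_suffix('_B')
    crossed = pd.merge(rest_a, rest_b, left_index=True, right_index=True)

    return pd.concat([ordered, crossed])


A = pd.DataFrame([[1, 2], [4, 2], [5, 5], [5, 5], [1, 1]],
                 index=['a', 'a', 'b', 'c', 'c'])
B = pd.DataFrame([[1, 5], [4, 8], [7, 7], [5, 5]],
                 index=['b', 'c', 'a', 'a'])
result = merge_in_order(A, B)
print(result)
```

For the example frames this yields two rows for 'a' (paired in order), one row for 'b', and two rows for 'c' (both A rows combined with the single B row).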