Python: merge two dataframes on the first matching record, without duplicate rows

python pandas dataframe merge

I have a dataframe containing millions of records, like this:

ID input Output  Price   Category
1    19    10    50         A
2    20    57    70         A
3    30    58    55         A
4    40    19    40         B
5    19    17    10         A
6    40    20    70         B
7    11    19    10         B
8    10    20    60         B

I want a new output like this:

IDA inputA OutputA  PriceA   CategoryA  IDB inputB OutputB  PriceB   CategoryB 
  1    19    10      40         A        4    40    19      40         B
  1'   19    10      10         A        7    11    19      10         B
  2    20    57      20         A        6    40    20      70         B
  2'   20    57      50         A        8    10    20      50         B
  Nan   Nan   Nan     Nan      Nan       8'   10    20      10         B  
  3    30    58      55         A       Nan   Nan   Nan     Nan        Nan
  5    19    17      10         A       Nan   Nan   Nan     Nan        Nan

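I need to iterate over the records of category A and category B.

If InputA == OutputB, merge the B row with the first matching record of category A, then compare the prices: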

- if priceA == priceB: merge row A and row B
- if priceA > priceB: duplicate row A, with priceA = priceA - priceB
- if priceA < priceB: duplicate row B, with priceB = priceB - priceA

The problem is that rowB gets merged with every rowA where InputA == OutputB, not just the first record.

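import pandas as pd

# df is the dataframe shown above; column names follow the sample table.
category = df.groupby('Category')

transformed_df_list = []

# Collect one sub-frame per category, each with a fresh 0..n-1 index.
for name, frame in category:
    transformed_df_list.append(frame.reset_index(drop=True))

A = transformed_df_list[0]  # category A
B = transformed_df_list[1]  # category B

for i, row in A.iterrows():
    for j, row1 in B.iterrows():
        if row['input'] == row1['Output']:
            if row['Price'] == row1['Price']:
                row_df = pd.DataFrame([row1])
                # This joins the whole of A against the whole of B, so every
                # A row is paired with every B row that shares the key.
                output = pd.merge(A, B, how='left', left_on=['input'], right_on=['Output'])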
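
One possible reading of the rules above, written as a minimal sketch rather than a definitive solution. It assumes the column names from the sample table, positive prices, and that each emitted pair carries the matched amount, min(priceA, priceB), in both price columns while the remainder is carried over to the duplicated row. The function name `match_first_available` is hypothetical. Each B row is consumed by the first A row that can use it, so it is never paired with every matching A row.

```python
from collections import defaultdict, deque

import pandas as pd


def match_first_available(df):
    """Greedy pairing of category-A rows with category-B rows (sketch)."""
    a_rows = df[df["Category"] == "A"].to_dict("records")
    b_rows = df[df["Category"] == "B"].to_dict("records")

    # Queue the B rows by their Output value, keeping the original order.
    b_by_output = defaultdict(deque)
    for b in b_rows:
        b_by_output[b["Output"]].append(dict(b))

    pairs = []
    for a in a_rows:
        remaining = a["Price"]
        queue = b_by_output.get(a["input"], deque())
        # Consume B rows in order until this A row's price is used up.
        while remaining > 0 and queue:
            b = queue[0]
            matched = min(remaining, b["Price"])  # assumption: pair shows matched amount
            pairs.append(({**a, "Price": matched}, {**b, "Price": matched}))
            remaining -= matched
            b["Price"] -= matched
            if b["Price"] == 0:
                queue.popleft()
        if remaining > 0:
            pairs.append(({**a, "Price": remaining}, None))  # unmatched A remainder

    # B rows (or B remainders) that never found an A partner.
    for queue in b_by_output.values():
        for b in queue:
            pairs.append((None, b))

    cols = ["ID", "input", "Output", "Price", "Category"]
    out = []
    for a, b in pairs:
        row = {}
        for c in cols:
            row[c + "A"] = a[c] if a else None
            row[c + "B"] = b[c] if b else None
        out.append(row)
    order = [c + "A" for c in cols] + [c + "B" for c in cols]
    return pd.DataFrame(out, columns=order)


# The sample data from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "input": [19, 20, 30, 40, 19, 40, 11, 10],
    "Output": [10, 57, 58, 19, 17, 20, 19, 20],
    "Price": [50, 70, 55, 40, 10, 70, 10, 60],
    "Category": list("AAABABBB"),
})
result = match_first_available(df)
print(result)
```

The loop runs in plain Python, so on millions of rows it may be slow; it is meant to pin down the matching rule, not to be the fastest way to apply it.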