Python: merge two dataframes on the first matching record only, without duplicate rows
I have a dataframe containing millions of records, such as:
ID  input  Output  Price  Category
1   19     10      50     A
2   20     57      70     A
3   30     58      55     A
4   40     19      40     B
5   19     17      10     A
6   40     20      70     B
7   11     19      10     B
8   10     20      60     B
I want a new output like this:
IDA  inputA  OutputA  PriceA  CategoryA  IDB  inputB  OutputB  PriceB  CategoryB
1    19      10       40      A          4    40      19       40      B
1'   19      10       10      A          7    11      19       10      B
2    20      57       20      A          6    40      20       70      B
2'   20      57       50      A          8    10      20       50      B
NaN  NaN     NaN      NaN     NaN        8'   10      20       10      B
3    30      58       55      A          NaN  NaN     NaN      NaN     NaN
5    19      17       10      A          NaN  NaN     NaN      NaN     NaN
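I need to iterate over the records of category A and category B.
If InputA == OutputB, merge row B with the first matching record of category A, then compare the prices:
- if PriceA == PriceB: merge row A and row B
- if PriceA > PriceB: duplicate row A and set PriceA = PriceA - PriceB
- if PriceA < PriceB: duplicate row B and set PriceB = PriceB - PriceA
The problem is that row B gets merged with every row A where InputA == OutputB, not just the first record.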
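The three price cases can be read as one rule: pair the smaller of the two prices and carry the remainder forward on a duplicate of the larger row. A minimal sketch of that reading, where `split_prices` is a hypothetical helper name and not part of pandas:

```python
def split_prices(price_a, price_b):
    """Return (paired amount, remainder left on row A, remainder left on row B)."""
    paired = min(price_a, price_b)
    return paired, price_a - paired, price_b - paired


# Row 1 (Price 50) against row 4 (Price 40): 40 is paired, 10 stays on a copy of row A.
print(split_prices(50, 40))  # (40, 10, 0)
# Equal prices: both rows are used up completely.
print(split_prices(70, 70))  # (70, 0, 0)
# Row B is larger: the remainder stays on a copy of row B.
print(split_prices(10, 60))  # (10, 0, 50)
```

A remainder of 0 on both sides corresponds to the plain merge case. A non-zero remainder is what the rows marked 1', 2' and 8' in the desired output represent.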
import pandas as pd

# Column names follow the table above.
# groupby iterates the groups in sorted key order, so A comes before B.
category = df.groupby('Category')
transformed_df_list = []
for _, frame in category:
    transformed_df_list.append(frame.reset_index(drop=True))
A = transformed_df_list[0]
B = transformed_df_list[1]

for i, row in A.iterrows():
    for j, row1 in B.iterrows():
        if row['input'] == row1['Output']:
            if row['Price'] == row1['Price']:
                row_df = pd.DataFrame([row1])
                # This joins every A row to every B row with the same key, not only the first match.
                output = pd.merge(A, B, how='left', left_on=['input'], right_on=['Output'])
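One possible way around the many-to-many join is to skip `pd.merge` and pair the rows greedily. The sketch below is only one reading of the rules, not a definitive implementation. It assumes the column names from the sample table, strictly positive prices, and that "first record" means the earliest B row in the original order whose Output equals the A row's input. `pair_first_match` is a hypothetical name.

```python
from collections import defaultdict, deque

import pandas as pd

COLS = ["ID", "input", "Output", "Price", "Category"]


def pair_first_match(df):
    """Pair each A row with the earliest unused B rows where input == Output."""
    a_rows = df[df["Category"] == "A"].to_dict("records")
    b_rows = df[df["Category"] == "B"].to_dict("records")

    # Queue the B rows per Output value, preserving their original order.
    waiting = defaultdict(deque)
    for b in b_rows:
        waiting[b["Output"]].append(b)

    pairs = []
    for a in a_rows:
        queue = waiting[a["input"]]
        left = a["Price"]
        while left > 0 and queue:
            b = queue[0]
            amount = min(left, b["Price"])  # the smaller price is what gets paired
            pairs.append(({**a, "Price": amount}, {**b, "Price": amount}))
            left -= amount
            b["Price"] -= amount            # remainder stays on the B row
            if b["Price"] == 0:
                queue.popleft()             # B row fully used
        if left > 0:
            pairs.append(({**a, "Price": left}, None))  # unmatched part of A

    # B rows (or remainders of B rows) that no A row consumed.
    pairs.extend((None, b) for b in b_rows if b["Price"] > 0)

    records = []
    for a, b in pairs:
        rec = {c + "A": (a[c] if a else None) for c in COLS}
        rec.update({c + "B": (b[c] if b else None) for c in COLS})
        records.append(rec)
    columns = [c + "A" for c in COLS] + [c + "B" for c in COLS]
    return pd.DataFrame(records, columns=columns)


df = pd.DataFrame(
    [
        (1, 19, 10, 50, "A"), (2, 20, 57, 70, "A"), (3, 30, 58, 55, "A"),
        (4, 40, 19, 40, "B"), (5, 19, 17, 10, "A"), (6, 40, 20, 70, "B"),
        (7, 11, 19, 10, "B"), (8, 10, 20, 60, "B"),
    ],
    columns=COLS,
)
result = pair_first_match(df)
print(result)
```

Each row is visited a bounded number of times, so the cost grows roughly linearly with the number of records instead of with the number of A and B combinations. Split rows keep their original ID here; a marker such as 1' would have to be added separately.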