Pandas: merge 2 DataFrames with priority (A over B if specific columns match)
I have a DataFrame A:
orderid | productnumber | productcount | productname | productsize | deliverydate | source
1 | 111 | 11 | "big fridge" | 100x200x300 | 2020-11-01 | "A"
1 | 222 | 22 | "big fridge" | 100x200x300 | 2020-11-11 | "A"
1 | 333 | 33 | "small fridge" | 100x200x300 | 2020-11-12 | "A"
and a DataFrame B:
orderid | productnumber | productcount | productname | productsize | deliverydate | transport | source
1 | 111 | 13 | "big fridge" | 100x200x300 | 2020-11-03 | "ship" | "B"
1 | 222 | 22 | "big fridge" | 100x200x300 | 2020-11-11 | "ship" | "B"
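A is older. B is preferable for specific columns and contains more information (more columns), but it may not contain the latest content (rows).
So if "orderid" + "productnumber" match, B takes priority over A, and B's row should replace A's row when the two are merged.
The final result should be: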
orderid | productnumber | productcount | productname | productsize | deliverydate | transport | source
1 | 111 | 13 | "big fridge" | 100x200x300 | 2020-11-03 | "ship" | "B"
1 | 222 | 22 | "big fridge" | 100x200x300 | 2020-11-11 | "ship" | "B"
1 | 333 | 33 | "small fridge" | 100x200x300 | 2020-11-12 | | "A"
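How can this be done easily with pandas?
Concatenate the two DataFrames with pd.concat, then call drop_duplicates on the key columns. Depending on whether A or B has priority, set keep to "last" or "first": with B concatenated after A, keep="last" keeps B's row whenever both contain the same key. Depending on your pandas version, you may need to call reset_index(drop=True) afterwards instead of passing the ignore_index param to drop_duplicates.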
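import pandas as pd
# Stack A's rows above B's; columns that exist only in B (e.g. transport) become NaN for A's rows.
merged_df = pd.concat([dfA, dfB], ignore_index=True, sort=False)
# merged_df.sort_values(["source"], inplace=True)
# B's rows come last, so keep="last" makes B win whenever orderid + productnumber match.
merged_df.drop_duplicates(["orderid", "productnumber"], keep="last", inplace=True, ignore_index=True)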
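
For a quick check, here is a minimal self-contained sketch that rebuilds A and B from the tables in the question and applies the same concat + drop_duplicates idea. The `keys` variable, the final `sort_values` call (only there so the row order matches the expected result), and the use of `reset_index(drop=True)` in place of `ignore_index` are my own additions for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
dfA = pd.DataFrame({
    "orderid": [1, 1, 1],
    "productnumber": [111, 222, 333],
    "productcount": [11, 22, 33],
    "productname": ["big fridge", "big fridge", "small fridge"],
    "productsize": ["100x200x300"] * 3,
    "deliverydate": ["2020-11-01", "2020-11-11", "2020-11-12"],
    "source": ["A", "A", "A"],
})
dfB = pd.DataFrame({
    "orderid": [1, 1],
    "productnumber": [111, 222],
    "productcount": [13, 22],
    "productname": ["big fridge", "big fridge"],
    "productsize": ["100x200x300"] * 2,
    "deliverydate": ["2020-11-03", "2020-11-11"],
    "transport": ["ship", "ship"],
    "source": ["B", "B"],
})

# Columns that identify "the same row" in A and B (hypothetical helper name).
keys = ["orderid", "productnumber"]

merged_df = (
    pd.concat([dfA, dfB], sort=False)           # B's rows end up after A's
    .drop_duplicates(subset=keys, keep="last")  # so keep="last" lets B win
    .sort_values(keys)                          # only to match the expected row order
    .reset_index(drop=True)                     # alternative to ignore_index=True
)

print(merged_df)
```

If A should win instead, either swap the order in the list passed to `pd.concat` or switch to `keep="first"`. Note that the row for product 333 comes only from A, so its `transport` value is NaN.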