Python: keep duplicate rows according to a priority given in a list
I have a DataFrame:
import pandas as pd
df = pd.DataFrame([["A","Q",98,56],["C","S",18,45], ["B","T",79,54], ["A","P",98,56],["C","R",18,45],["B","S",79,54], ["A","R",84,65],["B","Q",79,54],["C","Q",19,44]], columns=["id","prio","c1","c2"])
I have a list:
Priority = ["P","R","Q","S","T"]
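Rows count as duplicates when they share the same id, c1, and c2.
When duplicate rows are found, keep only the row whose prio value comes first in the Priority list.
For example: among the duplicate rows for id A, prio contains P and Q, so P takes precedence and the other row is dropped. Similarly, among the duplicate rows for id B, prio contains T, S, and Q; of these, Q comes first in the list, so the Q row is kept.
Expected output: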
df_out = pd.DataFrame([["A","P",98,56],["C","R",18,45], ["A","R",84,65],["B","Q",79,54],["C","Q",19,44]], columns=["id","prio","c1","c2"])
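How can I do this?
You can convert the values to an ordered categorical and then use sort_values together with drop_duplicates. Sorting puts the highest-priority row first within each group of duplicates, and drop_duplicates keeps the first occurrence by default: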
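# Make prio an ordered categorical so it sorts in the order given by Priority
df['prio'] = pd.Categorical(df['prio'], categories=Priority, ordered=True)
# Sort by priority, then keep the first (highest-priority) row of each duplicate group
df = df.sort_values('prio').drop_duplicates(['id','c1','c2'])
print(df)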
  id prio  c1  c2
3  A    P  98  56
4  C    R  18  45
6  A    R  84  65
7  B    Q  79  54
8  C    Q  19  44
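As a variation on the approach above, here is a minimal self-contained sketch that leaves the prio column as plain strings and restores the original row order at the end. The `rank` dictionary and the `_rank` helper column are names made up for this illustration and are not part of the original answer; `kind="mergesort"` is used only to make the sort stable.

```python
import pandas as pd

df = pd.DataFrame(
    [["A", "Q", 98, 56], ["C", "S", 18, 45], ["B", "T", 79, 54],
     ["A", "P", 98, 56], ["C", "R", 18, 45], ["B", "S", 79, 54],
     ["A", "R", 84, 65], ["B", "Q", 79, 54], ["C", "Q", 19, 44]],
    columns=["id", "prio", "c1", "c2"],
)
Priority = ["P", "R", "Q", "S", "T"]

# Map each priority label to its position in the list (lower = preferred).
rank = {label: pos for pos, label in enumerate(Priority)}

result = (
    df.assign(_rank=df["prio"].map(rank))     # hypothetical helper column
      .sort_values("_rank", kind="mergesort")  # stable sort by priority
      .drop_duplicates(["id", "c1", "c2"])     # keeps the first row per group
      .drop(columns="_rank")
      .sort_index()                            # back to the original row order
)
print(result)
```

Any prio value missing from Priority maps to NaN and is sorted last, so such a row survives only if its duplicate group contains no listed value.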