Python Pandas: after a groupby, if duplicates exist on other columns, keep a specific row according to the weightage given on a particular column

Tags: python, python-3.x, pandas, python-2.7, dataframe

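I have a DataFrame df:

import pandas as pd

df = pd.DataFrame([["A","X",98,56,61], ["B","E",79,54,36], ["A","Y",98,56,61],["B","F",79,54,36], ["A","Z",98,56,61], ["A","W",48,51,85],["B","G",44,57,86],["B","H",79,54,36]], columns=["id","class","c1","c2","c3"])

When grouping by id, if there are duplicate rows based on several columns (here c1, c2, c3), I want to keep only one of them, chosen according to the weightage given on the class column.

For example, for id A the rows with classes X, Y, Z have identical c1, c2, c3 values. Among X, Y, Z the weightage is on X, so the X row is kept and the other rows are dropped. Similarly, for id B the rows with classes E, F, H are duplicates and the weightage is on F, so the F row is kept and the others are dropped.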

Expected output:

output = pd.DataFrame([["A","X",98,56,61],["B","F",79,54,36],["A","W",48,51,85],["B","G",44,57,86]], columns=["id","class","c1","c2","c3"])

How can I do this?

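Based on your explanation, you can create a list of the classes that carry the weightage, build two conditions, and filter on them as follows: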

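# classes that take priority (weightage) when rows are duplicated
cls = ['X', 'F']
# True for every row whose (id, c1, c2, c3) combination occurs more than once
c = df.duplicated(['id', 'c1', 'c2', 'c3'], keep=False)
# keep duplicated rows only if their class is weighted; keep all non-duplicated rows
out = df[(c & df['class'].isin(cls)) | ~c]
print(out)

  id class  c1  c2  c3
0  A     X  98  56  61
3  B     F  79  54  36
5  A     W  48  51  85
6  B     G  44  57  86

Comments:

Use df = df.drop_duplicates(['id','c1','c2','c3'])

It is not clear why F is kept rather than E for id B, for example. Is there a specific weightage? Can you clarify?

Yes, there is a specific weightage: if X, Y, Z are duplicates, keep the X row; if E, F, H are duplicates, keep the F row.

In that case I don't think this is a duplicate question. Reopened. But you should try to explain this in the question, because a weightage based only on the class column is a bit confusing to read.

The weightage is entirely on the class column: within an id, if duplicate rows exist among X, Y, Z, choose the X row, and if duplicate rows exist among E, F, H, choose the F row.

@Chethan Updated the answer to fit that requirement.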
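
The list-based filter above drops every duplicated row whose class is not in cls, so a duplicate group that contains no weighted class would disappear entirely. One possible variant, sketched here under assumptions rather than taken from the thread, uses an explicit weight mapping: sort by weight, then let drop_duplicates keep the first row of each group. The weights dictionary, its values, and the temporary _w column are hypothetical names introduced only for illustration; classes missing from the mapping are assumed to have weight 0.

```python
import pandas as pd

df = pd.DataFrame(
    [["A", "X", 98, 56, 61], ["B", "E", 79, 54, 36],
     ["A", "Y", 98, 56, 61], ["B", "F", 79, 54, 36],
     ["A", "Z", 98, 56, 61], ["A", "W", 48, 51, 85],
     ["B", "G", 44, 57, 86], ["B", "H", 79, 54, 36]],
    columns=["id", "class", "c1", "c2", "c3"],
)

# Hypothetical weights (an assumption): a higher value wins among duplicates.
# Classes that are not listed fall back to weight 0.
weights = {"X": 1, "F": 1}

keys = ["id", "c1", "c2", "c3"]
out = (
    df.assign(_w=df["class"].map(weights).fillna(0))       # temporary weight column
      .sort_values("_w", ascending=False, kind="mergesort")  # stable sort, heaviest first
      .drop_duplicates(keys)                                 # keep the heaviest row per group
      .drop(columns="_w")
      .sort_index()                                          # restore the original row order
)
print(out)
```

With the sample data this selects the same four rows as the expected output (X, F, W, G). Unlike the isin filter, a duplicate group without any weighted class keeps its first row instead of being removed, which may or may not be what you want.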