Python Pandas: find duplicates in one column among rows that share a value in another column


Suppose the DataFrame df looks like this:

  col1 col2
0    a    A
1    b    A
2    c    A
3    c    B
4    a    B
5    b    B
6    a    C
7    a    C
8    c    C
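
To reproduce the examples, the frame above can be rebuilt as follows. This is a minimal sketch; only the column names and values shown in the table come from the question.

```python
import pandas as pd

# Rebuild the example frame from the table above.
df = pd.DataFrame({
    "col1": ["a", "b", "c", "c", "a", "b", "a", "a", "c"],
    "col2": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
})
print(df)
```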

I want to find those values of col2 for which col1 contains duplicates of 'a'. In this example the result should be ['C'], because for df['col2'] == 'C', col1 has two 'a' entries.

I tried this approach:

df[(df['col1'] == 'a') & (df['col2'].duplicated())]['col2'].to_list()

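But this only works if the 'a' sits at the beginning or the end of the block of rows defined by col2, depending on how you set the keep keyword of duplicated(). In this example it returns ['B', 'C'], which is not what I want.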
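
To see why B slips through, it may help to look at the intermediate mask. The sketch below rebuilds the example frame (the variable names col2_seen_before and attempt are illustrative) and shows that df['col2'].duplicated() only asks whether a col2 value has occurred in an earlier row, regardless of what col1 holds in that row.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "b", "c", "c", "a", "b", "a", "a", "c"],
    "col2": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
})

# True for every row whose col2 value already occurred in an earlier row
# (the default is keep='first', so the first row of each block is False).
col2_seen_before = df["col2"].duplicated()

# The attempted filter: rows with 'a' whose col2 value is a repeat.
attempt = df[(df["col1"] == "a") & col2_seen_before]["col2"].to_list()
print(attempt)
```

Row 4 (a, B) passes the filter only because row 3 already carried B, not because 'a' occurs twice within B.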

Filter only the rows where col1 is 'a', then keep the col2 values that are duplicated among them:

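# Keep only the rows where col1 is 'a'.
df1 = df[df['col1'] == 'a']

# keep=False flags every member of a duplicate group, not just the repeats.
out = df1.loc[df1['col2'].duplicated(keep=False), 'col2'].unique().tolist()
print(out)
['C']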

Another idea is to use DataFrame.duplicated on both columns and chain it with a mask that matches only 'a':

out = df.loc[df.duplicated(subset=['col1', 'col2'], keep=False) &
             (df['col1'] == 'a'), 'col2'].unique().tolist()
print(out)
['C']

A more general solution, using … and …, is also possible.
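
One way to generalize beyond 'a' is to count every (col1, col2) pair and keep the pairs that occur more than once. The sketch below uses groupby with size; that choice is an assumption and not necessarily what the answer had in mind, and the names pair_counts, repeated and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "b", "c", "c", "a", "b", "a", "a", "c"],
    "col2": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
})

# Count how often each (col1, col2) pair occurs.
pair_counts = df.groupby(["col1", "col2"]).size()

# Keep only the pairs that occur more than once.
repeated = pair_counts[pair_counts > 1].reset_index(name="count")

# Map each col1 value to the col2 values in which it is duplicated.
result = repeated.groupby("col1")["col2"].agg(list).to_dict()
print(result)
```

For the example frame only the pair (a, C) repeats, so result.get('a', []) gives the same answer as the filters above.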

You can group col1 by col2 and count the occurrences of 'a':

>>> s = df.col1.groupby(df.col2).sum().str.count('a').gt(1)
>>> s[s].index.values
array(['C'], dtype=object)
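
Note that summing strings concatenates them, so str.count('a') counts the character 'a' and would also match inside longer values such as 'ab'. If that matters, a sketch that compares whole cell values and sums the boolean matches per group could look like this (the name a_counts is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "b", "c", "c", "a", "b", "a", "a", "c"],
    "col2": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
})

# Compare whole cell values with 'a', then count the matches per col2 group.
a_counts = df["col1"].eq("a").groupby(df["col2"]).sum()

# Keep the col2 values where 'a' occurs more than once.
out = a_counts[a_counts > 1].index.tolist()
print(out)
```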
