Python: Check the uniqueness of a Series in a GroupBy object


I'm struggling to get transform() to return the result I want. For each group, I want to check whether 'missed' is the only value in that group.

Consider the following:

import pandas as pd

df = pd.DataFrame({'key': [1, 1, 2, 2, 3, 3, 2, 4], 'type': ['correct', 'incorrect', 'missed', 'incorrect', 'missed', 'missed', 'correct', 'pass']})
df

  key   type
0   1   correct
1   1   incorrect
2   2   missed
3   2   incorrect
4   3   missed
5   3   missed
6   2   correct
7   4   pass

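I'm trying to get the original DataFrame to look like this, where only_missed is 'yes' if 'missed' is the only type in the group ('no' otherwise, and the all-'pass' group stays 'pass'):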

    key type    only_missed
0   1   correct     no
1   1   incorrect   no
2   2   missed      no
3   2   incorrect   no
4   3   missed      yes
5   3   missed      yes
6   2   correct     no
7   4   pass        pass
I tried this, but the output is not what I expected:

a = ['correct', 'incorrect']
m = ['missed']
df['only_missed'] = df.groupby('key')['type'].transform(lambda x: 'no' if all(x.isin(a)) else ('yes' if all(x.isin(m)) else 'pass'))
df
   key  type    only_missed
0   1   correct     no
1   1   incorrect   no
2   2   missed      pass
3   2   incorrect   pass
4   3   missed      yes
5   3   missed      yes
6   2   correct     pass
7   4   pass        pass
This one really has me confused, because I've gone through many iterations here trying to figure out what exactly is going on.

Any help is much appreciated.
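Evaluating the attempt's two tests group by group shows why key 2 ends up as 'pass'. A minimal diagnostic sketch, reusing the question's df, a and m; the names g and checks are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'key': [1, 1, 2, 2, 3, 3, 2, 4],
                   'type': ['correct', 'incorrect', 'missed', 'incorrect',
                            'missed', 'missed', 'correct', 'pass']})
a = ['correct', 'incorrect']
m = ['missed']

g = df.groupby('key')['type']
# The two tests the attempted lambda runs, evaluated once per key.
checks = pd.DataFrame({
    'all_in_a': g.agg(lambda x: x.isin(a).all()),  # True -> lambda returns 'no'
    'all_in_m': g.agg(lambda x: x.isin(m).all()),  # True -> lambda returns 'yes'
})
print(checks)
```

Key 2 mixes 'missed' with 'correct' and 'incorrect', so both tests are False and the lambda falls through to its else branch, 'pass'. Key 4 reaches 'pass' the same way, so the else branch (not any test for 'pass') produces both 'pass' values.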

Try:

df.groupby('key')['type'].transform(lambda x: (x.nunique() == 1) & (x.iloc[0] == 'missed'))
Output:

0    False
1    False
2    False
3    False
4     True
5     True
6    False
7    False
Name: type, dtype: bool
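For a non-empty group, "exactly one distinct value, and that value is 'missed'" is the same condition as "every value is 'missed'". A small sketch comparing the lambda above with a lambda-free version built from Series.eq and the built-in 'all' transform; the names via_nunique and via_all are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'key': [1, 1, 2, 2, 3, 3, 2, 4],
                   'type': ['correct', 'incorrect', 'missed', 'incorrect',
                            'missed', 'missed', 'correct', 'pass']})

# The answer's test: one distinct value per group, and that value is 'missed'.
via_nunique = df.groupby('key')['type'].transform(
    lambda x: (x.nunique() == 1) & (x.iloc[0] == 'missed'))

# Row-wise comparison first, then a group-wise 'all' broadcast back to every row.
via_all = df['type'].eq('missed').groupby(df['key']).transform('all')

print(via_nunique.tolist())
print(via_all.tolist())
```

The second form avoids calling a Python function once per group, which may matter on large frames; on this data both produce the same booleans.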
You can also mask the 'pass' rows:

df.groupby('key')['type']\
  .transform(lambda x: (x.nunique() == 1) & (x.iloc[0] == 'missed'))\
  .mask(df.type == 'pass','pass')
Output:

0    False
1    False
2    False
3    False
4     True
5     True
6    False
7     pass
Name: type, dtype: object
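Series.mask(cond, other) keeps each value where cond is False and substitutes other where cond is True. A toy illustration with made-up flags; mixing booleans with a string upcasts the result to object dtype, which matches the dtype: object shown in the mask output:

```python
import pandas as pd

flags = pd.Series([False, True, False])    # made-up per-row results
is_pass = pd.Series([False, False, True])  # made-up condition

# Where is_pass is True, swap in 'pass'; elsewhere keep the original flag.
out = flags.mask(is_pass, 'pass')
print(out.tolist())  # [False, True, 'pass']
print(out.dtype)     # object
```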
And replace True/False with Yes/No:

df.groupby('key')['type']\
  .transform(lambda x: (x.nunique() == 1) & (x.iloc[0] == 'missed'))\
  .replace({False:'No',True:'Yes'})\
  .mask(df.type == 'pass','pass')
Output:

0      No
1      No
2      No
3      No
4     Yes
5     Yes
6      No
7    pass
Name: type, dtype: object
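Because Series.replace with a dict only rewrites values that appear as keys, the replace and mask steps could also be swapped without changing the result. Series.map with the same dict would not be a drop-in substitute after mask, since it turns unmatched values such as 'pass' into NaN. A toy comparison with made-up values:

```python
import pandas as pd

s = pd.Series([False, True, 'pass'], dtype=object)  # made-up sample values
mapping = {False: 'No', True: 'Yes'}

replaced = s.replace(mapping)  # unmatched 'pass' is left as-is
mapped = s.map(mapping)        # unmatched 'pass' becomes NaN

print(replaced.tolist())  # ['No', 'Yes', 'pass']
print(mapped.tolist())
```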
Assign it to a DataFrame column:

df['only_misses'] = df.groupby('key')['type']\
                      .transform(lambda x: (x.nunique() == 1) & (x.iloc[0] == 'missed'))\
                      .replace({False:'No',True:'Yes'})\
                      .mask(df.type == 'pass','pass')
df
Output:

   key       type only_misses
0    1    correct          No
1    1  incorrect          No
2    2     missed          No
3    2  incorrect          No
4    3     missed         Yes
5    3     missed         Yes
6    2    correct          No
7    4       pass        pass

One way is to use booleans and add them together to build a categorical:

In [11]: a = pd.Series(df.type.str.match('correct|incorrect').values, df.key).groupby(level=0).transform('all')

In [12]: m = pd.Series((df.type == 'missed').values, df.key).groupby(level=0).transform('all')

In [13]: pd.Categorical.from_codes(a + 2 * m, ['pass', 'no', 'yes'])
Out[13]:
[no, no, pass, pass, yes, yes, pass, pass]
Categories (3, object): [pass, no, yes]

In [14]: df["only_missed"] = pd.Categorical.from_codes(a + 2 * m, ['pass', 'no', 'yes'])

In [15]: df
Out[15]:
   key       type only_missed
0    1    correct          no
1    1  incorrect          no
2    2     missed        pass
3    2  incorrect        pass
4    3     missed         yes
5    3     missed         yes
6    2    correct        pass
7    4       pass        pass
Using .values (so the booleans are not reindexed against df.key) feels a bit hacky, but it should be very efficient.
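Categorical.from_codes treats each integer as a position in the category list, so with ['pass', 'no', 'yes'] the sum a + 2 * m maps 0 to 'pass', 1 to 'no' and 2 to 'yes'. A toy sketch with three made-up rows:

```python
import pandas as pd

# Made-up flags: row 0 is all correct/incorrect, row 1 all missed, row 2 neither.
a = pd.Series([True, False, False])
m = pd.Series([False, True, False])

codes = a.astype(int) + 2 * m.astype(int)  # [1, 2, 0]
cat = pd.Categorical.from_codes(codes, ['pass', 'no', 'yes'])
print(list(cat))  # ['no', 'yes', 'pass']
```

a and m cannot both be True for a non-empty group, so code 3 never appears; from_codes would reject it with a ValueError, because codes must lie between -1 and len(categories) - 1.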


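Looking at this again, this is the "incorrect" output, but I'll leave it here since it's essentially the same idea. To get the correct answer, you should be checking for groups that are all 'pass' (rather than all 'correct'/'incorrect').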
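A minimal sketch of that correction in the same categorical style, written for illustration rather than taken from the answer. It flags all-'pass' groups (p) and all-'missed' groups (m) and starts every row at code 1 ('no'). Because groupby(...).transform already returns df's original index, this version does not need the .values step; p, m and codes are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'key': [1, 1, 2, 2, 3, 3, 2, 4],
                   'type': ['correct', 'incorrect', 'missed', 'incorrect',
                            'missed', 'missed', 'correct', 'pass']})
by_key = df['key']

# Group-level flags broadcast to every row (index matches df).
p = df['type'].eq('pass').groupby(by_key).transform('all')
m = df['type'].eq('missed').groupby(by_key).transform('all')

# Start at 1 ('no'); all-'missed' groups move up to 2 ('yes'),
# all-'pass' groups move down to 0 ('pass'). p and m are never both True.
codes = 1 + m.astype(int) - p.astype(int)
df['only_missed'] = pd.Categorical.from_codes(codes, ['pass', 'no', 'yes'])
print(df)
```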


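I think I'm confused. What is your expected output: the DataFrame with one 'pass' or the one with four 'pass'?

That did the trick. I had my logic backwards and wasn't using a.all() correctly. Same thing @RamGhadiyaram said. I'm not sure why this wasn't more obvious to me. Thanks!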
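For comparison, a sketch of the question's original transform with its branches reordered, in the spirit of the "logic backwards" comment: test the pure 'missed' and pure 'pass' cases first and let every mixed group fall through to 'no'. The list p is a hypothetical addition mirroring the question's a and m:

```python
import pandas as pd

df = pd.DataFrame({'key': [1, 1, 2, 2, 3, 3, 2, 4],
                   'type': ['correct', 'incorrect', 'missed', 'incorrect',
                            'missed', 'missed', 'correct', 'pass']})
m = ['missed']
p = ['pass']  # hypothetical, mirrors the question's a and m lists

# Pure 'missed' -> 'yes', pure 'pass' -> 'pass', any other group -> 'no'.
# transform broadcasts each group's scalar result back to that group's rows.
df['only_missed'] = df.groupby('key')['type'].transform(
    lambda x: 'yes' if x.isin(m).all() else ('pass' if x.isin(p).all() else 'no'))
print(df)
```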