Python: check the uniqueness of a Series within a GroupBy object
I'm trying to learn how to get transform() to return the result I want. I want to check whether 'missed' is the only type within each group.
Consider the following:
import pandas as pd
df = pd.DataFrame({'key': [1, 1, 2, 2, 3, 3, 2, 4],
                   'type': ['correct', 'incorrect', 'missed', 'incorrect', 'missed', 'missed', 'correct', 'pass']})
df
   key       type
0    1    correct
1    1  incorrect
2    2     missed
3    2  incorrect
4    3     missed
5    3     missed
6    2    correct
7    4       pass
I'm trying to get the original DataFrame to look like this, where only_missed is yes if missed is the only type in the group:
   key       type only_missed
0    1    correct          no
1    1  incorrect          no
2    2     missed          no
3    2  incorrect          no
4    3     missed         yes
5    3     missed         yes
6    2    correct          no
7    4       pass        pass
I tried this, but the output is not what I expected:
a = ['correct', 'incorrect']
m = ['missed']
df['only_missed'] = df.groupby('key')['type'].transform(lambda x: 'no' if all(x.isin(a)) else ('yes' if all(x.isin(m)) else 'pass'))
df
   key       type only_missed
0    1    correct          no
1    1  incorrect          no
2    2     missed        pass
3    2  incorrect        pass
4    3     missed         yes
5    3     missed         yes
6    2    correct        pass
7    4       pass        pass
This one really has me confused; I've gone through many iterations trying to figure out what is going on.
Any help is greatly appreciated.
Try:
df.groupby('key')['type'].transform(lambda x: (x.nunique() == 1) & (x.iloc[0] == 'missed'))
Output:
0 False
1 False
2 False
3 False
4 True
5 True
6 False
7 False
Name: type, dtype: bool
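To see why this check works, it can help to compare agg and transform with the same function. This is a minimal sketch rather than part of the original answer; the helper name only_missed and the variables per_group and per_row are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'key': [1, 1, 2, 2, 3, 3, 2, 4],
    'type': ['correct', 'incorrect', 'missed', 'incorrect',
             'missed', 'missed', 'correct', 'pass'],
})

def only_missed(group):
    # True only when the group holds a single distinct value
    # and that value is 'missed'.
    return group.nunique() == 1 and group.iloc[0] == 'missed'

grouped = df.groupby('key')['type']

# agg: one scalar per key
per_group = grouped.agg(only_missed)

# transform: the same scalar, repeated for every row of the group,
# so the result lines up with the original index
per_row = grouped.transform(only_missed)

print(per_group)
print(per_row)
```

Because transform returns a result aligned with df's index, it can be assigned straight back as a new column.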
Also, you can mask the 'pass' rows:
df.groupby('key')['type']\
.transform(lambda x: (x.nunique() == 1) & (x.iloc[0] == 'missed'))\
.mask(df.type == 'pass','pass')
Output:
0 False
1 False
2 False
3 False
4 True
5 True
6 False
7 pass
Name: type, dtype: object
And replace True/False with Yes/No:
df.groupby('key')['type']\
.transform(lambda x: (x.nunique() == 1) & (x.iloc[0] == 'missed'))\
.replace({False:'No',True:'Yes'})\
.mask(df.type == 'pass','pass')
Output:
0 No
1 No
2 No
3 No
4 Yes
5 Yes
6 No
7 pass
Name: type, dtype: object
Assign the result to a DataFrame column:
df['only_misses'] = df.groupby('key')['type']\
.transform(lambda x: (x.nunique() == 1) & (x.iloc[0] == 'missed'))\
.replace({False:'No',True:'Yes'})\
.mask(df.type == 'pass','pass')
df
Output:
   key       type only_misses
0    1    correct          No
1    1  incorrect          No
2    2     missed          No
3    2  incorrect          No
4    3     missed         Yes
5    3     missed         Yes
6    2    correct          No
7    4       pass        pass
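If you would rather avoid the lambda, the same labels can probably be built from the built-in 'nunique' transform plus numpy.select. This is only a sketch of an alternative, not the method from the answer above; the variable names single_type and only_missed are my own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key': [1, 1, 2, 2, 3, 3, 2, 4],
    'type': ['correct', 'incorrect', 'missed', 'incorrect',
             'missed', 'missed', 'correct', 'pass'],
})

# Number of distinct types per group, broadcast back to every row
single_type = df.groupby('key')['type'].transform('nunique') == 1

# A row is in an "only missed" group when its group has exactly one
# distinct type and the row itself is 'missed'
only_missed = single_type & (df['type'] == 'missed')

# Conditions are checked in order, so 'pass' rows win over everything else
df['only_misses'] = np.select(
    [df['type'] == 'pass', only_missed],
    ['pass', 'Yes'],
    default='No',
)

print(df)
```

On this sample data it should give the same column as the chained transform/replace/mask version.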
One approach is to use booleans and add them together to build a categorical:
In [11]: a = pd.Series(df.type.str.match('correct|incorrect').values, df.key).groupby(level=0).transform('all')
In [12]: m = pd.Series((df.type == 'missed').values, df.key).groupby(level=0).transform('all')
In [13]: pd.Categorical.from_codes(a + 2 * m, ['pass', 'no', 'yes'])
Out[13]:
[no, no, pass, pass, yes, yes, pass, pass]
Categories (3, object): [pass, no, yes]
In [14]: df["only_missed"] = pd.Categorical.from_codes(a + 2 * m, ['pass', 'no', 'yes'])
In [15]: df
Out[15]:
   key       type only_missed
0    1    correct          no
1    1  incorrect          no
2    2     missed        pass
3    2  incorrect        pass
4    3     missed         yes
5    3     missed         yes
6    2    correct        pass
7    4       pass        pass
Using .values (to avoid reindexing) feels a little hacky, but it should be quite efficient.
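For anyone puzzled by the a + 2 * m trick: adding the two boolean masks produces integer codes, and from_codes maps each code to the category at that position. A small standalone sketch with made-up masks (the arrays below are hypothetical, not taken from the DataFrame above):

```python
import numpy as np
import pandas as pd

# Hypothetical per-row flags:
# a -> every type in the row's group is 'correct' or 'incorrect'
# m -> every type in the row's group is 'missed'
a = np.array([True, True, False, False])
m = np.array([False, False, True, False])

# bool + int arithmetic gives integer codes:
#   0 -> neither flag set, 1 -> only a, 2 -> only m
codes = a + 2 * m

# Each code indexes into the category list: 0 -> 'pass', 1 -> 'no', 2 -> 'yes'
labels = pd.Categorical.from_codes(codes, categories=['pass', 'no', 'yes'])

print(codes)
print(labels)
```

A non-empty group cannot be all 'missed' and all 'correct'/'incorrect' at the same time, so code 3 never comes up and three categories are enough.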
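Looking at it again, this is the 'incorrect' output, but I'll leave it in because it is essentially the same. To get the correct answer, you should look at all 'pass':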
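I think I'm confused. What is your expected output? The DataFrame with 1 'pass' or the one with 4 'pass'?
That did the trick. I had my logic backwards and wasn't using a.all() correctly. Same as what @RamGhadiyaram said. I'm not sure why this wasn't obvious to me. Thanks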
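For later readers: in the original attempt, 'no' is returned only when every value in the group is 'correct' or 'incorrect', so a mixed group such as key 2 falls through to 'pass'. The asker's final code is not shown in this thread, so the following is just one possible sketch of the corrected logic; the function name label is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'key': [1, 1, 2, 2, 3, 3, 2, 4],
    'type': ['correct', 'incorrect', 'missed', 'incorrect',
             'missed', 'missed', 'correct', 'pass'],
})

def label(group):
    # 'yes' only when every row in the group is 'missed'
    if (group == 'missed').all():
        return 'yes'
    # 'no' as soon as any 'correct'/'incorrect' row is present
    if group.isin(['correct', 'incorrect']).any():
        return 'no'
    # Assumption: anything else (e.g. a group of only 'pass') is 'pass'
    return 'pass'

df['only_missed'] = df.groupby('key')['type'].transform(label)

print(df)
```

Note that, under this assumption, a group mixing only 'missed' and 'pass' would be labelled 'pass' on every row, which differs from the mask-based answer above; the sample data has no such group.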