Python: label a column based on grouped data
I am trying to create a column that holds one value per id (each id has many rows associated with it). If an id has the answered tag on any of its rows, all rows associated with that id should be labeled as answered. If all rows associated with an id have the not-answered tag, all of its rows should be labeled as not answered (which is what currently happens). This is the code I wrote:
import numpy as np
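# per-row label: 'answered' if the row has an answered_at timestamp
conds = [file.data__answered_at.isna(), file.data__answered_at.notna()]
choices = ["not_answered", "answered"]
# the two conditions cover every row, so the default is never used; None avoids
# mixing a float np.nan default with string choices, which recent NumPy may reject
file['call_status'] = np.select(conds, choices, default=None)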
Example data:
data__id call_status rank
1 answered 1
1 not_answered 2
1 answered 3
2 not_answered 1
2 answered 2
3 not_answered 1
4 answered 1
4 not_answered 2
5 not_answered 1
5 not_answered 2
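To reproduce the answers below, the sample table can be rebuilt as a DataFrame. This sketch assumes the frame is named df (the question's own code calls it file) and that the columns hold exactly the values shown above.

```python
import pandas as pd

# Sample data copied from the table above
df = pd.DataFrame({
    "data__id": [1, 1, 1, 2, 2, 3, 4, 4, 5, 5],
    "call_status": [
        "answered", "not_answered", "answered",  # id 1
        "not_answered", "answered",              # id 2
        "not_answered",                          # id 3
        "answered", "not_answered",              # id 4
        "not_answered", "not_answered",          # id 5
    ],
    "rank": [1, 2, 3, 1, 2, 1, 1, 2, 1, 2],
})

print(df)
```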
In this case, the desired result is:
data__id call_status rank
1 answered 1
1 answered 2
1 answered 3
2 answered 1
2 answered 2
3 not_answered 1
4 answered 1
4 answered 2
5 not_answered 1
5 not_answered 2
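The same rule can also be applied directly to the raw data__answered_at column, skipping the per-row label. In this sketch the timestamps are hypothetical placeholders (only whether a value is missing matters), chosen so that the rows line up with the tables above.

```python
import numpy as np
import pandas as pd

# Hypothetical timestamps: NaT means the call was not answered
df = pd.DataFrame({
    "data__id": [1, 1, 1, 2, 2, 3, 4, 4, 5, 5],
    "data__answered_at": pd.to_datetime([
        "2020-01-01 10:00", None, "2020-01-01 11:00",  # id 1
        None, "2020-01-02 09:30",                      # id 2
        None,                                          # id 3
        "2020-01-03 14:00", None,                      # id 4
        None, None,                                    # id 5
    ]),
    "rank": [1, 2, 3, 1, 2, 1, 1, 2, 1, 2],
})

# True for every row whose id has at least one answered call
any_answered = (
    df["data__answered_at"].notna()
    .groupby(df["data__id"])
    .transform("any")
)
df["call_status"] = np.where(any_answered, "answered", "not_answered")
print(df[["data__id", "call_status", "rank"]])
```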
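We can use groupby.transform here and check whether any row in each group equals 'answered'. Then we use np.where to conditionally fill in 'answered' or 'not_answered':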
# True for every row whose data__id has at least one 'answered' row
m = file.groupby('data__id')['call_status'].transform(lambda x: x.eq('answered').any())
file['call_status'] = np.where(m, 'answered', 'not_answered')
Output:
data__id call_status rank
0 1 answered 1
1 1 answered 2
2 1 answered 3
3 2 answered 1
4 2 answered 2
5 3 not_answered 1
6 4 answered 1
7 4 answered 2
8 5 not_answered 1
9 5 not_answered 2
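A self-contained run of the approach above on the sample data. The frame is named file as in the question, and the last line compares the result with the desired output from the question.

```python
import numpy as np
import pandas as pd

file = pd.DataFrame({
    "data__id": [1, 1, 1, 2, 2, 3, 4, 4, 5, 5],
    "call_status": ["answered", "not_answered", "answered",
                    "not_answered", "answered", "not_answered",
                    "answered", "not_answered",
                    "not_answered", "not_answered"],
    "rank": [1, 2, 3, 1, 2, 1, 1, 2, 1, 2],
})

# Per-row flag: does this row's id have any 'answered' row?
m = file.groupby("data__id")["call_status"].transform(lambda x: x.eq("answered").any())
file["call_status"] = np.where(m, "answered", "not_answered")

expected = ["answered"] * 5 + ["not_answered"] + ["answered"] * 2 + ["not_answered"] * 2
print(file["call_status"].tolist() == expected)  # True
```

Because transform returns a value for every row, the mask keeps the original row order and can be passed straight to np.where.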
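Use GroupBy.transform with 'any' to test whether each group has at least one 'answered' row, or alternatively filter the data__id values of the 'answered' rows and test membership with Series.isin. Either mask can then be used to set the values with DataFrame.loc: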
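# Option 1: True for every row whose group has at least one 'answered' row
mask = df['call_status'].eq('answered').groupby(df['data__id']).transform('any')
# Option 2 (alternative, same result): ids with an 'answered' row, then a membership test
mask = df['data__id'].isin(df.loc[df['call_status'].eq('answered'), 'data__id'].unique())
df.loc[mask, 'call_status'] = 'answered'
print(df)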
data__id call_status rank
0 1 answered 1
1 1 answered 2
2 1 answered 3
3 2 answered 1
4 2 answered 2
5 3 not_answered 1
6 4 answered 1
7 4 answered 2
8 5 not_answered 1
9 5 not_answered 2
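A self-contained check that the two masks above select the same rows on the sample data. The frame is named df as in this answer. Rows whose id has no 'answered' row are left untouched, which is why a single .loc assignment is enough.

```python
import pandas as pd

df = pd.DataFrame({
    "data__id": [1, 1, 1, 2, 2, 3, 4, 4, 5, 5],
    "call_status": ["answered", "not_answered", "answered",
                    "not_answered", "answered", "not_answered",
                    "answered", "not_answered",
                    "not_answered", "not_answered"],
    "rank": [1, 2, 3, 1, 2, 1, 1, 2, 1, 2],
})

answered = df["call_status"].eq("answered")

# Option 1: broadcast "any answered row in this group" back to every row
mask_any = answered.groupby(df["data__id"]).transform("any")

# Option 2: collect ids that have an answered row, then test membership
answered_ids = df.loc[answered, "data__id"].unique()
mask_isin = df["data__id"].isin(answered_ids)

print((mask_any == mask_isin).all())  # True

df.loc[mask_any, "call_status"] = "answered"
print(df)
```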