Python: labeling a column based on grouped data


python pandas


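I am trying to create a column that holds a single value per id (each id has many rows associated with it). If any row for an id is tagged as answered, then all rows for that id should be marked as answered. If all rows for an id are tagged as not answered, then all of them should be marked as not_answered (this is what currently happens).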

Here is the code I wrote:

import numpy as np

conds = [file.data__answered_at.isna(),file.data__answered_at.notna()]
choices = ["not_answered", "answered"]
file['call_status'] = np.select(conds,choices,default=np.nan)

 data__id   call_status       rank
  1            answered        1
  1          not_answered      2
  1            answered        3
  2          not_answered      1
  2             answered       2
  3          not_answered      1
  4            answered        1
  4          not_answered      2
  5          not_answered      1
  5          not_answered      2

The desired result in this case is:

   data__id   call_status       rank
  1            answered        1
  1            answered        2
  1            answered        3
  2            answered        1
  2            answered        2
  3          not_answered      1
  4            answered        1
  4            answered        2
  5          not_answered      1
  5          not_answered      2

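We can use groupby().transform here and check whether any row in each group equals answered. Then we use np.where to conditionally fill in answered or not_answered: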

m = file.groupby('data__id')['call_status'].transform(lambda x: x.eq('answered').any())

file['call_status'] = np.where(m, 'answered', 'not_answered')

Output:

  data__id   call_status  rank
0         1      answered     1
1         1      answered     2
2         1      answered     3
3         2      answered     1
4         2      answered     2
5         3  not_answered     1
6         4      answered     1
7         4      answered     2
8         5  not_answered     1
9         5  not_answered     2
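
For anyone who wants to reproduce the output above, here is a minimal self-contained sketch. It rebuilds the sample table from the question by hand (the DataFrame construction is an assumption, since the original data source is not shown) and then applies the same groupby().transform plus np.where steps.

```python
import numpy as np
import pandas as pd

# Sample data typed in from the question's first table.
# The variable name `file` follows the question; how the real data
# is loaded is not shown there, so this construction is an assumption.
file = pd.DataFrame({
    "data__id": [1, 1, 1, 2, 2, 3, 4, 4, 5, 5],
    "call_status": [
        "answered", "not_answered", "answered",
        "not_answered", "answered",
        "not_answered",
        "answered", "not_answered",
        "not_answered", "not_answered",
    ],
    "rank": [1, 2, 3, 1, 2, 1, 1, 2, 1, 2],
})

# For each data__id, check whether at least one row is 'answered';
# transform broadcasts that per-group result back to every row.
m = file.groupby("data__id")["call_status"].transform(
    lambda x: x.eq("answered").any()
)

# Rows in a group with any answered call become 'answered',
# all remaining rows become 'not_answered'.
file["call_status"] = np.where(m, "answered", "not_answered")
print(file)
```

The per-row labels are overwritten in place here; assigning to a new column instead would keep the original row-level status available for comparison.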

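Use GroupBy.transform with 'any' to test whether each group contains at least one answered row (df here plays the role of file in the question):

mask = df['call_status'].eq('answered').groupby(df['data__id']).transform('any')

Alternatively, use the call_status column to collect every data__id that has an answered row, and test membership with Series.isin:

mask = df['data__id'].isin(df.loc[df['call_status'].eq('answered'), 'data__id'].unique())

With either mask, set the values through DataFrame.loc: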

df.loc[mask, 'call_status'] = 'answered'
print (df)
   data__id   call_status  rank
0         1      answered     1
1         1      answered     2
2         1      answered     3
3         2      answered     1
4         2      answered     2
5         3  not_answered     1
6         4      answered     1
7         4      answered     2
8         5  not_answered     1
9         5  not_answered     2
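
As a quick sanity check, the two masks can be built side by side and compared. This is only a sketch on the question's sample table (typed in by hand here, which is an assumption); the names mask_transform, mask_isin and same are illustrative and do not appear in the answers.

```python
import pandas as pd

# Sample data typed in from the question's first table (assumed construction).
df = pd.DataFrame({
    "data__id": [1, 1, 1, 2, 2, 3, 4, 4, 5, 5],
    "call_status": [
        "answered", "not_answered", "answered",
        "not_answered", "answered",
        "not_answered",
        "answered", "not_answered",
        "not_answered", "not_answered",
    ],
    "rank": [1, 2, 3, 1, 2, 1, 1, 2, 1, 2],
})

is_answered = df["call_status"].eq("answered")

# Approach 1: per-group 'any', broadcast back to every row.
mask_transform = is_answered.groupby(df["data__id"]).transform("any")

# Approach 2: collect the ids that have an answered row, then test membership.
answered_ids = df.loc[is_answered, "data__id"].unique()
mask_isin = df["data__id"].isin(answered_ids)

# Both masks should flag exactly the same rows.
same = bool((mask_transform == mask_isin).all())
print(same)

# Only the flagged rows are overwritten; the others keep their existing label.
df.loc[mask_transform, "call_status"] = "answered"
print(df)
```

Unlike the np.where version, assigning through DataFrame.loc touches only the masked rows, so it relies on the remaining rows already being labeled not_answered.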