Python: is there a way to do a pandas groupby and then count unique values where another column has a specified value?

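I have a DataFrame with many columns. For simplicity, assume the columns are 'country', 'time_bucket', 'category' and 'id'. 'category' can be either 'staff' or 'student'.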

import pandas as pd

data = {'country': ['A', 'A', 'A', 'B', 'B'],
        'time_bucket': ['8', '8', '8', '8', '9'],
        'category': ['staff', 'staff', 'student', 'student', 'staff'],
        'id': ['101', '172', '122', '142', '132'],
        }

df = pd.DataFrame(data, columns=['country', 'time_bucket', 'category', 'id'])
df


country time_bucket category    id
0   A      8      staff        101
1   A      8      staff        172
2   A      8      student      122
3   B      8      student      142
4   B      9      staff        132

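I want to find the total number of staff and the total number of students for each country in a given time bucket, and add them as new columns.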

I can get the total number of persons for each country in a given time bucket:

df['persons_count'] = df.groupby(['time_bucket','country'])['id'].transform('nunique')

country time_bucket category    id  persons_count
0   A      8         staff      101    3
1   A      8         staff      172    3
2   A      8         student    122    3
3   B      8         student    142    1
4   B      9         staff      132    1

However, I can't figure out how to take 'category' into account and add it to the code.

I want something like this:

country time_bucket category    id  staff_count student_count
0   A     8          staff      101     2           1  
1   A     8          staff      172     2           1
2   A     8          student    122     2           1
3   B     8          student    142     0           1
4   B     9          staff      132     1           0
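A minimal sketch of one way to get this without apply, assuming the first example's df and that 'staff' and 'student' are the only categories of interest: replace 'id' with NaN wherever 'category' is not the wanted value, then use transform('nunique') per (time_bucket, country). nunique skips NaN by default, so each group counts only the distinct ids of that category.

```python
import pandas as pd

data = {'country': ['A', 'A', 'A', 'B', 'B'],
        'time_bucket': ['8', '8', '8', '8', '9'],
        'category': ['staff', 'staff', 'student', 'student', 'staff'],
        'id': ['101', '172', '122', '142', '132']}
df = pd.DataFrame(data)

# group keys: same grouping as the persons_count line above
keys = [df['time_bucket'], df['country']]

# assumption: these are the only categories we want counts for
for cat in ['staff', 'student']:
    # keep the id only on rows of this category; other rows become NaN
    masked_id = df['id'].where(df['category'] == cat)
    # nunique ignores NaN, so this counts distinct ids of `cat` in each group
    df[cat + '_count'] = masked_id.groupby(keys).transform('nunique')

print(df)
```

Because the count is a distinct count, a repeated id (as in the second example below) is only counted once.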

Any suggestions would be appreciated.

Adding a new example which shows that a unique count of 'id' is needed:

import pandas as pd

data = {'country': ['A', 'A', 'A', 'A', 'B', 'B'],
        'time_bucket': ['8', '8', '8', '8', '8', '9'],
        'category': ['staff', 'staff', 'student', 'student', 'student', 'staff'],
        'id': ['101', '172', '122', '122', '142', '132'],
        }

df = pd.DataFrame(data, columns=['country', 'time_bucket', 'category', 'id'])
df

country time_bucket category    id
0   A     8         staff       101
1   A     8         staff       172
2   A     8         student     122
3   A     8         student     122
4   B     8         student     142
5   B     9         staff       132
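To see why this example needs a unique count, here is a small comparison (a sketch using the second example's data) of count, which counts non-null rows, and nunique, which counts distinct ids, per (country, time_bucket, category):

```python
import pandas as pd

data = {'country': ['A', 'A', 'A', 'A', 'B', 'B'],
        'time_bucket': ['8', '8', '8', '8', '8', '9'],
        'category': ['staff', 'staff', 'student', 'student', 'student', 'staff'],
        'id': ['101', '172', '122', '122', '142', '132']}
df = pd.DataFrame(data)

# count = number of non-null ids; nunique = number of distinct ids
summary = (df.groupby(['country', 'time_bucket', 'category'])['id']
             .agg(['count', 'nunique']))
print(summary)
```

For (A, 8, student) count is 2 but nunique is 1, because id 122 appears twice.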

I want something like this:

country time_bucket category    id  staff_count student_count
0   A     8          staff      101     2           1  
1   A     8          staff      172     2           1
2   A     8          student    122     2           1
3   A     8          student    122     2           1
4   B     8          student    142     0           1
5   B     9          staff      132     1           0

Output:

                     category   staff   student
country time_bucket        id       
      A           8       101     2.0       0.0
                          122     0.0       1.0
                          172     2.0       0.0
      B           8       142     0.0       1.0
                  9       132     1.0       0.0
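A sketch of a pivot-based approach (not necessarily the code behind the output above), assuming pivot_table with aggfunc='nunique' and the ['time_bucket', 'country'] index mentioned in the follow-up comment, with the result joined back onto the second example's df:

```python
import pandas as pd

data = {'country': ['A', 'A', 'A', 'A', 'B', 'B'],
        'time_bucket': ['8', '8', '8', '8', '8', '9'],
        'category': ['staff', 'staff', 'student', 'student', 'student', 'staff'],
        'id': ['101', '172', '122', '122', '142', '132']}
df = pd.DataFrame(data)

# one row per (time_bucket, country), one column per category;
# each cell is the number of distinct ids, missing combinations become 0
pivot = df.pivot_table(index=['time_bucket', 'country'], columns='category',
                       values='id', aggfunc='nunique', fill_value=0)

# attach the per-group counts to every original row
result = df.join(pivot.add_suffix('_count'), on=['time_bucket', 'country'])
print(result)
```

Joining on the two key columns keeps the original row order and index, so the counts line up with each row as in the desired output.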

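We can use groupby together with apply. apply takes a function as its argument, and that function receives the sub-DataFrame for each group. With the data you provided, grouping by [country, time_bucket] means the function receives 3 rows for [A, 8], 1 row for [B, 8] and 1 row for [B, 9].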
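To check which sub-DataFrames the function will see, one can iterate over the same groupby object; a minimal sketch using the first example's data:

```python
import pandas as pd

data = {'country': ['A', 'A', 'A', 'B', 'B'],
        'time_bucket': ['8', '8', '8', '8', '9'],
        'category': ['staff', 'staff', 'student', 'student', 'staff'],
        'id': ['101', '172', '122', '142', '132']}
df = pd.DataFrame(data)

# iterating a GroupBy yields (key, sub-DataFrame) pairs for the same
# groups that apply() hands to its function
group_sizes = {}
for key, sub in df.groupby(['country', 'time_bucket']):
    group_sizes[key] = len(sub)
    print(key, len(sub))
```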

To get the output you requested, do the following:

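import pandas as pd
from collections import Counter

data = {'country': ['A', 'A', 'A', 'B', 'B'],
        'time_bucket': ['8', '8', '8', '8', '9'],
        'category': ['staff', 'staff', 'student', 'student', 'staff'],
        'id': ['101', '172', '122', '142', '132'],
        }
df = pd.DataFrame(data, columns=['country', 'time_bucket', 'category', 'id'])

def category_counter(group):
    # group is the sub-DataFrame for one (country, time_bucket) pair;
    # Counter counts rows per category (not distinct ids)
    counter = Counter(group.category.tolist())
    for k in ['staff', 'student']:
        group[k + '_count'] = counter[k]
    return group

# group_keys=False keeps the original flat index (needed on pandas >= 2.0)
df.groupby(['country', 'time_bucket'], group_keys=False).apply(category_counter)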

Output:

  country time_bucket category   id  staff_count  student_count
0       A           8    staff  101            2              1
1       A           8    staff  172            2              1
2       A           8  student  122            2              1
3       B           8  student  142            0              1
4       B           9    staff  132            1              0

An alternative that does not return duplicated data:

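import pandas as pd
from collections import Counter

data = {'country': ['A', 'A', 'A', 'B', 'B'],
        'time_bucket': ['8', '8', '8', '8', '9'],
        'category': ['staff', 'staff', 'student', 'student', 'staff'],
        'id': ['101', '172', '122', '142', '132'],
        }
df = pd.DataFrame(data, columns=['country', 'time_bucket', 'category', 'id'])

def category_counter(group):
    # count rows per category inside this (country, time_bucket) group
    counter = Counter(group.category.tolist())
    return_data = {}
    for k in ['staff', 'student']:
        return_data[k + '_count'] = counter[k]
    # returning a Series gives one summary row per group
    return pd.Series(return_data)

df.groupby(['country', 'time_bucket']).apply(category_counter)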
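Both versions above count rows, so with the second example's data id 122 would be counted twice for (A, 8) students. A hedged variant that counts distinct ids instead, assuming the second example's df; the function name unique_category_counter is hypothetical:

```python
import pandas as pd

data = {'country': ['A', 'A', 'A', 'A', 'B', 'B'],
        'time_bucket': ['8', '8', '8', '8', '8', '9'],
        'category': ['staff', 'staff', 'student', 'student', 'student', 'staff'],
        'id': ['101', '172', '122', '122', '142', '132']}
df = pd.DataFrame(data)

def unique_category_counter(group):
    # hypothetical helper: distinct ids per category inside one group
    return pd.Series({k + '_count': group.loc[group['category'] == k, 'id'].nunique()
                      for k in ['staff', 'student']})

# select only the columns the function needs, so the grouping
# columns are not passed into apply
result = (df.groupby(['country', 'time_bucket'])[['category', 'id']]
            .apply(unique_category_counter))
print(result)
```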

Output:

                     staff_count  student_count
country time_bucket
A       8                      2              1
B       8                      0              1
        9                      1              0

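Thanks @Chris, after I joined the pivot onto the DataFrame this mostly solved my problem. The 'persons_count' column did not compute the total number of persons I needed, so I kept my original code for that, but otherwise the pivot worked really well! I changed the index to ['time_bucket', 'country'] and it worked! Thanks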