Python: groupby and sum based on column values

python pandas

I have a DataFrame that looks like this:

ID  Date       Category  Parameter  Color
1a1 2020-03-02    1          1       Red
1a1 2020-03-02    1          2       Green 
1a1 2020-03-02    2          1       Red
1a1 2020-03-03    2          2       Green
1a1 2020-03-03    3          1       Red
1a1 2020-03-03    3          2       Green   
1a2 2020-03-02    1          1       Red
1a2 2020-03-02
1a2 2020-03-02

For a given date, I want to know how many categories and parameters are flagged Red for each ID, so that it becomes this:

ID  Date       Category  Parameter  Color   count_red_category   count_red_parameter
1a1 2020-03-02    1          1       Red          1                     1
1a1 2020-03-02    1          2       Green        1                     1
1a1 2020-03-02    1          2       Red          1                     2
1a1 2020-03-02    2          1       Red          2                     2
1a1 2020-03-03    2          2       Green        0                     0
1a1 2020-03-03    3          1       Red          1                     1
1a1 2020-03-03    3          2       Green        1                     1   
1a2 2020-03-02    1          1       Red          1                     1
1a2 2020-03-02    1          1       Red          1                     1

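Basically:

At each datetime, every category and parameter is flagged Red or Green
Each category can have multiple parameters
For each datetime, I need the number of distinct categories (how many distinct categories that ID has flagged Red on that date)
The same goes for the parameters

Any idea what the best way to do this is?

Regards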

I may be misunderstanding, but first, you only care about the Red values:

 tmpdf = df[df.Color=="Red"]

Then you want to group by ID and Date and count the distinct categories:

 tmpdf.groupby(['ID', 'Date']).Category.nunique()

Of course, you can combine these two lines:

 newdf=df[df.Color=="Red"].groupby(['ID', 'Date']).Category.nunique()
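The question asks for the same count on `Parameter` as well. A minimal sketch of how the one-liner might be extended to both columns, rebuilding only the complete rows of the question's input table (the truncated `1a2` rows are left out). The output column names are taken from the question's expected table, and `red_counts` is just an illustrative variable name:

```python
import pandas as pd

# Only the complete rows from the question's input table.
df = pd.DataFrame(
    {
        "ID": ["1a1"] * 6 + ["1a2"],
        "Date": ["2020-03-02"] * 3 + ["2020-03-03"] * 3 + ["2020-03-02"],
        "Category": [1, 1, 2, 2, 3, 3, 1],
        "Parameter": [1, 2, 1, 2, 1, 2, 1],
        "Color": ["Red", "Green", "Red", "Green", "Red", "Green", "Red"],
    }
)

# Keep the Red rows, then count distinct values of both columns per (ID, Date).
red_counts = (
    df[df["Color"] == "Red"]
    .groupby(["ID", "Date"])[["Category", "Parameter"]]
    .nunique()
    .rename(
        columns={
            "Category": "count_red_category",
            "Parameter": "count_red_parameter",
        }
    )
)
print(red_counts)
```

The result is indexed by `(ID, Date)`, so any group without a Red row is simply absent from it.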

If you want to keep the Date/ID pairs that have no Red rows (with a count of 0), then:
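One possible way to do this, sketched under assumptions: instead of dropping the non-Red rows, blank out their values with `Series.where`, so every `(ID, Date)` group survives the groupby and `nunique()` (which skips NaN by default) returns 0 for all-Green groups. The data below is illustrative, not the question's table, and the sketch assumes the counts are wanted per `(ID, Date)` group, which is only one reading of the question:

```python
import pandas as pd

# Illustrative data: the 2020-03-03 group contains no Red rows.
df = pd.DataFrame(
    {
        "ID": ["1a1", "1a1", "1a1", "1a1"],
        "Date": ["2020-03-02", "2020-03-02", "2020-03-03", "2020-03-03"],
        "Category": [1, 2, 2, 3],
        "Parameter": [1, 1, 2, 2],
        "Color": ["Red", "Red", "Green", "Green"],
    }
)

is_red = df["Color"] == "Red"
keys = [df["ID"], df["Date"]]

# where() turns values on non-Red rows into NaN; nunique() ignores NaN,
# so a group with no Red rows gets a count of 0 instead of disappearing.
counts = pd.DataFrame(
    {
        "count_red_category": df["Category"].where(is_red).groupby(keys).nunique(),
        "count_red_parameter": df["Parameter"].where(is_red).groupby(keys).nunique(),
    }
)

# Attach the per-group counts to every original row.
result = df.join(counts, on=["ID", "Date"])
print(result)
```

Joining on `["ID", "Date"]` repeats each group's counts on every row of that group, which matches the row-per-record layout of the question's expected table.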

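Thanks for the reply! I considered this solution, but I want to keep the dates where 0 categories/parameters are flagged Red.

@Lu I'm not sure what you mean.

@LuísCosta Do you mean you want to keep them in the list, by Date and ID?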