Finding the max from grouped data in Python


I have a dataset showing the number of meals per day. The column Week shows which week of the month the date falls in. See the data sample below:

Id         date                Meals   Week
 1        2020-02-23            1         4
 1        2020-02-24            1         5
 1        2020-02-25            2         5
 1        2020-02-27            1         5
 1        2020-01-03            2         1
...         ...                ...       ...
 2        2020-03-04            3         2
 2        2020-03-05            4         2
 2        2020-03-06            3         2
 2        2020-03-07            1         2
 2        2020-03-08            2         2
I grouped the data by participant ID and week to get the average number of meals per participant per week. See below:

d = data[['Id','Week','Meals']].groupby(['Id', 'Week'],sort=False ).agg('mean')

                                 Meals
               ID          Week
                1           4   1.400000
                            5   1.333333
                            1   2.000000
                            2   1.250000
                            3   1.000000
                2           2   2.000000
                            3   2.142857
                            4   2.500000
                            5   2.500000
                3           2   2.555556
                            3   2.600000
                            4   1.833333
                            5   2.000000
                            1   2.000000
My first question:

  • Is each participant's maximum number of meals in the first week or the last week?
  • After Richie's answer, the output of:

    print(df.head(50).to_dict('split'))
    

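My second question (added after updating the post):

  • Which week of the study is the max week? Note that the study ran for weeks 1 to 4 or 5, so the output should look like the following, with an extra column named (Week of the study):
  • Output: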

                                    Meals      max_week
    Id  Week    Week of the study       
    1    4            1               1          FALSE
         5            2               1          FALSE
         1            3               2          TRUE
         2            4               1          FALSE
         3            5               1          FALSE
    2    2            1               2          FALSE
         3            2               2          FALSE
         4            3               2          TRUE
         5            4               2          TRUE
    3    2            1               2          FALSE
         3            2               2          TRUE
         4            3               2          FALSE
         5            4               3          FALSE
         1            5               3          FALSE
    
Then I want to keep only the Id, the week of the study, and Snacks for the rows where max_week is TRUE, as shown below:

                                     Meals    max_week
    ProlificId  Week of the study       
        1              3                2       TRUE
        2              3                2       TRUE
        2              4                2       TRUE
        3              2                2       TRUE
    
Thank you very much for your help.
Shosho

So it seems you just need to find the week of the study, and the maximum average number of meals for each Id.

Take the following sample:

    import pandas as pd
    import numpy as np
    
    # sample data
    # please always provide a callable line of code with your data
    # you can get it with df.head(10).to_dict('split')
    # read more about this in https://stackoverflow.com/q/63163251/6692898
    # and https://stackoverflow.com/q/20109391/6692898
    np.random.seed(123) # include when creating random sample
    days, people = 18, 2
    data = pd.DataFrame({
        'Id': [i for _ in range(days) for i in range(1, people + 1)],
        'Date': pd.date_range('2020-02-23', periods=days).repeat(people).values,
        'Meals': np.random.randint(1, 5, days * people),
    })
    # data['Week_of_month'] = (data['Date'].dt.day - 1) // 7 + 1
    data['Week_of_the_study'] = data['Date'].dt.isocalendar().week
    data['Week_of_the_study'] -= data['Week_of_the_study'].min() - 1
    print(data)
    
        Id       Date  Meals  Week_of_the_study
    0    1 2020-02-23      3                  1
    1    2 2020-02-23      2                  1
    2    1 2020-02-24      3                  2
    3    2 2020-02-24      3                  2
    4    1 2020-02-25      1                  2
    5    2 2020-02-25      3                  2
    6    1 2020-02-26      3                  2
    7    2 2020-02-26      2                  2
    8    1 2020-02-27      4                  2
    9    2 2020-02-27      3                  2
    10   1 2020-02-28      4                  2
    11   2 2020-02-28      2                  2
    12   1 2020-02-29      3                  2
    13   2 2020-02-29      2                  2
    14   1 2020-03-01      1                  2
    15   2 2020-03-01      2                  2
    16   1 2020-03-02      3                  3
    17   2 2020-03-02      4                  3
    18   1 2020-03-03      2                  3
    19   2 2020-03-03      1                  3
    20   1 2020-03-04      3                  3
    21   2 2020-03-04      1                  3
    22   1 2020-03-05      4                  3
    23   2 2020-03-05      2                  3
    24   1 2020-03-06      4                  3
    25   2 2020-03-06      3                  3
    26   1 2020-03-07      2                  3
    27   2 2020-03-07      1                  3
    28   1 2020-03-08      1                  3
    29   2 2020-03-08      1                  3
    30   1 2020-03-09      1                  4
    31   2 2020-03-09      2                  4
    32   1 2020-03-10      4                  4
    33   2 2020-03-10      4                  4
    34   1 2020-03-11      3                  4
    35   2 2020-03-11      1                  4
    
And the code:

    max_weeks = ( # get average meals per week & Id
        data.groupby(['Id', 'Week_of_the_study'])
        ['Meals'].mean()
    ).rename('max_meals')
    
    max_weeks = max_weeks.loc[ # filter only weeks with highest avg meals
        max_weeks == max_weeks.groupby('Id').transform('max')
    ].to_frame()
    
Output:

                          max_meals
    Id Week_of_the_study
    1  1                   3.000000
    2  2                   2.428571
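
The first question (is each participant's max in the first or the last week?) is not covered by the code above. A minimal sketch of one way to check it follows. The frame `weekly` holds weekly averages that correspond to the sample data, rounded to six decimals, and the names `weekly`, `avg_meals`, `best`, and `summary` are illustrative assumptions rather than part of the original answer.

```python
import pandas as pd

# Weekly averages per Id (illustrative values matching the sample data above)
weekly = pd.DataFrame({
    'Id': [1, 1, 1, 1, 2, 2, 2, 2],
    'Week_of_the_study': [1, 2, 3, 4, 1, 2, 3, 4],
    'avg_meals': [3.000000, 2.714286, 2.714286, 2.666667,
                  2.000000, 2.428571, 1.857143, 2.333333],
})

grouped = weekly.groupby('Id')

# Week with the highest average per Id (idxmax keeps only the first week on ties)
best = weekly.loc[grouped['avg_meals'].idxmax()].set_index('Id')['Week_of_the_study']

# First and last study week observed for each Id
first = grouped['Week_of_the_study'].min()
last = grouped['Week_of_the_study'].max()

summary = pd.DataFrame({
    'max_week': best,
    'in_first_or_last_week': (best == first) | (best == last),
})
print(summary)
```

Because `idxmax` reports only the first maximum, participants with tied weeks would need the `transform('max')` comparison instead if every tied week should count.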
    
    

Old answer

You can use groupby.transform on your current output to flag the max week(s):

    d = data.groupby(['ProlificId', 'Week'])['Snacks'].mean().to_frame()
    
    # most use cases want to separate by year/month as well, in that case use
    # data['date'] = pd.to_datetime(data['date'])
    # data['Year'] = data['date'].dt.year
    # data['Month'] = data['date'].dt.month
    # d = data.groupby(['ProlificId', 'Year', 'Month', 'Week'])['Snacks'].mean().to_frame()
    
    d['max_week'] = d['Snacks'] == d.groupby('ProlificId')['Snacks'].transform('max')
    
Output:

                       Snacks  max_week
    ProlificId Week
    1          1     2.000000      True
               4     1.000000     False
               5     1.333333     False
    2          2     2.600000      True
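
To get from this flag to the table requested in the question (only the TRUE rows, indexed by the week of the study), one possible sketch is shown here. It reuses the weekly averages printed in the question and assumes the rows of each Id appear in chronological order, so that a running count per Id can stand in for the week of the study. The column name `Week_of_the_study` and the variable `result` are illustrative.

```python
import pandas as pd

# Weekly averages copied from the grouped output in the question
d = pd.DataFrame({
    'Id':    [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3],
    'Week':  [4, 5, 1, 2, 3, 2, 3, 4, 5, 2, 3, 4, 5, 1],
    'Meals': [1.400000, 1.333333, 2.000000, 1.250000, 1.000000,
              2.000000, 2.142857, 2.500000, 2.500000,
              2.555556, 2.600000, 1.833333, 2.000000, 2.000000],
})

# Number each Id's weeks in order of appearance (assumes chronological rows)
d['Week_of_the_study'] = d.groupby('Id').cumcount() + 1

# Flag every week whose average equals that Id's maximum (ties are all kept)
d['max_week'] = d['Meals'] == d.groupby('Id')['Meals'].transform('max')

# Keep only the flagged rows and the requested columns
result = (
    d.loc[d['max_week'], ['Id', 'Week_of_the_study', 'Meals', 'max_week']]
     .set_index(['Id', 'Week_of_the_study'])
)
print(result)
```

With these numbers the rows kept are study week 3 for Id 1, study weeks 3 and 4 for Id 2, and study week 2 for Id 3, which is the set of rows listed in the question's expected output.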
    

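You didn't explain the structure of your expected output. See whether the answer helps, and leave a comment if you need help reshaping or modifying it.

I read the update you posted as an answer. Please edit your question to include it there, and while you are at it, a sample of the dataframe from df.head(50) would be helpful.

Thank you, Richie, I have updated my original question. The update is in point 2.

Please include the output of print(df.head(50).to_dict('split')), or a toy dataframe that includes the week of the study.

It is included now, buddy :)

Thanks Richie, when I run this line:

    data['Week_of_the_study'] = data['Date'].dt.isocalendar().week

I get this error:

    AttributeError: 'DatetimeProperties' object has no attribute 'isocalendar'

Any idea?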
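
That AttributeError usually means the installed pandas is older than 1.1, the release that added Series.dt.isocalendar(), so upgrading pandas is one fix. As an alternative that does not depend on that accessor, the sketch below counts weeks from the Monday of the week containing the first date. The names `dates`, `week_start`, and `week_of_study` are illustrative and not part of the original answer.

```python
import pandas as pd

# Same date range as the sample in the answer
dates = pd.Series(pd.date_range('2020-02-23', periods=18))

# Monday of the week that contains the earliest date
first = dates.min().normalize()
week_start = first - pd.Timedelta(days=first.weekday())

# Whole weeks elapsed since that Monday, numbered from 1
week_of_study = (dates - week_start).dt.days // 7 + 1
print(week_of_study.tolist())
```

For this sample the numbering agrees with the ISO-week approach (2020-02-23 is a Sunday, so it sits alone in week 1), and unlike ISO week numbers it keeps increasing when a study crosses a year boundary.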