Pandas: how to group by multiple columns using a regular expression


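Below is a representation of the dataset I have. I want to group it by student ID and exam year. Because the raw data contains full dates, I need to reduce each date to its year before grouping.

Yes, I could modify the ExamDate column to hold only the year, or extract the year into a new column, but is there some "groupby", "multiindex", or similar magic that would let me do this without introducing a new column or modifying the original data?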

import pandas as pd

data = { 'ExamDate' : ['11/20/2019', '11/20/2019', '05/10/2019', '05/01/2020', '05/01/2020', '05/10/2019'],
     'StudentId' : [45, 44, 45, 46, 45, 44],
     'Grade' : [ 70, 65, 90, 67, 81, 61]
   }
df = pd.DataFrame(data)

# Grouping on the full date string yields one group per (date, student) pair
grouped=df.groupby(['ExamDate', 'StudentId'])

for grp, frame in grouped:
    #print(grp)
    print(frame)
The current output looks like this:

     ExamDate  StudentId  Grade
4  05/01/2020         45     81
     ExamDate  StudentId  Grade
3  05/01/2020         46     67
     ExamDate  StudentId  Grade
5  05/10/2019         44     61
     ExamDate  StudentId  Grade
2  05/10/2019         45     90
     ExamDate  StudentId  Grade
1  11/20/2019         44     65
     ExamDate  StudentId  Grade
0  11/20/2019         45     70
The expected output looks like this:

  ExamYear  StudentId  Grade
1     2019         44     65
5     2019         44     61
  ExamYear  StudentId  Grade
0     2019         45     70
2     2019         45     90
  ExamYear  StudentId  Grade
4     2020         45     81
  ExamYear  StudentId  Grade
3     2020         46     67
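Please post your expected output.

Sorry for being so unclear; I have updated the current and expected output. I know very little about Python, but my intuition is that there should be a suitable way to do this. Thanks.

Try:
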
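# Parse the date strings so that the .dt accessor is available
df['ExamDate'] = pd.to_datetime(df.ExamDate)
# Group by the year taken from ExamDate, together with StudentId
groups = df.groupby([df['ExamDate'].dt.year, 'StudentId'])
for grp, frame in groups:
    print(frame)
Output: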
    ExamDate  StudentId  Grade
1 2019-11-20         44     65
5 2019-05-10         44     61
    ExamDate  StudentId  Grade
0 2019-11-20         45     70
2 2019-05-10         45     90
    ExamDate  StudentId  Grade
4 2020-05-01         45     81
    ExamDate  StudentId  Grade
3 2020-05-01         46     67
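The snippet above overwrites ExamDate in place. If the original frame has to stay untouched, as the question asks, one possible sketch is to compute the year as a separate Series and pass that Series to groupby as a key. The names exams and exam_year are illustrative and do not come from the original post.

```python
import pandas as pd

# Same sample data as in the question, under a separate name (assumption)
exams = pd.DataFrame({
    'ExamDate': ['11/20/2019', '11/20/2019', '05/10/2019',
                 '05/01/2020', '05/01/2020', '05/10/2019'],
    'StudentId': [45, 44, 45, 46, 45, 44],
    'Grade': [70, 65, 90, 67, 81, 61],
})

# Build the year key on the side; exams itself is never modified
exam_year = (
    pd.to_datetime(exams['ExamDate'], format='%m/%d/%Y')
    .dt.year
    .rename('ExamYear')
)

# groupby accepts a mix of Series keys and column names
group_keys = []
for key, frame in exams.groupby([exam_year, 'StudentId']):
    group_keys.append(key)
    print(key)
    print(frame)
```

Each printed frame still shows the full ExamDate string, because only the grouping key carries the year.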
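To print only the year in the ExamDate column, replace it on a copy of each group inside the loop:
groups = df.groupby([df['ExamDate'].dt.year, 'StudentId'])
for grp, frame in groups:
    # assign returns a new frame, so the group is not modified in place
    frame = frame.assign(ExamDate=frame['ExamDate'].dt.year)
    print(frame)
Output: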
   ExamDate  StudentId  Grade
1      2019         44     65
5      2019         44     61
   ExamDate  StudentId  Grade
0      2019         45     70
2      2019         45     90
   ExamDate  StudentId  Grade
4      2020         45     81
   ExamDate  StudentId  Grade
3      2020         46     67
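Since the title asks about regular expressions, here is a hedged sketch of that route: Series.str.extract pulls the four-digit year out of the date string, and the result is used as a grouping key without adding a column. It assumes every date ends in a four-digit year, as in the sample data; exams and year_key are illustrative names, not part of the original post.

```python
import pandas as pd

# Same sample data as in the question, under a separate name (assumption)
exams = pd.DataFrame({
    'ExamDate': ['11/20/2019', '11/20/2019', '05/10/2019',
                 '05/01/2020', '05/01/2020', '05/10/2019'],
    'StudentId': [45, 44, 45, 46, 45, 44],
    'Grade': [70, 65, 90, 67, 81, 61],
})

# Capture the trailing four digits; expand=False returns a Series
year_key = (
    exams['ExamDate']
    .str.extract(r'(\d{4})$', expand=False)
    .rename('ExamYear')
)

# Aggregate per (year, student) without touching exams
mean_grade = exams.groupby([year_key, 'StudentId'])['Grade'].mean()
print(mean_grade)
```

The year key here is a string such as '2019', not an integer; converting it with astype(int) is an option if numeric years are needed.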