Pandas: how to group by multiple columns using a regular expression
Below is a representation of the dataset I have. I want to group it by student ID and exam year. Because the raw data contains full dates, I need to reduce each date to its year first.
Yes, I could modify the ExamDate column to hold only the year, or extract the year into a new column, but is there some "groupby", "multiindex" or similar magic that lets me do this without introducing a new column or modifying the original data?
import pandas as pd

data = {'ExamDate': ['11/20/2019', '11/20/2019', '05/10/2019', '05/01/2020', '05/01/2020', '05/10/2019'],
        'StudentId': [45, 44, 45, 46, 45, 44],
        'Grade': [70, 65, 90, 67, 81, 61]}
df = pd.DataFrame(data)

# Grouping on the full date string yields one group per (date, student) pair
grouped = df.groupby(['ExamDate', 'StudentId'])
for grp, frame in grouped:
    # print(grp)
    print(frame)
The current output looks like this:
     ExamDate  StudentId  Grade
4  05/01/2020         45     81
     ExamDate  StudentId  Grade
3  05/01/2020         46     67
     ExamDate  StudentId  Grade
5  05/10/2019         44     61
     ExamDate  StudentId  Grade
2  05/10/2019         45     90
     ExamDate  StudentId  Grade
1  11/20/2019         44     65
     ExamDate  StudentId  Grade
0  11/20/2019         45     70
The expected output is something like this:
   ExamYear  StudentId  Grade
1      2019         44     65
5      2019         44     61
   ExamYear  StudentId  Grade
0      2019         45     70
2      2019         45     90
   ExamYear  StudentId  Grade
4      2020         45     81
   ExamYear  StudentId  Grade
3      2020         46     67
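Comment: Please post your expected output.
Reply: Sorry for being so vague; I have updated the current and expected output. I know very little about Python, but my gut feeling is that there should be a proper way to do this. Thanks.
Try: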
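# Parse the date strings (note: this overwrites ExamDate in df with datetimes)
df['ExamDate'] = pd.to_datetime(df.ExamDate)
# Group by a derived Series (the year) together with a column name
groups = df.groupby([df['ExamDate'].dt.year, 'StudentId'])
for grp, frame in groups:
    print(frame)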
     ExamDate  StudentId  Grade
1  2019-11-20         44     65
5  2019-05-10         44     61
     ExamDate  StudentId  Grade
0  2019-11-20         45     70
2  2019-05-10         45     90
     ExamDate  StudentId  Grade
4  2020-05-01         45     81
     ExamDate  StudentId  Grade
3  2020-05-01         46     67
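The snippet above overwrites ExamDate, which the question wanted to avoid. Because groupby accepts a Series as a key, the year can instead be computed on the fly and passed in without being stored on the frame. The following is a minimal sketch under that assumption; the names exam_year and sizes, and the explicit format string, are illustrative choices rather than part of the original answer.

```python
import pandas as pd

data = {'ExamDate': ['11/20/2019', '11/20/2019', '05/10/2019',
                     '05/01/2020', '05/01/2020', '05/10/2019'],
        'StudentId': [45, 44, 45, 46, 45, 44],
        'Grade': [70, 65, 90, 67, 81, 61]}
df = pd.DataFrame(data)

# Temporary key Series; nothing is written back to df
exam_year = (pd.to_datetime(df['ExamDate'], format='%m/%d/%Y')
             .dt.year
             .rename('ExamYear'))

groups = df.groupby([exam_year, 'StudentId'])
for (year, student_id), frame in groups:
    print(year, student_id)
    print(frame)  # ExamDate still holds the original strings

# Number of rows per (ExamYear, StudentId) pair
sizes = groups.size()
print(sizes)
```

The key Series only has to share the index of df, so any per-row derived value can be used the same way.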
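To show only the year in each printed group, replace the date on the group's frame (ExamDate must already be datetime, as in the conversion above):
groups = df.groupby([df['ExamDate'].dt.year, 'StudentId'])
for grp, frame in groups:
    # assign returns a new frame, so df keeps its full dates
    frame = frame.assign(ExamDate=frame['ExamDate'].dt.year)
    print(frame)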
   ExamDate  StudentId  Grade
1      2019         44     65
5      2019         44     61
   ExamDate  StudentId  Grade
0      2019         45     70
2      2019         45     90
   ExamDate  StudentId  Grade
4      2020         45     81
   ExamDate  StudentId  Grade
3      2020         46     67
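Since the title asks about a regular expression, the year can also be pulled straight out of the date strings with Series.str.extract and used as the grouping key, leaving ExamDate as text. This is only a sketch: it assumes every date is in MM/DD/YYYY form, and the names exam_year, view and shown are hypothetical helpers introduced for illustration.

```python
import pandas as pd

data = {'ExamDate': ['11/20/2019', '11/20/2019', '05/10/2019',
                     '05/01/2020', '05/01/2020', '05/10/2019'],
        'StudentId': [45, 44, 45, 46, 45, 44],
        'Grade': [70, 65, 90, 67, 81, 61]}
df = pd.DataFrame(data)

# Capture the four digits at the end of each date; the result is a string Series
exam_year = (df['ExamDate']
             .str.extract(r'(\d{4})$', expand=False)
             .rename('ExamYear'))

shown = []  # collected only so the per-group frames can be inspected afterwards
for (year, student_id), frame in df.groupby([exam_year, 'StudentId']):
    # Build a display copy with ExamYear in place of ExamDate; df is untouched
    view = frame.drop(columns='ExamDate')
    view.insert(0, 'ExamYear', int(year))
    shown.append(view)
    print(view)
```

Each printed frame has the ExamYear, StudentId and Grade columns from the expected output in the question, while df keeps its original ExamDate strings.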