Finding the max from grouped data in Python
I have a dataset showing the number of meals per day. The column Week shows which week of the month the date falls in. See the following data sample:
Id date Meals Week
1 2020-02-23 1 4
1 2020-02-24 1 5
1 2020-02-25 2 5
1 2020-02-27 1 5
1 2020-01-03 2 1
... ... ... ...
2 2020-03-04 3 2
2 2020-03-05 4 2
2 2020-03-06 3 2
2 2020-03-07 1 2
2 2020-03-08 2 2
I grouped the data by participant Id and week to get the average number of meals per week for each participant. See below:
d = data[['Id','Week','Meals']].groupby(['Id', 'Week'],sort=False ).agg('mean')
Meals
ID Week
1 4 1.400000
5 1.333333
1 2.000000
2 1.250000
3 1.000000
2 2 2.000000
3 2.142857
4 2.500000
5 2.500000
3 2 2.555556
3 2.600000
4 1.833333
5 2.000000
1 2.000000
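The grouping step above can be sketched on a small toy frame. The values below are made up for illustration and are not the real dataset; only the column names Id, Week and Meals come from the question.

```python
import pandas as pd

# Toy data (hypothetical values, not the asker's real dataset)
data = pd.DataFrame({
    'Id':    [1, 1, 1, 1, 2, 2, 2],
    'Week':  [4, 5, 5, 5, 2, 2, 3],
    'Meals': [1, 1, 2, 1, 3, 4, 2],
})

# Mean meals per participant and week.
# sort=False keeps the groups in order of first appearance.
weekly_mean = data.groupby(['Id', 'Week'], sort=False)['Meals'].mean()
print(weekly_mean)
```

Selecting the Meals column before aggregating gives a Series indexed by (Id, Week); calling .to_frame() on it would give a one-column frame like the one shown above.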
My first question:
print(df.head(50).to_dict('split'))
Yes
My second question (after updating the post) is:
Meals max_week
Id Week Week of the study
1 4 1 1 FALSE
5 2 1 FALSE
1 3 2 TRUE
2 4 1 FALSE
3 5 1 FALSE
2 2 1 2 FALSE
3 2 2 FALSE
4 3 2 TRUE
5 4 2 TRUE
3 2 1 2 FALSE
3 2 2 TRUE
4 3 2 FALSE
5 4 3 FALSE
1 5 3 FALSE
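Then I want to keep only the ID, the week of the study, and the snack count (the Meals column below) for the rows where max_week is TRUE, as shown here: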
Meals max_week
ProlificId Week of the study
1 3 2 TRUE
2 3 2 TRUE
2 4 2 TRUE
3 2 2 TRUE
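Thank you very much for your help.
Shosho
So it seems you just need to find, for each Id, the Week of the study with the highest average number of meals.
Take the following sample: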
import pandas as pd
import numpy as np
# sample data
# please always provide a callable line of code with your data
# you can get it with df.head(10).to_dict('split')
# read more about this in https://stackoverflow.com/q/63163251/6692898
# and https://stackoverflow.com/q/20109391/6692898
np.random.seed(123) # include when creating random sample
days, people = 18, 2
data = pd.DataFrame({
    'Id': [i for _ in range(days) for i in range(1, people + 1)],
    'Date': pd.date_range('2020-02-23', periods=days).repeat(people).values,
    'Meals': np.random.randint(1, 5, days * people),
})
# data['Week_of_month'] = (data['Date'].dt.day - 1) // 7 + 1
data['Week_of_the_study'] = data['Date'].dt.isocalendar().week
data['Week_of_the_study'] -= data['Week_of_the_study'].min() - 1
print(data)
Id Date Meals Week_of_the_study
0 1 2020-02-23 3 1
1 2 2020-02-23 2 1
2 1 2020-02-24 3 2
3 2 2020-02-24 3 2
4 1 2020-02-25 1 2
5 2 2020-02-25 3 2
6 1 2020-02-26 3 2
7 2 2020-02-26 2 2
8 1 2020-02-27 4 2
9 2 2020-02-27 3 2
10 1 2020-02-28 4 2
11 2 2020-02-28 2 2
12 1 2020-02-29 3 2
13 2 2020-02-29 2 2
14 1 2020-03-01 1 2
15 2 2020-03-01 2 2
16 1 2020-03-02 3 3
17 2 2020-03-02 4 3
18 1 2020-03-03 2 3
19 2 2020-03-03 1 3
20 1 2020-03-04 3 3
21 2 2020-03-04 1 3
22 1 2020-03-05 4 3
23 2 2020-03-05 2 3
24 1 2020-03-06 4 3
25 2 2020-03-06 3 3
26 1 2020-03-07 2 3
27 2 2020-03-07 1 3
28 1 2020-03-08 1 3
29 2 2020-03-08 1 3
30 1 2020-03-09 1 4
31 2 2020-03-09 2 4
32 1 2020-03-10 4 4
33 2 2020-03-10 4 4
34 1 2020-03-11 3 4
35 2 2020-03-11 1 4
The code:
max_weeks = (  # get average meals per week & Id
    data.groupby(['Id', 'Week_of_the_study'])
    ['Meals'].mean()
).rename('max_meals')
max_weeks = max_weeks.loc[  # filter only weeks with highest avg meals
    # pass 'max' as a string; the builtin max is deprecated here in recent pandas
    max_weeks == max_weeks.groupby('Id').transform('max')
].to_frame()
Output
max_meals
Id Week_of_the_study
1 1 3.000000
2 2 2.428571
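The question also asked for a max_week flag column followed by a filtered table. A minimal sketch of that variant, assuming the weekly averages are already computed; the numbers below are hypothetical and chosen so that Id 2 has a tie:

```python
import pandas as pd

# Hypothetical weekly averages; Id 2 has a tie between weeks 2 and 3
weekly = pd.DataFrame({
    'Id': [1, 1, 1, 2, 2, 2],
    'Week_of_the_study': [1, 2, 3, 1, 2, 3],
    'Meals': [1.0, 1.0, 2.0, 2.0, 2.5, 2.5],
}).set_index(['Id', 'Week_of_the_study'])

# transform('max') broadcasts each Id's maximum back onto its own rows,
# so the comparison yields one boolean per (Id, week) row
weekly['max_week'] = weekly['Meals'] == weekly.groupby('Id')['Meals'].transform('max')

# Keep only the flagged rows; tied weeks are all kept
best = weekly[weekly['max_week']]
print(best)
```

Because the comparison is against the group maximum, every tied week is kept. If exactly one week per Id is wanted instead, weekly.groupby('Id')['Meals'].idxmax() returns the index label of the first maximum in each group.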
Old answer
You can use groupby.transform on your current output to flag the max week:
d = data.groupby(['ProlificId', 'Week'])['Snacks'].mean().to_frame()
# most use cases want to separate by year/month as well, in that case use
# data['date'] = pd.to_datetime(data['date'])
# data['Year'] = data['date'].dt.year
# data['Month'] = data['date'].dt.month
# d = data.groupby(['ProlificId', 'Year', 'Month', 'Week'])['Snacks'].mean().to_frame()
d['max_week'] = d == d.groupby('ProlificId').transform('max')
Output
Snacks max_week
ProlificId Week
1 1 2.000000 True
4 1.000000 False
5 1.333333 False
2 2 2.600000 True
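The commented-out lines in the old answer hint at why year and month may matter: Week is the week of the month, so week 1 of January and week 1 of February would otherwise be pooled into one group. A small sketch of that variant, using made-up values and the column names from the old answer:

```python
import pandas as pd

# Hypothetical sample: the same participant in week 1 of two different months
data = pd.DataFrame({
    'ProlificId': [1, 1, 1, 1],
    'date': ['2020-01-03', '2020-01-04', '2020-02-01', '2020-02-02'],
    'Week': [1, 1, 1, 1],
    'Snacks': [2, 2, 1, 4],
})
data['date'] = pd.to_datetime(data['date'])
data['Year'] = data['date'].dt.year
data['Month'] = data['date'].dt.month

# Grouping by year and month as well keeps the two "week 1" periods apart
d = data.groupby(['ProlificId', 'Year', 'Month', 'Week'])['Snacks'].mean().to_frame()
d['max_week'] = d['Snacks'] == d.groupby('ProlificId')['Snacks'].transform('max')
print(d)
```

Grouping by ProlificId and Week alone would merge all four rows of this sample into a single group.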
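You did not explain the structure of your expected output. See if the answer helps, and leave a comment if you need help reshaping or modifying it.
I read the update you posted as an answer. Please edit your question to include it there, and while you are at it, a sample of the dataframe from df.head(50) would be helpful.
Thank you, Richie, I have updated my original question. The update is under point 2.
Please include the output of print(df.head(50).to_dict('split')) or a toy dataframe that contains the week of the study.
It is now included, buddy :)
Thanks Richie, when I run this line: data['Week_of_the_study'] = data['Date'].dt.isocalendar().week
I get this error: AttributeError: 'DatetimeProperties' object has no attribute 'isocalendar'
Any idea?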
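A possible explanation, offered as a sketch rather than a confirmed diagnosis: Series.dt.isocalendar() was added in pandas 1.1.0, so an older install would raise exactly this AttributeError. Upgrading pandas is one option. Another is to compute the week of the study directly from the dates, which avoids the call entirely. The helper name week_of_study below is made up for illustration, and it assumes weeks start on Monday, as ISO weeks do.

```python
import pandas as pd

def week_of_study(dates: pd.Series) -> pd.Series:
    """Number weeks 1, 2, ... counting from the Monday on or before the earliest date."""
    dates = pd.to_datetime(dates)
    first = dates.min().normalize()
    # Step back to the Monday of the first week (weekday(): Monday == 0)
    start = first - pd.Timedelta(days=first.weekday())
    # Whole weeks elapsed since that Monday, shifted so the first week is 1
    return (dates.dt.normalize() - start).dt.days // 7 + 1

# Same date range as the sample in the answer, one row per day
dates = pd.Series(pd.date_range('2020-02-23', periods=18))
weeks = week_of_study(dates)
print(weeks.tolist())
```

On the date range from the answer this gives the same numbering as the Week_of_the_study column shown there. Since it counts days from a fixed Monday instead of subtracting ISO week numbers, it should also keep counting upward if a study runs across a new year.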