Python: find the mean for weekday groups in a DataFrame
My dataset looks like this:
tripduration starttime User Type
0 732 7/1/2015 00:00:03 Subscriber
1 322 7/1/2015 00:00:06 Subscriber
2 790 7/1/2015 00:00:17 Subscriber
3 1228 7/1/2015 00:00:23 Subscriber
4 1383 7/1/2015 00:00:44 Subscriber
5 603 7/1/2015 00:01:00 Subscriber
6 520 7/1/2015 00:01:03 Subscriber
7 289 7/1/2015 00:01:06 Subscriber
8 1771 7/1/2015 00:01:25 Customer
9 813 7/1/2015 00:01:41 Subscriber
10 1735 7/1/2015 00:01:50 Customer
11 832 7/1/2015 00:01:58 Subscriber
12 1210 7/1/2015 00:02:06 Subscriber
13 746 7/1/2015 00:02:07 Subscriber
14 749 7/1/2015 00:02:26 Subscriber
15 463 7/1/2015 00:02:26 Subscriber
16 331 7/1/2015 00:02:35 Subscriber
17 951 7/1/2015 00:02:43 Customer
18 1352 7/1/2015 00:02:47 Customer
19 275 7/1/2015 00:02:47 Subscriber
20 199 7/1/2015 00:03:05 Subscriber
21 383 7/1/2015 00:03:16 Customer
22 4210 7/1/2015 00:03:27 Subscriber
23 584 7/1/2015 00:03:34 Subscriber
24 735 7/1/2015 00:03:48 Subscriber
25 827 7/1/2015 00:03:56 Subscriber
26 677 7/1/2015 00:03:57 Subscriber
27 2371 7/1/2015 00:03:58 Customer
28 666 7/1/2015 00:04:03 Subscriber
29 999 7/1/2015 00:04:17 Subscriber
... ... ... ...
1085646 243 7/31/2015 23:57:25 Subscriber
1085647 1378 7/31/2015 23:57:29 Customer
1085648 230 7/31/2015 23:57:32 Subscriber
1085649 1669 7/31/2015 23:57:33 Subscriber
1085650 493 7/31/2015 23:57:44 Subscriber
1085651 822 7/31/2015 23:57:54 Subscriber
1085652 617 7/31/2015 23:58:03 Subscriber
1085653 349 7/31/2015 23:58:08 Subscriber
1085654 818 7/31/2015 23:58:12 Customer
1085655 2062 7/31/2015 23:58:15 Subscriber
1085656 945 7/31/2015 23:58:18 Customer
1085657 346 7/31/2015 23:58:24 Subscriber
1085658 399 7/31/2015 23:58:27 Subscriber
1085659 641 7/31/2015 23:58:42 Subscriber
1085660 1872 7/31/2015 23:58:43 Subscriber
1085661 12065 7/31/2015 23:58:51 Customer
1085662 265 7/31/2015 23:58:53 Subscriber
1085663 936 7/31/2015 23:58:58 Subscriber
1085664 395 7/31/2015 23:59:04 Subscriber
1085665 238 7/31/2015 23:59:10 Subscriber
1085666 551 7/31/2015 23:59:24 Subscriber
1085667 423 7/31/2015 23:59:23 Customer
1085668 1623 7/31/2015 23:59:24 Subscriber
1085669 1632 7/31/2015 23:59:24 Subscriber
1085670 305 7/31/2015 23:59:38 Subscriber
1085671 275 7/31/2015 23:59:40 Subscriber
1085672 530 7/31/2015 23:59:41 Subscriber
1085673 273 7/31/2015 23:59:42 Customer
1085674 1273 7/31/2015 23:59:56 Subscriber
1085675 1667 7/31/2015 23:59:59 Subscriber
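My question
What is the average trip duration for Subscribers on any weekday (Monday to Friday)?
My code
The function a4() should return the average as a float rounded to two decimal places.
I am stuck on how to select the weekdays (Monday to Friday) so that I can compute the mean of tripduration.
I tried to parse starttime with parser.parse(df1['starttime']), but I get this error:
TypeError: Parser must be a string or character stream, not Series
What is the correct way to get the weekday average?
I think you need to convert the starttime column with to_datetime first and then filter with boolean indexing.
If you need a single scalar value for all weekdays, use loc to select the column and take its mean: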
def a4(rides):
    rides['starttime'] = pd.to_datetime(rides['starttime'])
    # dayofweek is 0 for Monday through 6 for Sunday, so < 5 keeps Monday to Friday
    m = (rides['starttime'].dt.dayofweek < 5) & (rides['User Type'] == 'Subscriber')
    return round(rides.loc[m, 'tripduration'].mean(), 2)
print(a4(rides))
825.33
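The error in the question comes from dateutil's parser.parse, which takes one string at a time, whereas pd.to_datetime converts a whole column in one call. Below is a minimal, self-contained sketch of the scalar approach. The sample frame, the function name weekday_subscriber_mean and the explicit format string are assumptions made for illustration; the first four rows mirror the question's data and the Saturday row is made up to exercise the weekday filter. Unlike a4() above, this version does not overwrite the starttime column.

```python
import pandas as pd

# Hypothetical sample: rows 1-4 are taken from the question,
# the last row (a Saturday) is invented to test the weekday filter.
rides = pd.DataFrame({
    "tripduration": [732, 322, 790, 1771, 5000],
    "starttime": [
        "7/1/2015 00:00:03",
        "7/1/2015 00:00:06",
        "7/1/2015 00:00:17",
        "7/1/2015 00:01:25",
        "7/4/2015 10:00:00",
    ],
    "User Type": ["Subscriber", "Subscriber", "Subscriber", "Customer", "Subscriber"],
})


def weekday_subscriber_mean(rides):
    """Mean tripduration of Subscribers on Monday to Friday, rounded to 2 decimals."""
    # Vectorized conversion of the whole column; the format string is an
    # assumption based on how the dates look in the question.
    start = pd.to_datetime(rides["starttime"], format="%m/%d/%Y %H:%M:%S")
    # dayofweek: Monday=0 ... Sunday=6
    mask = (start.dt.dayofweek < 5) & (rides["User Type"] == "Subscriber")
    return round(float(rides.loc[mask, "tripduration"].mean()), 2)


result = weekday_subscriber_mean(rides)
print(result)
```

Passing an explicit format is optional, but it avoids format guessing on a file with more than a million rows.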
To get the mean for each weekday separately, group by dayofweek:
def a4(rides):
    rides['starttime'] = pd.to_datetime(rides['starttime'])
    df1 = rides[(rides['User Type'] == 'Subscriber') & (rides['starttime'].dt.dayofweek < 5)]
    return df1.groupby(df1['starttime'].dt.dayofweek)['tripduration'].mean().round(2)
print(a4(rides))
starttime
2    840.96
4    809.71
Name: tripduration, dtype: float64
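The per-day result above only lists the weekdays that actually occur in the data. If you would rather see all five weekdays in calendar order, one option is to reindex the grouped result. This is only a sketch: the sample frame reuses a few rows from the question, and the helper column weekday and the list names are assumptions introduced for illustration.

```python
import pandas as pd

# Hypothetical sample built from a few rows of the question
# (7/1/2015 is a Wednesday, 7/31/2015 is a Friday).
rides = pd.DataFrame({
    "tripduration": [732, 322, 1771, 243, 1378, 1669],
    "starttime": [
        "7/1/2015 00:00:03",
        "7/1/2015 00:00:06",
        "7/1/2015 00:01:25",
        "7/31/2015 23:57:25",
        "7/31/2015 23:57:29",
        "7/31/2015 23:57:33",
    ],
    "User Type": ["Subscriber", "Subscriber", "Customer",
                  "Subscriber", "Customer", "Subscriber"],
})

names = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

start = pd.to_datetime(rides["starttime"], format="%m/%d/%Y %H:%M:%S")
# Keep the weekday number (Monday=0) in a helper column so the filter
# and the groupby use the same values.
df = rides.assign(weekday=start.dt.dayofweek)
subs = df[(df["weekday"] < 5) & (df["User Type"] == "Subscriber")]

per_day = subs.groupby("weekday")["tripduration"].mean().round(2)
# Weekdays without any Subscriber rides show up as NaN instead of vanishing.
per_day = per_day.reindex(range(5))
per_day.index = names
print(per_day)
```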
If you need the day names instead, use:
def a4(rides):
    rides['starttime'] = pd.to_datetime(rides['starttime'])
    df1 = rides[(rides['User Type'] == 'Subscriber') & (rides['starttime'].dt.dayofweek < 5)]
    # dt.weekday_name was removed in pandas 1.0; dt.day_name() replaces it
    return df1.groupby(df1['starttime'].dt.day_name())['tripduration'].mean().round(2)
print(a4(rides))
starttime
Friday       809.71
Wednesday    840.96
Name: tripduration, dtype: float64
Use boolean indexing to filter and groupby on dayofweek to compute the mean:
df = pd.read_csv(...., parse_dates=['starttime'])
df = df[(df.starttime.dt.dayofweek < 5) & df['User Type'].eq('Subscriber')]
g = np.round(df.groupby(df.starttime.dt.dayofweek).tripduration.mean(), 2)
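For anyone who wants to try the read_csv route without the original file, here is a small sketch that reads the same kind of data from an in-memory string. The CSV text is a hypothetical excerpt built from rows in the question, and the comma-separated layout is an assumption, since the real file is not shown. As far as I know, parse_dates expects a list of column names rather than a bare string.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical excerpt; the actual file layout is not shown in the question.
csv_text = """tripduration,starttime,User Type
732,7/1/2015 00:00:03,Subscriber
1771,7/1/2015 00:01:25,Customer
243,7/31/2015 23:57:25,Subscriber
1669,7/31/2015 23:57:33,Subscriber
"""

# parse_dates takes a list of columns to convert while reading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["starttime"])

# Keep Subscriber rides that start on Monday to Friday (dayofweek 0-4).
subs = df[(df.starttime.dt.dayofweek < 5) & df["User Type"].eq("Subscriber")]

# Mean trip duration per weekday number, rounded to two decimals.
g = np.round(subs.groupby(subs.starttime.dt.dayofweek).tripduration.mean(), 2)
print(g)
```

Parsing the dates at read time saves the separate pd.to_datetime step used in the other answer.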