Python pandas: finding the median per group
df.head(10).to_clipboard(sep=';', index=True)
I have a dataframe as shown above, with the following column descriptions:
• Id - the uuid of this delivery
• PlanId - the uuid of the plan (the plan for deliveries of a given day)
• PlanDate - the date of delivery
• MinTime - the minimal time (seconds from midnight) for delivering this delivery
• MaxTime - the maximal time (seconds from midnight) for delivering this delivery
• RouteId - the uuid of the route this delivery belongs to
• ETA - the estimated time for arrival of this delivery on this date (from the eta you can of course order the deliveries in a route)
• TTN - the time to next delivery in the route, i.e., at index 3 that would be the time distance between delivery index 3 and delivery index 4
• DTN - the distance to next delivery in the route.
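The ETA description notes that deliveries can be ordered within a route by their ETA. A minimal sketch of that ordering follows. It assumes snake_case column names (route_id, eta) like those used in the code later in this post, and it assumes eta is numeric (for example, seconds from midnight). The data and the stop_index column are hypothetical, made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; the real dataframe has more columns.
df = pd.DataFrame({
    "route_id": ["r1", "r1", "r1", "r2", "r2"],
    "eta": [36000, 30000, 33000, 40000, 39000],
})

# Sort by route, then by ETA, so each route's deliveries are in visiting order.
ordered = df.sort_values(["route_id", "eta"]).reset_index(drop=True)

# Number the stops within each route (0 = first delivery of the route).
ordered["stop_index"] = ordered.groupby("route_id").cumcount()
```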
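I need to find the median number of deliveries per route in a given plan,
the median distance traveled per route in a given plan,
and the median travel time per route in a given plan.
How can I do this?
I wonder whether this is just a simple median calculation where you group and aggregate.
I tried something like this to find the median distance: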
Tx = df.groupby(by=['plan_id','route_id'], as_index=False)['dtn'].sum()
Tx.groupby(['plan_id','route_id'])['dtn'].median()
However, I'm not sure whether this is correct.
Here is a way to get the figures you want:
# Subset the dataframe to only the desired plan id
# (Tx is assumed here to be the full delivery dataframe, one row per delivery)
sub_Tx = Tx[Tx['plan_id'] == "869BB6FB-…"]
# Median number of deliveries per route in the given plan
sub_df = sub_Tx[['plan_id', 'route_id']].copy()
sub_df['count_deliveries'] = 1
sub_df = sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).sum()
sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).median()
# Median distance traveled per route in the given plan
sub_df = sub_Tx[['plan_id', 'route_id', 'dtn']]
sub_df = sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).sum()
sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).median()
# Median travel time per route in the given plan
sub_df = sub_Tx[['plan_id', 'route_id', 'ttn']]
sub_df = sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).sum()
sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).median()
Good luck.
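One point worth checking: if the per-route sums are grouped again by the same keys (plan_id and route_id), every group holds exactly one row, so its median is just that row's sum. Grouping the per-route sums by plan_id alone gives a median across the routes of a plan. The sketch below illustrates the difference on hypothetical toy data; the values are made up, and only the column names come from the post.

```python
import pandas as pd

# Hypothetical toy data: one plan with two routes.
df = pd.DataFrame({
    "plan_id": ["p1"] * 5,
    "route_id": ["r1", "r1", "r2", "r2", "r2"],
    "dtn": [10.0, 20.0, 5.0, 5.0, 10.0],
})

# Total distance per route: r1 = 30.0, r2 = 20.0.
per_route = df.groupby(["plan_id", "route_id"], as_index=False)["dtn"].sum()

# Same keys again: one row per group, so the "median" repeats the sums.
same_keys = per_route.groupby(["plan_id", "route_id"])["dtn"].median()

# Plan only: the median is taken across the routes of each plan.
per_plan = per_route.groupby("plan_id")["dtn"].median()
```

Here same_keys holds 30.0 and 20.0 (identical to the sums), while per_plan holds 25.0 for plan p1.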
Update:
So you can compute the median of the route figures (number of deliveries, distance, and time) per plan id as follows:
# Median number of deliveries per route in the given plan
sub_df = sub_Tx[['plan_id', 'route_id']].copy()
sub_df['count_deliveries'] = 1
sub_df = sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).sum()
sub_df = sub_df[['plan_id', 'count_deliveries']].rename(columns={'count_deliveries': 'median_deliveries'})
sub_df.groupby(by=['plan_id'], as_index=False).median()
# Median distance traveled per route in the given plan
sub_df = sub_Tx[['plan_id', 'route_id', 'dtn']]
sub_df = sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).sum()
sub_df = sub_df[['plan_id', 'dtn']].rename(columns={'dtn': 'median_dtn'})
sub_df.groupby(by=['plan_id'], as_index=False).median()
# Median travel time per route in the given plan
sub_df = sub_Tx[['plan_id', 'route_id', 'ttn']]
sub_df = sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).sum()
sub_df = sub_df[['plan_id', 'ttn']].rename(columns={'ttn': 'median_ttn'})
sub_df.groupby(by=['plan_id'], as_index=False).median()
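The three computations above can also be folded into two groupby passes with pandas named aggregation. This is only a sketch under assumptions: the column names (id, plan_id, route_id, dtn, ttn) follow the snake_case used in the post's code, the data is a hypothetical toy sample, and the intermediate names total_dtn and total_ttn are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: plan p1 has routes r1 and r2, plan p2 has route r3.
df = pd.DataFrame({
    "id": [f"d{i}" for i in range(7)],
    "plan_id": ["p1"] * 5 + ["p2"] * 2,
    "route_id": ["r1", "r1", "r2", "r2", "r2", "r3", "r3"],
    "dtn": [10.0, 20.0, 5.0, 5.0, 10.0, 7.0, 3.0],
    "ttn": [100, 200, 50, 50, 100, 70, 30],
})

# Pass 1: one row per (plan, route) with the delivery count and route totals.
per_route = df.groupby(["plan_id", "route_id"], as_index=False).agg(
    count_deliveries=("id", "count"),
    total_dtn=("dtn", "sum"),
    total_ttn=("ttn", "sum"),
)

# Pass 2: one row per plan with the median of each route-level figure.
per_plan = per_route.groupby("plan_id", as_index=False).agg(
    median_deliveries=("count_deliveries", "median"),
    median_dtn=("total_dtn", "median"),
    median_ttn=("total_ttn", "median"),
)
```

To restrict the result to a single plan, filter df on plan_id before the first pass, as the answer does with sub_Tx.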
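Please provide sample data in text format. The example can't be reproduced from image data.
Hi, thanks for your reply. If you notice, the result of the median is the same as the result of the sum. Is that correct?
Hi, does each plan have different route ids?
I just edited my post to compute the median per plan.
groupby(by=['plan_id','route_id','dtn'], axis=0, as_index=False).sum() sub_df.groupby(by=['plan_id'], axis=0, as_index=False).median() Is this how I would do it for distance and time?
Yes, it seems so. Did you get the expected result?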