Python pandas: finding the median for each group

Tags: python, pandas, pandas-groupby

df.head(10).to_clipboard(sep=';', index=True)

I have a dataframe as shown above, with the following column descriptions:

•   Id - the uuid of this delivery
•   PlanId - the uuid of the plan (the plan for deliveries of a given day)
•   PlanDate - the date of delivery
•   MinTime - the minimal time (seconds from midnight) for delivering this delivery
•   MaxTime - the maximal time (seconds from midnight) for delivering this delivery
•   RouteId - the uuid of the route this delivery belongs to
•   ETA - the estimated time for arrival of this delivery on this date (from the eta you can of course order the deliveries in a route)
•   TTN - the time to next delivery in the route, i.e., at index 3 that would be the time distance between delivery index 3 and delivery index 4
•   DTN - the distance to next delivery in the route.
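
I need to find the median number of deliveries per route in a given plan,

the median distance driven per route in a given plan,

and the median driving time per route in a given plan.

How can I do this?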

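I was wondering whether this is just a simple median calculation where you group and aggregate. I tried something like this to find the median distance: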

Tx = df.groupby(by=['plan_id', 'route_id'], as_index=False)['dtn'].sum()
Tx.groupby(['plan_id', 'route_id'])['dtn'].median()

However, I am not sure whether this is correct.

Here is one way to display the required figures:

# Subset the dataframe to only the desired plan id
# (Tx is assumed here to hold one row per delivery)
sub_Tx = Tx[Tx['plan_id'] == "869BB6FB-…"]
# Median number of deliveries per route in the given plan
sub_df = sub_Tx[['plan_id', 'route_id']].copy()
sub_df['count_deliveries'] = 1
sub_df = sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).sum()
sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).median()
# Median distance driven per route in the given plan
sub_df = sub_Tx[['plan_id', 'route_id', 'dtn']]
sub_df = sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).sum()
sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).median()
# Median driving time per route in the given plan
sub_df = sub_Tx[['plan_id', 'route_id', 'ttn']]
sub_df = sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).sum()
sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).median()
Good luck!

Update:

So you can compute the median of the route figures (number of deliveries, distance and time) per plan id as follows:

# Median number of deliveries per route in the given plan
sub_df = sub_Tx[['plan_id', 'route_id']].copy()
sub_df['count_deliveries'] = 1
sub_df = sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).sum()
sub_df = sub_df[['plan_id', 'count_deliveries']].rename(columns={'count_deliveries': 'median_deliveries'})
sub_df.groupby(by=['plan_id'], as_index=False).median()
# Median distance driven per route in the given plan
sub_df = sub_Tx[['plan_id', 'route_id', 'dtn']]
sub_df = sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).sum()
sub_df = sub_df[['plan_id', 'dtn']].rename(columns={'dtn': 'median_dtn'})
sub_df.groupby(by=['plan_id'], as_index=False).median()
# Median driving time per route in the given plan
sub_df = sub_Tx[['plan_id', 'route_id', 'ttn']]
sub_df = sub_df.groupby(by=['plan_id', 'route_id'], as_index=False).sum()
sub_df = sub_df[['plan_id', 'ttn']].rename(columns={'ttn': 'median_ttn'})
sub_df.groupby(by=['plan_id'], as_index=False).median()
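
The three calculations can also be folded into one pass. This is only a minimal sketch, not the answerer's code: it assumes a dataframe with one row per delivery and the columns `plan_id`, `route_id`, `dtn` and `ttn`. The sample values and the helper name `route_medians_per_plan` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: one row per delivery (values invented for illustration).
df = pd.DataFrame({
    "plan_id":  ["P1", "P1", "P1", "P1", "P1", "P1", "P2", "P2"],
    "route_id": ["R1", "R1", "R1", "R2", "R3", "R3", "R4", "R4"],
    "dtn":      [10.0, 20.0, 30.0, 5.0, 1.0, 2.0, 4.0, 6.0],
    "ttn":      [100, 200, 300, 50, 10, 20, 40, 60],
})


def route_medians_per_plan(deliveries: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per route first, then take the median of the route totals per plan."""
    # Step 1: one row per (plan, route) with delivery count, total distance, total time.
    per_route = (
        deliveries
        .groupby(["plan_id", "route_id"], as_index=False)
        .agg(
            deliveries=("dtn", "size"),  # number of rows (deliveries) in the route
            dtn=("dtn", "sum"),
            ttn=("ttn", "sum"),
        )
    )
    # Step 2: median of those route-level figures within each plan.
    return (
        per_route
        .groupby("plan_id")[["deliveries", "dtn", "ttn"]]
        .median()
        .add_prefix("median_")
    )


result = route_medians_per_plan(df)
print(result)
```

The key point is that the second `groupby` uses `plan_id` only, so the median is taken across the routes of a plan rather than within a single route.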

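Please provide the sample data as text. The example cannot be reproduced from an image.

Hi, thanks for your reply. Did you notice that the median results are the same as the sum results? Is that correct?

Hi, does each plan have several different route ids? I just edited my post to compute the median per plan.

groupby(by=['plan_id','route_id','dtn'], axis=0, as_index=False).sum() sub_df.groupby(by=['plan_id'], axis=0, as_index=False).median() Is this how I should do it for distance and time?

Yes, it seems so. Did you get the expected result?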
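
The observation in the comments, that the median comes out equal to the sum, follows from grouping twice on the same keys: after summing per (`plan_id`, `route_id`), every group contains exactly one row, so its median is just that row's value. A small check on made-up data (column names follow the thread; the values are hypothetical):

```python
import pandas as pd

# Hypothetical deliveries: two routes in one plan (values invented for illustration).
df = pd.DataFrame({
    "plan_id":  ["P1", "P1", "P1", "P1"],
    "route_id": ["R1", "R1", "R2", "R2"],
    "dtn":      [10.0, 30.0, 2.0, 4.0],
})

keys = ["plan_id", "route_id"]

# Step 1: total distance per route.
route_totals = df.groupby(keys, as_index=False)["dtn"].sum()

# Step 2a: grouping again on the same keys leaves one row per group,
# so the "median" simply returns each route total unchanged.
same_keys = route_totals.groupby(keys)["dtn"].median()

# Step 2b: grouping on plan_id only takes the median across a plan's routes.
per_plan = route_totals.groupby("plan_id")["dtn"].median()

print(same_keys.tolist())  # identical to the route totals
print(per_plan.tolist())   # median of the route totals per plan
```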