Python pandas: get a daily describe() of a DataFrame


I have a DataFrame that looks like this:

        provider    timestamp                   vehicle_id
id          
103107  a           2019-09-11 20:05:47+02:00   x
1192195 b           2019-09-11 00:02:46+02:00   y
434508  c           2019-09-11 00:32:39+02:00   z
530388  c           2019-09-11 08:12:56+02:00   z
1773721 b           2019-09-11 20:02:55+02:00   w
...

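I want to get some statistics about the number of distinct vehicle IDs per day. I have this, which I can describe manually:

df.groupby(['provider', df['timestamp'].dt.strftime('%Y-%m-%d')])[['vehicle_id']].nunique()

How can I arrange the data so that I get the min/max/mean per day? I'm a bit lost; any help is much appreciated.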
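Try this:
aggregations = ['mean', 'min', 'max', 'std']
# grouped_df is assumed to be the result of the first groupby
# (number of unique vehicle IDs per provider and day)
result = grouped_df.groupby('timestamp')['vehicle_id'].agg(aggregations)

Note: you may need to flatten the column index first (this applies only when the columns are a MultiIndex):
grouped_df.columns = [col[1] if col[1] != '' else col[0] for col in grouped_df.columns]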
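As a rough sketch of when the flattening step matters: the snippet below rebuilds the five sample rows from the question and deliberately uses a dict-of-lists aggregation, which is one way (an assumption here, not something the answer specifies) to end up with MultiIndex columns. After flattening with the comprehension above, the count column is named `nunique` rather than `vehicle_id`, so the second groupby selects that name.

```python
import pandas as pd

# The five sample rows shown in the question.
df = pd.DataFrame(
    {
        "provider": ["a", "b", "c", "c", "b"],
        "timestamp": pd.to_datetime([
            "2019-09-11 20:05:47+02:00",
            "2019-09-11 00:02:46+02:00",
            "2019-09-11 00:32:39+02:00",
            "2019-09-11 08:12:56+02:00",
            "2019-09-11 20:02:55+02:00",
        ]),
        "vehicle_id": ["x", "y", "z", "z", "w"],
    },
    index=pd.Index([103107, 1192195, 434508, 530388, 1773721], name="id"),
)

day = df["timestamp"].dt.strftime("%Y-%m-%d")

# A dict-of-lists aggregation yields MultiIndex columns: ('vehicle_id', 'nunique').
grouped_df = df.groupby(["provider", day]).agg({"vehicle_id": ["nunique"]})
assert isinstance(grouped_df.columns, pd.MultiIndex)

# Flatten: keep the second level when present, otherwise the first.
grouped_df.columns = [col[1] if col[1] != "" else col[0] for col in grouped_df.columns]

# The flattened column is now called 'nunique'; aggregate it per day.
aggregations = ["mean", "min", "max", "std"]
result = grouped_df.groupby("timestamp")["nunique"].agg(aggregations)
print(result)
```

If the first groupby already returns flat columns (as `[['vehicle_id']].nunique()` does), the flattening line should be skipped, since iterating over plain string labels would index into the characters of each name.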

Try groupby().agg():
Note: since you only care about one column of the original data, you can select a Series rather than a DataFrame in the first groupby, i.e.
# note the number of [] around 'vehicle_id'
new_df = (df.groupby(['provider', 
                     df['timestamp'].dt.strftime('%Y-%m-%d')])
          ['vehicle_id'].nunique()
         )

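Then new_df is a Series named vehicle_id, and the next command is simply
# note the difference before .agg
# a list (rather than a set) keeps the output column order deterministic
new_df.groupby('timestamp').agg(['min', 'max', 'mean'])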
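For reference, here is a minimal end-to-end sketch of this Series-based approach on the five sample rows from the question. The name `daily` is only illustrative; everything else follows the two commands above.

```python
import pandas as pd

# The five sample rows shown in the question.
df = pd.DataFrame(
    {
        "provider": ["a", "b", "c", "c", "b"],
        "timestamp": pd.to_datetime([
            "2019-09-11 20:05:47+02:00",
            "2019-09-11 00:02:46+02:00",
            "2019-09-11 00:32:39+02:00",
            "2019-09-11 08:12:56+02:00",
            "2019-09-11 20:02:55+02:00",
        ]),
        "vehicle_id": ["x", "y", "z", "z", "w"],
    },
    index=pd.Index([103107, 1192195, 434508, 530388, 1773721], name="id"),
)

# Step 1: unique vehicle IDs per (provider, day); single brackets give a Series.
new_df = (
    df.groupby(["provider", df["timestamp"].dt.strftime("%Y-%m-%d")])
    ["vehicle_id"].nunique()
)

# Step 2: group the per-provider counts by the 'timestamp' index level
# and summarise them for each day.
daily = new_df.groupby("timestamp").agg(["min", "max", "mean"])
print(daily)
```

With only one day in the sample, `daily` has a single row; on the full data there would be one row per calendar day.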

If I understand your question correctly, all you need to do is:
df.groupby(['provider', df['timestamp'].dt.strftime('%Y-%m-%d')])[['vehicle_id']].nunique()\
  .groupby('timestamp')['vehicle_id'].describe()

The first groupby gives you a DataFrame with the number of unique vehicle IDs per provider and day. For the data sample provided, it is:
                     vehicle_id
provider timestamp             
a        2019-09-11           1
b        2019-09-11           2
c        2019-09-11           1

The second groupby computes the statistics for each day. So the result will be:
            count      mean      std  min  25%  50%  75%  max
timestamp                                                    
2019-09-11    3.0  1.333333  0.57735  1.0  1.0  1.0  1.5  2.0
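Since the question only asks for min/max/mean, one possible follow-up is to trim the `describe()` output to those columns. The sketch below reproduces the chain on the sample rows; the names `stats` and `summary` are illustrative and not part of the answer.

```python
import pandas as pd

# The five sample rows shown in the question.
df = pd.DataFrame(
    {
        "provider": ["a", "b", "c", "c", "b"],
        "timestamp": pd.to_datetime([
            "2019-09-11 20:05:47+02:00",
            "2019-09-11 00:02:46+02:00",
            "2019-09-11 00:32:39+02:00",
            "2019-09-11 08:12:56+02:00",
            "2019-09-11 20:02:55+02:00",
        ]),
        "vehicle_id": ["x", "y", "z", "z", "w"],
    },
    index=pd.Index([103107, 1192195, 434508, 530388, 1773721], name="id"),
)

# Unique vehicle IDs per provider and day, then describe() per day.
stats = (
    df.groupby(["provider", df["timestamp"].dt.strftime("%Y-%m-%d")])[["vehicle_id"]]
    .nunique()
    .groupby("timestamp")["vehicle_id"]
    .describe()
)

# Keep only the statistics the question asked about.
summary = stats[["min", "max", "mean"]]
print(summary)
```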

Sorry, what is the expected output for the sample data?