Resampling a dataset in Python

python pandas

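I'm working with the Metro Interstate Traffic Volume dataset (available here: ), but I can't get the dataset resampled to show the average traffic volume per day instead of per hour.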

metro = pd.read_csv('Metro_Interstate_Traffic_Volume.csv')
metro['date_time'] = pd.to_datetime(metro['date_time'], format='%Y-%m-%d %H:%M:%S')
metro.set_index('date_time', inplace=True, drop=True)
metro.resample('1Y').mean()

This is what I get:

                    holiday    temp  ...     weather_description  traffic_volume
date_time                            ...                                        
2012-10-02 09:00:00    None  288.28  ...        scattered clouds            5545
2012-10-02 10:00:00    None  289.36  ...           broken clouds            4516
2012-10-02 11:00:00    None  289.58  ...         overcast clouds            4767
2012-10-02 12:00:00    None  290.13  ...         overcast clouds            5026
2012-10-02 13:00:00    None  291.14  ...           broken clouds            4918
...                     ...     ...  ...                     ...             ...
2018-09-30 19:00:00    None  283.45  ...           broken clouds            3543
2018-09-30 20:00:00    None  282.76  ...         overcast clouds            2781
2018-09-30 21:00:00    None  282.73  ...  proximity thunderstorm            2159
2018-09-30 22:00:00    None  282.09  ...         overcast clouds            1450
2018-09-30 23:00:00    None  282.12  ...         overcast clouds             954

[48204 rows x 8 columns]

Do you know how to fix this?

Edit:

I also checked the pandas reference for resampling () and ran the following example code:

d = dict({'price': [10, 11, 9, 13, 14, 18, 17, 19],
          'volume': [50, 60, 40, 100, 50, 100, 40, 50]})
df = pd.DataFrame(d)
df['week_starting'] = pd.date_range('01/01/2018',
                                    periods=8,
                                    freq='W')
df
   price  volume week_starting
0     10      50    2018-01-07
1     11      60    2018-01-14
2      9      40    2018-01-21
3     13     100    2018-01-28
4     14      50    2018-02-04
5     18     100    2018-02-11
6     17      40    2018-02-18
7     19      50    2018-02-25
df.resample('M', on='week_starting').mean()
               price  volume
week_starting
2018-01-31     10.75    62.5
2018-02-28     17.00    60.0

For me, however, the result is the same before and after resampling.

A year is not a fixed-length period: some years have 365 days and others have 366. You can use groupby instead:

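import pandas as pd

metro = pd.read_csv('Metro_Interstate_Traffic_Volume.csv')
metro['date_time'] = pd.to_datetime(metro['date_time'], format='%Y-%m-%d %H:%M:%S')

# group the rows by the year extracted from date_time and
# average the numeric columns (text columns cannot be averaged)
metro.groupby(metro['date_time'].dt.year).mean(numeric_only=True)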

Output:

                 temp   rain_1h   snow_1h  clouds_all  traffic_volume
date_time                                                            
2012       274.991782  0.000000  0.000000   65.295819     3207.802657
2013       278.976352  0.161284  0.000000   52.560947     3286.762160
2014       276.786438  0.243251  0.000000   49.070469     3250.938004
2015       287.689574  0.339218  0.001795   40.988338     3242.900983
2016       282.520790  1.192969  0.000308   48.628842     3169.441328
2017       281.463309  0.000000  0.000000   50.005281     3340.703065
2018       282.851502  0.121765  0.000000   45.567996     3260.112341
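For the daily average that the question asks about, resample can still be used. One plausible reason the frame looked unchanged is that resample returns a new object rather than modifying metro in place, so the result has to be assigned or printed. The sketch below illustrates this on made-up hourly values rather than the real CSV; the names hourly and daily are only for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the hourly data: 3 days x 24 hours (values are made up).
idx = pd.date_range('2012-10-02 00:00:00', periods=72, freq=pd.Timedelta(hours=1))
hourly = pd.DataFrame(
    {'date_time': idx, 'traffic_volume': np.arange(72)}
).set_index('date_time')

# resample() does not change `hourly`; keep the object it returns.
daily = hourly.resample('D')['traffic_volume'].mean()
print(daily)
```

Selecting the traffic_volume column before calling mean also keeps text columns such as weather_description out of the aggregation.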

You can try creating a year column:

metro['year'] = metro['date_time'].dt.year
# groupby is a method, so it is called with parentheses
metro.groupby('year').mean(numeric_only=True)
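As a minimal sketch of the year-column approach, the example below uses four made-up rows that straddle a year boundary instead of the real file. The values are hypothetical; only the column names follow the dataset shown in the question.

```python
import pandas as pd

# Made-up rows spanning two years; weather_description is a text column.
metro = pd.DataFrame({
    'date_time': pd.to_datetime(['2012-12-31 22:00:00', '2012-12-31 23:00:00',
                                 '2013-01-01 00:00:00', '2013-01-01 01:00:00']),
    'weather_description': ['broken clouds', 'sky is clear',
                            'sky is clear', 'light rain'],
    'traffic_volume': [1000, 800, 600, 400],
})

# Store the year in its own column, then group on that column.
metro['year'] = metro['date_time'].dt.year

# Select the numeric column explicitly so the text column is never averaged.
yearly = metro.groupby('year')['traffic_volume'].mean()
print(yearly)
```

Grouping on the year column and grouping on metro['date_time'].dt.year directly produce the same groups; the extra column is mainly convenient if the year is reused later.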