Python 重采样数据集
我正在研究Metro州际交通量数据集(可在此处找到:),但我无法对数据集进行重采样以显示每天的平均交通量,而不是每小时的平均交通量Python 重采样数据集,python,pandas,Python,Pandas,我正在研究Metro州际交通量数据集(可在此处找到:),但我无法对数据集进行重采样以显示每天的平均交通量,而不是每小时的平均交通量 metro = pd.read_csv('Metro_Interstate_Traffic_Volume.csv') metro['date_time'] = pd.to_datetime(metro['date_time'], format='%Y-%m-%d %H:%M:%S') metro.set_index('date_time', inplace=True
metro = pd.read_csv('Metro_Interstate_Traffic_Volume.csv')
metro['date_time'] = pd.to_datetime(metro['date_time'], format='%Y-%m-%d %H:%M:%S')
metro.set_index('date_time', inplace=True, drop=True)
metro.resample('1Y').mean()
这就是我得到的:
holiday temp ... weather_description traffic_volume
date_time ...
2012-10-02 09:00:00 None 288.28 ... scattered clouds 5545
2012-10-02 10:00:00 None 289.36 ... broken clouds 4516
2012-10-02 11:00:00 None 289.58 ... overcast clouds 4767
2012-10-02 12:00:00 None 290.13 ... overcast clouds 5026
2012-10-02 13:00:00 None 291.14 ... broken clouds 4918
... ... ... ... ... ...
2018-09-30 19:00:00 None 283.45 ... broken clouds 3543
2018-09-30 20:00:00 None 282.76 ... overcast clouds 2781
2018-09-30 21:00:00 None 282.73 ... proximity thunderstorm 2159
2018-09-30 22:00:00 None 282.09 ... overcast clouds 1450
2018-09-30 23:00:00 None 282.12 ... overcast clouds 954
[48204 rows x 8 columns]
你知道怎么解决吗
编辑: 此外,我还检查了pandas参考以进行重采样(),并执行了以下示例代码:
d = dict({'price': [10, 11, 9, 13, 14, 18, 17, 19],
'volume': [50, 60, 40, 100, 50, 100, 40, 50]})
df = pd.DataFrame(d)
df['week_starting'] = pd.date_range('01/01/2018',
periods=8,
freq='W')
df
price volume week_starting
0 10 50 2018-01-07
1 11 60 2018-01-14
2 9 40 2018-01-21
3 13 100 2018-01-28
4 14 50 2018-02-04
5 18 100 2018-02-11
6 17 40 2018-02-18
7 19 50 2018-02-25
df.resample('M', on='week_starting').mean()
price volume
week_starting
2018-01-31 10.75 62.5
2018-02-28 17.00 60.0
但是,对我来说,重新采样前后的结果是相同的。一年不是固定的时间段:有些年份有365天,有些年份有366天。您可以使用
groupby
:
metro = pd.read_csv('Metro_Interstate_Traffic_Volume.csv')
metro['date_time'] = pd.to_datetime(metro['date_time'], format='%Y-%m-%d %H:%M:%S')
# extract the yer
metro.groupby(metro['date_time'].dt.year).mean()
输出:
temp rain_1h snow_1h clouds_all traffic_volume
date_time
2012 274.991782 0.000000 0.000000 65.295819 3207.802657
2013 278.976352 0.161284 0.000000 52.560947 3286.762160
2014 276.786438 0.243251 0.000000 49.070469 3250.938004
2015 287.689574 0.339218 0.001795 40.988338 3242.900983
2016 282.520790 1.192969 0.000308 48.628842 3169.441328
2017 281.463309 0.000000 0.000000 50.005281 3340.703065
2018 282.851502 0.121765 0.000000 45.567996 3260.112341
您可以尝试创建年份列:
metro['year'] =metro['date_time'].dt.year
metro.groupby['year'].mean()