Python: monthly average of a one-minute time series dataset using pandas
I have a very large one-minute time series dataset (3 months) in the following format:
datetime,val1,val2,val3,val4,val5,val6,val7,val8,val9,val10,val11,val12
1/06/2017 0:00,0,0,0,0,0,0,0,0,0,0.011,0,0.036
1/06/2017 0:01,0,0,0,0,0,0,0,0,0,0.011,0,0.036
...
1/06/2017 23:59,0,0,0,0,0,0,0,0,0,0.011,0,0.035
2/06/2017 0:00,0,0,0,0,0,0,0,0,0,0.014,0,0.036
2/06/2017 0:01,0,0,0,0,0,0,0,0,0,0.011,0,0.036
...
2/06/2017 23:59,0,0,0,0,0,0,0,0,0,0.011,0,0.035
....
31/08/2017 0:00,0,0.2,0,0,0,0.56,0,0,0,0.014,0,0.036
31/08/2017 0:01,0,0.23,0,0,0,0,0,0,0,0.011,0,0.032
...
31/08/2017 23:59,0,0,0,0,0,0,.55,0,0,0.011,0,0.034
What is the most efficient way to get the monthly average of each column using pandas?
The expected output is:
month,val1,val2,val3,val4,val5,val6,val7,val8,val9,val10,val11,val12
06/2017,0,0,0,0,0,0,0,0,0,0.011,0,0.036
07/2017,0,0,0,0,0,0,0,0,0,0.014,0,0.036
08/2017,0,0,0.21,0,0,0,0,0.52,0,0.011,0,0.036
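Currently, I read the dataset day by day, build up a dataset accumulated over the days, and then divide by the number of days in each month. But this is very inefficient and takes a lot of time.
First convert the datetime column to datetimes (for example with to_datetime), then resample it by month start with MS, and finally change the format of the DatetimeIndex to MM/yyyy.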
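The code for this first approach is not preserved in the text. A minimal sketch of the steps it describes, assuming day-first dates as in the question; the in-memory sample (a two-column subset of the question's rows) and the `dropna` step for months with no data are illustrative assumptions, not part of the original answer:

```python
import io
import pandas as pd

# Tiny stand-in for the real file: a two-column subset of the rows shown in the question.
sample = io.StringIO(
    "datetime,val10,val12\n"
    "1/06/2017 0:00,0.011,0.036\n"
    "1/06/2017 0:01,0.011,0.036\n"
    "2/06/2017 0:00,0.014,0.036\n"
    "31/08/2017 0:00,0.014,0.036\n"
    "31/08/2017 0:01,0.011,0.032\n"
)

df = pd.read_csv(sample)
# The dates are day-first (31/08/2017), so state the format explicitly.
df["datetime"] = pd.to_datetime(df["datetime"], format="%d/%m/%Y %H:%M")

# 'MS' is the month-start frequency: one bin per calendar month.
monthly = df.resample("MS", on="datetime").mean()

# Months with no rows (July in this sample) come back as all-NaN; drop them (assumption).
monthly = monthly.dropna(how="all")

# Replace the DatetimeIndex with MM/YYYY labels.
monthly.index = monthly.index.strftime("%m/%Y")
monthly = monthly.rename_axis("month")
print(monthly)
```

Passing an explicit `format` avoids any ambiguity between day-first and month-first dates.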
Alternatively, pass the converted datetime column, formatted with strftime, to groupby and aggregate with mean:
df = df.groupby(df['datetime'].dt.strftime('%m/%Y')).mean()
print (df)
val1 val2 val3 val4 val5 val6 val7 val8 val9 \
datetime
06/2017 0 0.000000 0 0 0 0.000000 0.000000 0 0
08/2017 0 0.143333 0 0 0 0.186667 0.183333 0 0
val10 val11 val12
datetime
06/2017 0.0115 0 0.035667
08/2017 0.0120 0 0.034000
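One thing to keep in mind when grouping on `'%m/%Y'` strings: the group keys sort as text. That is harmless for June to August 2017, but data crossing a year boundary would list 01/2018 before 12/2017. A sketch of a variant that groups on a monthly period and formats the labels afterwards; the three-row frame below is hypothetical data chosen only to show the ordering:

```python
import pandas as pd

# Hypothetical rows spanning a year boundary.
df = pd.DataFrame({
    "datetime": pd.to_datetime(
        ["2017-12-31 23:58", "2017-12-31 23:59", "2018-01-01 00:00"]
    ),
    "val1": [1.0, 3.0, 5.0],
})

# String keys: groups are ordered alphabetically, not chronologically.
by_string = df.groupby(df["datetime"].dt.strftime("%m/%Y")).mean(numeric_only=True)

# Period keys: groups are ordered chronologically.
by_period = df.groupby(df["datetime"].dt.to_period("M")).mean(numeric_only=True)
# Format the labels only after the aggregation.
by_period.index = by_period.index.strftime("%m/%Y")
by_period = by_period.rename_axis("month")

print(list(by_string.index))
print(list(by_period.index))
```

`numeric_only=True` keeps the datetime column itself out of the averages.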
pandas read_csv and to_csv are what you need:
df = pd.read_csv('input.csv', parse_dates=['datetime'])
df.groupby(df.datetime.dt.strftime('%m/%Y')).mean().rename_axis('month').to_csv(out, float_format='%.06f')
With your input data (from …) it gives:
month,val1,val2,val3,val4,val5,val6,val7,val8,val9,val10,val11,val12
01/2017,0,0.000000,0,0,0,0.000000,0.000000,0,0,0.011000,0,0.035667
02/2017,0,0.000000,0,0,0,0.000000,0.000000,0,0,0.012000,0,0.035667
08/2017,0,0.143333,0,0,0,0.186667,0.183333,0,0,0.012000,0,0.034000
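The 01/2017 and 02/2017 rows in this output suggest that dates such as 1/06/2017 and 2/06/2017 were read month-first (January 6 and February 6), while 31/08/2017 could only be read day-first. A sketch of the same pipeline with `dayfirst=True` passed to `read_csv`, assuming the input really is day-first as the question's expected output implies; the in-memory sample and the `io.StringIO` target are stand-ins for the real files:

```python
import io
import pandas as pd

# Stand-in for input.csv: one column from a few of the question's rows.
csv_text = (
    "datetime,val10\n"
    "1/06/2017 0:00,0.011\n"
    "2/06/2017 0:00,0.014\n"
    "31/08/2017 0:00,0.014\n"
)

# dayfirst=True makes 1/06/2017 parse as 1 June rather than 6 January.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["datetime"], dayfirst=True)

monthly = (
    df.groupby(df["datetime"].dt.strftime("%m/%Y"))
      .mean(numeric_only=True)
      .rename_axis("month")
)

# Stand-in for the output file.
out = io.StringIO()
monthly.to_csv(out, float_format="%.06f")
print(out.getvalue())
```

With day-first parsing, the two June days fall into a single 06/2017 group instead of separate 01/2017 and 02/2017 groups.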