Python pivot table with average time
I have been working with pandas to analyze time series data and have been trying to integrate the results into pivot tables. I have data in a CSV file that looks like this:
gov start end
a 2015-12-08T16:05:00.980+03 2015-12-08T16:14:31.765+03
a 2015-12-08T16:07:53.356+03 2015-12-08T16:34:43.413+03
b 2015-12-08T16:08:43.371+03 2015-12-08T16:54:32.257+03
b 2015-12-08T15:56:12.006+03 2015-12-08T17:35:04.499+03
It is a simple data set with start and end times, from which I compute the time difference between the two:
import numpy as np
import pandas as pd
piv_t_subset = pd.read_csv('time_test.csv', parse_dates=['start', 'end'])
piv_t_subset['time_diff'] = piv_t_subset['end'] - piv_t_subset['start']
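To reproduce this step without the CSV file, here is a minimal sketch that builds the four sample rows in memory. It assumes the +03 offsets can be dropped: every row carries the same offset, so it cancels out in end - start. It also shows that the resulting time_diff column has a timedelta64 dtype, which pandas does not classify as numeric.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Sample rows from the question. The "+03" offsets are dropped here (an
# assumption for portability); they are identical on every row, so they
# cancel out when subtracting start from end.
df = pd.DataFrame({
    "gov": ["a", "a", "b", "b"],
    "start": pd.to_datetime([
        "2015-12-08 16:05:00.980", "2015-12-08 16:07:53.356",
        "2015-12-08 16:08:43.371", "2015-12-08 15:56:12.006",
    ]),
    "end": pd.to_datetime([
        "2015-12-08 16:14:31.765", "2015-12-08 16:34:43.413",
        "2015-12-08 16:54:32.257", "2015-12-08 17:35:04.499",
    ]),
})

# Subtracting two datetime columns yields a timedelta64 column.
df["time_diff"] = df["end"] - df["start"]

print(df["time_diff"].dtype.kind)          # 'm' means timedelta64
print(is_numeric_dtype(df["time_diff"]))   # False: not treated as numeric
```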
I can compute the mean of the time differences on its own as:
t = piv_t_subset['time_diff'].mean()
print(t)
0 days 00:18:53.703286
I want to use this time information to create a pivot table, but when I try:
pd.pivot_table(piv_t_subset,index=["gov"],values=['time_diff'],aggfunc=[np.mean])
I get an error:
DataError: No numeric types to aggregate
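Do I need to do more preprocessing to convert the time differences from timedelta to float?

This is not supported at the moment, but you can convert the timedelta64 Series to a float Series (in seconds) like this: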
piv_t_subset['time_diff1'] = [td.total_seconds() for td in piv_t_subset['time_diff']]
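The list comprehension above calls total_seconds() on each element one at a time. A vectorized alternative is the Series.dt.total_seconds() accessor; the sketch below, using durations copied from the output in this thread as a stand-in for piv_t_subset['time_diff'], checks that both approaches give the same seconds.

```python
import pandas as pd

# Stand-in for piv_t_subset['time_diff'], using the durations shown in this thread
time_diff = pd.Series(pd.to_timedelta([
    "00:09:30.785", "00:26:50.057", "00:45:48.886", "01:38:52.493",
]))

# Element-by-element conversion, as in the answer
seconds_loop = [td.total_seconds() for td in time_diff]

# Vectorized conversion through the .dt accessor
seconds_vec = time_diff.dt.total_seconds()

print(seconds_vec.tolist())
```

Both give floating-point seconds, so compare them with a small tolerance rather than exact equality.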
print(piv_t_subset)
gov start end
0 a 2015-12-08 13:05:00.980 2015-12-08 13:14:31.765
1 a 2015-12-08 13:07:53.356 2015-12-08 13:34:43.413
2 b 2015-12-08 13:08:43.371 2015-12-08 13:54:32.257
3 b 2015-12-08 12:56:12.006 2015-12-08 14:35:04.499
piv_t_subset['time_diff'] = piv_t_subset['end'] - piv_t_subset['start']
piv_t_subset['time_diff1'] = [td.total_seconds() for td in piv_t_subset['time_diff']]
print(piv_t_subset)
  gov                   start                     end       time_diff  time_diff1
0   a 2015-12-08 13:05:00.980 2015-12-08 13:14:31.765 00:09:30.785000     570.785
1   a 2015-12-08 13:07:53.356 2015-12-08 13:34:43.413 00:26:50.057000    1610.057
2   b 2015-12-08 13:08:43.371 2015-12-08 13:54:32.257 00:45:48.886000    2748.886
3   b 2015-12-08 12:56:12.006 2015-12-08 14:35:04.499 01:38:52.493000    5932.493
print(piv_t_subset.groupby('gov').agg({'time_diff1': 'mean'}))
time_diff1
gov
a 1090.4210
b 4340.6895
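If upgrading is an option, recent pandas releases can average a timedelta64 column directly with groupby, which skips the conversion to seconds and keeps the result as a duration. This is an assumption about the pandas version in use (the thread above predates it), so check it on your installation; the durations below are copied from the time_diff column shown earlier.

```python
import pandas as pd

# Durations for each row, taken from the time_diff column shown above
df = pd.DataFrame({
    "gov": ["a", "a", "b", "b"],
    "time_diff": pd.to_timedelta([
        "00:09:30.785", "00:26:50.057", "00:45:48.886", "01:38:52.493",
    ]),
})

# Assumption: a recent pandas release, where GroupBy.mean accepts a
# timedelta64 column and returns Timedelta results directly.
avg = df.groupby("gov")["time_diff"].mean()
print(avg)
```

The per-group means match the 1090.4210 and 4340.6895 seconds above, just expressed as Timedelta values.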
# aggfunc can be omitted: pivot_table aggregates with the mean by default
print(pd.pivot_table(piv_t_subset, index=["gov"], values=['time_diff1']))
time_diff1
gov
a 1090.4210
b 4340.6895
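The pivot table holds the average time in seconds as floats. To display it as a duration again, one option is pd.to_timedelta with unit="s". The sketch below rebuilds only the gov and time_diff1 columns from the output above and adds a hypothetical avg_time column for the converted values.

```python
import pandas as pd

# gov and time_diff1 columns copied from the output in this thread
df = pd.DataFrame({
    "gov": ["a", "a", "b", "b"],
    "time_diff1": [570.785, 1610.057, 2748.886, 5932.493],
})

# Mean duration per group, in seconds (aggfunc spelled out for clarity)
piv = pd.pivot_table(df, index="gov", values="time_diff1", aggfunc="mean")

# Convert float seconds back to Timedelta values for display
piv["avg_time"] = pd.to_timedelta(piv["time_diff1"], unit="s")
print(piv)
```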