Python 3.x: How do I bin time-series data into 24 one-hour intervals in pandas?

I have a csv file containing some data:
,date,location,device,provider,cpu,mem,load,drops,id,latency,gw_latency,upload,download,sap_drops,sap_latency,alert_id
389,2018-02-13 09:20:17.572685+00:00,ASA,10.11.100.1,BOM,4.0,23.0,0.25,0.0,,,,,,,,
390,2018-02-13 09:20:21.836284+00:00,ASA,10.11.100.1,COD,4.0,23.0,2.08,0.0,,,,,,,,
391,2018-02-13 09:30:59.401178+00:00,ASA,10.11.100.1,COD,5.0,23.0,8.0,0.0,,,,,,,,
392,2018-02-13 09:31:03.667730+00:00,ASA,10.11.100.1,COD,5.0,23.0,3.5,0.0,,,,,,,,
393,2018-02-13 09:41:14.666626+00:00,ASA,10.11.100.1,BOM,4.0,23.0,0.5,0.0,,,,,,,,
394,2018-02-13 09:41:18.935061+00:00,ASA,10.11.100.1,DAE,4.0,23.0,3.0,0.0,,,,,,,,
395,2018-02-13 09:50:17.491014+00:00,ASA,10.11.100.1,DAE,5.0,23.0,8.25,0.0,,,,,,,,
396,2018-02-13 09:50:21.751805+00:00,BBB,10.11.100.1,BOM,5.0,23.0,2.75,0.0,,,,,,,,
397,2018-02-13 10:00:18.387647+00:00,BBB,10.11.100.1,CXU,5.0,23.0,2.0,0.0,,,,,,,,
398,2018-02-13 10:00:22.847626+00:00,ASA,10.11.100.1,BOM,5.0,23.0,3.17,0.0,,,,,,,,
399,2018-02-13 10:10:17.521642+00:00,BBB,10.11.100.1,DAE,5.0,23.0,1.0,0.0,,,,,,,,
400,2018-02-13 10:10:21.786720+00:00,BBB,10.11.100.1,DAE,5.0,23.0,2.42,0.0,,,,,,,,
401,2018-02-13 10:14:38.085999+00:00,BBB,10.11.100.1,CXU,4.0,23.0,0.25,0.0,,,,,,,,
..................................................................................
As you can see, the date 2018-02-13 has many entries spread across several time intervals. I want to bin these entries into 24 one-hour intervals, where each hour holds a single value (the mean). This is what I did:

df_next = df.loc['2018-04-13'].resample('H')["cpu"].mean().fillna(0)

However, for the date 2018-04-13, data was collected only up to the 10:00 hour (the last entry is at 10:14:38), so I only get intervals up to that time. Similarly, for some other dates where data collection starts at 9:00, I only get hourly intervals from 9:00 onward.

How can I get the full 24 one-hour intervals starting from 00:00, regardless of when the data was collected? Basically, hours where no data was collected should be assigned 0, and hours with data should be assigned the mean. So I want something like this:
381,2018-02-13 00:00:00.000000+00:00,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
382,2018-02-13 01:00:00.000000+00:00,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
383,2018-02-13 02:00:00.000000+00:00,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
384,2018-02-13 03:00:00.000000+00:00,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
385,2018-02-13 04:00:00.000000+00:00,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
386,2018-02-13 05:00:00.000000+00:00,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
387,2018-02-13 06:00:00.000000+00:00,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
388,2018-02-13 07:00:00.000000+00:00,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
388,2018-02-13 08:00:00.000000+00:00,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
389,2018-02-13 09:20:17.572685+00:00,ASA,10.11.100.1,BOM,4.0,23.0,0.25,0.0,,,,,,,,
390,2018-02-13 09:20:21.836284+00:00,ASA,10.11.100.1,COD,4.0,23.0,2.08,0.0,,,,,,,,
391,2018-02-13 09:30:59.401178+00:00,ASA,10.11.100.1,COD,5.0,23.0,8.0,0.0,,,,,,,,
392,2018-02-13 09:31:03.667730+00:00,ASA,10.11.100.1,COD,5.0,23.0,3.5,0.0,,,,,,,,
393,2018-02-13 09:41:14.666626+00:00,ASA,10.11.100.1,BOM,4.0,23.0,0.5,0.0,,,,,,,,
394,2018-02-13 09:41:18.935061+00:00,ASA,10.11.100.1,DAE,4.0,23.0,3.0,0.0,,,,,,,,
395,2018-02-13 09:50:17.491014+00:00,ASA,10.11.100.1,DAE,5.0,23.0,8.25,0.0,,,,,,,,
396,2018-02-13 09:50:21.751805+00:00,BBB,10.11.100.1,BOM,5.0,23.0,2.75,0.0,,,,,,,,
397,2018-02-13 10:00:18.387647+00:00,BBB,10.11.100.1,CXU,5.0,23.0,2.0,0.0,,,,,,,,
398,2018-02-13 10:00:22.847626+00:00,ASA,10.11.100.1,BOM,5.0,23.0,3.17,0.0,,,,,,,,
399,2018-02-13 10:10:17.521642+00:00,BBB,10.11.100.1,DAE,5.0,23.0,1.0,0.0,,,,,,,,
400,2018-02-13 10:10:21.786720+00:00,BBB,10.11.100.1,DAE,5.0,23.0,2.42,0.0,,,,,,,,
401,2018-02-13 10:14:38.085999+00:00,BBB,10.11.100.1,CXU,4.0,23.0,0.25,0.0,,,,,,,,
402,2018-02-13 11:00:00.000000+00:00,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
403,2018-02-13 12:00:00.000000+00:00,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
404,2018-02-13 13:00:00.000000+00:00,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
405,2018-02-13 14:00:00.000000+00:00,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
406,2018-02-13 15:00:00.000000+00:00,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
407,2018-02-13 16:00:00.000000+00:00,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
408,2018-02-13 17:00:00.000000+00:00,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
409,2018-02-13 18:00:00.000000+00:00,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
410,2018-02-13 19:00:00.000000+00:00,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
411,2018-02-13 20:00:00.000000+00:00,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
412,2018-02-13 21:00:00.000000+00:00,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
413,2018-02-13 22:00:00.000000+00:00,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
414,2018-02-13 23:00:00.000000+00:00,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
As you can see, it fills in the remaining hours and keeps their values at 0.
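To see why `resample` alone is not enough: a minimal sketch with hypothetical sample data, showing that `resample('H')` only covers the span between the first and last observed timestamps, not the whole day:

```python
import pandas as pd

# hypothetical sample: readings only between 09:20 and 10:14 on one day
idx = pd.to_datetime([
    "2018-02-13 09:20:17", "2018-02-13 09:50:21", "2018-02-13 10:14:38"
])
df = pd.DataFrame({"cpu": [4.0, 5.0, 4.0]}, index=idx)

hourly = df.resample('H')["cpu"].mean().fillna(0)
print(hourly)
# only the 09:00 and 10:00 bins appear -- not a full 24-hour day
```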
Use:
#get a Series of hourly cpu means
#drop hours with no data via dropna
a = df.resample('H')["cpu"].mean().dropna()
#create all possible hours: floor the min to 00:00 and extend the max to 23:00
rng = pd.date_range(a.index.min().floor('d'),
a.index.max().floor('d') + pd.Timedelta(23, unit='h'), freq='H')
#get all missing index values - the missing hours
diff_idx = rng.difference(a.index)
#append a DataFrame of the missing hours to the original, then sort for correct ordering
df = pd.concat([df, pd.DataFrame(index=diff_idx, columns=df.columns)]).sort_index()
Hey, I want the 24-hour intervals for one specific date - in this case, 2018-04-13 - not for the other dates. Basically, I will be given a date and I have to return the cpu values for that date's 24 one-hour intervals, so a specific date will have exactly 24 values. @SouvikRay - can you drop the NaN rows - the non-existent intervals? Use s.fillna(0, inplace=True) instead of s.dropna()
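Following the fillna suggestion in the comments, a sketch that yields exactly 24 hourly cpu values for one requested date, using `reindex` with `fill_value=0` (the sample frame is hypothetical):

```python
import pandas as pd

# hypothetical stand-in for the csv: two readings on the requested day
idx = pd.to_datetime(["2018-02-13 09:20:17", "2018-02-13 10:14:38"])
df = pd.DataFrame({"cpu": [4.0, 5.0]}, index=idx)

day = "2018-02-13"
# all 24 hours of the requested day
hours = pd.date_range(day, periods=24, freq='H')

# hourly means for that day; hours with no data become 0
cpu = df.loc[day].resample('H')["cpu"].mean().reindex(hours, fill_value=0)
print(len(cpu))  # 24 values, one per hour
```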