Python 3.x: How to get data by hour


I have a CSV file with the following contents:

2018-02-28 09:48:18.884392+05:30,,
2018-03-04 10:50:34.833787+05:30,,
2018-03-05 13:04:23.634013+05:30,,
2018-03-14 05:30:14.51227+05:30,28.84,27.58
2018-03-14 05:45:14.51227+05:30,12.54,17.47
2018-03-14 06:30:14.466206+05:30,25.1,23.58
2018-03-14 06:40:14.466206+05:30,11.2,14.44
2018-03-14 07:18:14.493826+05:30,21.96,21.54
2018-03-14 08:30:14.593973+05:30,20.48,26.86
2018-03-14 09:30:14.481426+05:30,22.92,15.3
2018-03-14 10:31:20.307558+05:30,7.46,0
2018-03-14 11:30:14.556135+05:30,21,16.5
2018-03-14 12:30:14.569207+05:30,14.14,19.14
2018-03-14 13:11:14.470991+05:30,8.84,6.98
2018-03-14 14:20:14.500747+05:30,8.94,4.5
2018-03-14 15:30:14.487262+05:30,5.92,3.86
2018-03-14 16:30:14.454833+05:30,6.58,10.88
2018-03-14 17:30:14.482084+05:30,7.32,3.36
2018-03-14 18:27:14.559508+05:30,5.52,3.6
2018-03-14 19:30:14.611782+05:30,2.74,3.14
2018-03-14 20:30:14.461808+05:30,4.34,3.2
2018-03-14 21:30:14.533157+05:30,3.8,3.22
2018-03-14 22:15:14.451542+05:30,4.44,3.06
2018-03-14 23:30:14.5494+05:30,3.04,2.92
2018-03-15 00:30:14.477848+05:30,4.68,7.82
Here the first column is the date, the second column is the upload speed, and the last column is the download speed.

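I need to display hourly data for every hour between two specific dates, 2018-03-05 and 2018-03-14, so that however many entries (upload and download speeds) fall within a given hour, I get the average of those entries and display that average for the hour.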

Here is my code:

import pandas as pd
import numpy as np


df = pd.read_csv("file.csv", header=None,
                 names=["date", "upload", "download"], parse_dates=["date"])
df.set_index("date", inplace=True)
df.fillna(0, inplace=True)
df.index = df.index.tz_localize('UTC').tz_convert('Asia/Kolkata')
# get data for the specified dates
df2 = df.loc['2018-03-05': '2018-03-14']
# add hourly frequency
print(df2.resample('1H').last())
This is the output I get:

                             upload  download
date                                       
2018-03-05 13:00:00+05:30    0.00      0.00
2018-03-05 14:00:00+05:30     NaN       NaN
2018-03-05 15:00:00+05:30     NaN       NaN
2018-03-05 16:00:00+05:30     NaN       NaN
2018-03-05 17:00:00+05:30     NaN       NaN
2018-03-05 18:00:00+05:30     NaN       NaN
2018-03-05 19:00:00+05:30     NaN       NaN
2018-03-05 20:00:00+05:30     NaN       NaN
2018-03-05 21:00:00+05:30     NaN       NaN
2018-03-05 22:00:00+05:30     NaN       NaN
2018-03-05 23:00:00+05:30     NaN       NaN
2018-03-06 00:00:00+05:30     NaN       NaN
2018-03-06 01:00:00+05:30     NaN       NaN
2018-03-06 02:00:00+05:30     NaN       NaN
2018-03-06 03:00:00+05:30     NaN       NaN
2018-03-06 04:00:00+05:30     NaN       NaN
2018-03-06 05:00:00+05:30     NaN       NaN
2018-03-06 06:00:00+05:30     NaN       NaN
2018-03-06 07:00:00+05:30     NaN       NaN
2018-03-06 08:00:00+05:30     NaN       NaN
2018-03-06 09:00:00+05:30     NaN       NaN
2018-03-06 10:00:00+05:30     NaN       NaN
2018-03-06 11:00:00+05:30     NaN       NaN
2018-03-06 12:00:00+05:30     NaN       NaN
2018-03-06 13:00:00+05:30     NaN       NaN
2018-03-06 14:00:00+05:30     NaN       NaN
2018-03-06 15:00:00+05:30     NaN       NaN
2018-03-06 16:00:00+05:30     NaN       NaN
2018-03-06 17:00:00+05:30     NaN       NaN
2018-03-06 18:00:00+05:30     NaN       NaN
...                           ...       ...
2018-03-13 18:00:00+05:30     NaN       NaN
2018-03-13 19:00:00+05:30     NaN       NaN
2018-03-13 20:00:00+05:30     NaN       NaN
2018-03-13 21:00:00+05:30     NaN       NaN
2018-03-13 22:00:00+05:30     NaN       NaN
2018-03-13 23:00:00+05:30     NaN       NaN
2018-03-14 00:00:00+05:30     NaN       NaN
2018-03-14 01:00:00+05:30     NaN       NaN
2018-03-14 02:00:00+05:30     NaN       NaN
2018-03-14 03:00:00+05:30     NaN       NaN
2018-03-14 04:00:00+05:30     NaN       NaN
2018-03-14 05:00:00+05:30   12.54     17.47
2018-03-14 06:00:00+05:30   11.20     14.44
2018-03-14 07:00:00+05:30   21.96     21.54
2018-03-14 08:00:00+05:30   20.48     26.86
2018-03-14 09:00:00+05:30   22.92     15.30
2018-03-14 10:00:00+05:30    7.46      0.00
2018-03-14 11:00:00+05:30   21.00     16.50
2018-03-14 12:00:00+05:30   14.14     19.14
2018-03-14 13:00:00+05:30    8.84      6.98
2018-03-14 14:00:00+05:30    8.94      4.50
2018-03-14 15:00:00+05:30    5.92      3.86
2018-03-14 16:00:00+05:30    6.58     10.88
2018-03-14 17:00:00+05:30    7.32      3.36
2018-03-14 18:00:00+05:30    5.52      3.60
2018-03-14 19:00:00+05:30    2.74      3.14
2018-03-14 20:00:00+05:30    4.34      3.20
2018-03-14 21:00:00+05:30    3.80      3.22
2018-03-14 22:00:00+05:30    4.44      3.06
2018-03-14 23:00:00+05:30    3.04      2.92
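I do get hourly data, but it looks wrong. If you look closely at 2018-03-14, the original data has readings of 28.84 and 27.58 at 5:30 and readings of 12.54 and 17.47 at 5:45, yet the output for the 05:00 hour shows 12.54 and 17.47. It seems to pick the latest entry within each hour. The same is true for the other hours.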


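How can I display hourly data for every hour between the two specified dates, showing the average of the entries created within each hour, or 0 if no entry was created in that hour?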

IIUC, you are using last(), which returns the last value in each hour. Use mean() instead:

df.resample('H').mean().fillna(0)
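
To make the difference concrete, here is a minimal sketch on a few rows copied from the question. It is an illustration under some assumptions, not the asker's exact script: the rows are read from an inline string rather than file.csv, the timestamps are parsed with utc=True and then converted to Asia/Kolkata, and "60min" is used as the hourly rule (it should behave like 'H', which recent pandas releases warn about). The final reindex step is also an assumption about what "all hours between the two dates" means, namely every hour of both days inclusive.

```python
import io
import pandas as pd

# A few rows copied from the question's CSV: timestamp, upload, download.
csv_text = """\
2018-03-14 05:30:14.51227+05:30,28.84,27.58
2018-03-14 05:45:14.51227+05:30,12.54,17.47
2018-03-14 06:30:14.466206+05:30,25.1,23.58
2018-03-14 06:40:14.466206+05:30,11.2,14.44
2018-03-14 08:30:14.593973+05:30,20.48,26.86
"""

df = pd.read_csv(io.StringIO(csv_text), header=None,
                 names=["date", "upload", "download"])
# Parse as UTC, then convert to Indian time so the hours match the file.
df["date"] = pd.to_datetime(df["date"], utc=True).dt.tz_convert("Asia/Kolkata")
df = df.set_index("date")

# Same hourly bins, two different aggregations.
hourly_last = df.resample("60min").last()            # newest entry in each hour
hourly_mean = df.resample("60min").mean().fillna(0)  # average per hour, 0 if empty

# Assumption: the report should cover every hour of both dates inclusive,
# so build that hourly grid explicitly and fill the missing hours with 0.
all_hours = pd.date_range("2018-03-05 00:00", "2018-03-14 23:00",
                          freq="60min", tz="Asia/Kolkata")
hourly_full = hourly_mean.reindex(all_hours, fill_value=0)

print(hourly_mean)
print(len(hourly_full))
```

For the 05:00 hour, last() keeps only the 05:45 row (12.54, 17.47), while mean() averages both rows to 20.69 and 22.525. The hour starting at 07:00 has no entries in this sample, so it shows 0 after fillna(0).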
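You said: "It seems to pick the latest entry within each hour." Yes, that is exactly what df2.resample('1H').last() does.

Hey, thanks a lot! One more question: if I need several statistics, such as the standard deviation, minimum and maximum, do I run one line of code with this function for each of them, or is there a better way?

You can use df.resample('H').agg(['mean','std']).fillna(0).

Oh, that's great! Thanks.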
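
As a rough sketch of the agg() suggestion, the example below computes four statistics per hour in one call. The inline rows, the utc=True parsing and the "60min" rule are assumptions made for illustration; the list of statistics simply extends the one in the comment with 'min' and 'max'.

```python
import io
import pandas as pd

# Rows copied from the question's CSV: timestamp, upload, download.
csv_text = """\
2018-03-14 05:30:14.51227+05:30,28.84,27.58
2018-03-14 05:45:14.51227+05:30,12.54,17.47
2018-03-14 06:30:14.466206+05:30,25.1,23.58
2018-03-14 08:30:14.593973+05:30,20.48,26.86
"""

df = pd.read_csv(io.StringIO(csv_text), header=None,
                 names=["date", "upload", "download"])
df["date"] = pd.to_datetime(df["date"], utc=True).dt.tz_convert("Asia/Kolkata")
df = df.set_index("date")

# One pass over the hourly bins; the result has two-level columns such as
# ("upload", "mean") and ("download", "max").
stats = df.resample("60min").agg(["mean", "std", "min", "max"]).fillna(0)

print(stats)
print(stats[("upload", "std")])
```

Note that the sample standard deviation is undefined for an hour with a single entry, so fillna(0) reports it as 0 there, the same value an empty hour gets. If that distinction matters, it may be safer to apply fillna(0) only to the columns where 0 is a meaningful default.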