Python 3.x: How to find the mean and standard deviation over fixed time periods with Pandas


My dataset df looks like this:

DateTimeVal            Open 
2017-01-01 17:00:00    5.1532    
2017-01-01 17:01:00    5.3522 
2017-01-01 17:02:00    5.4535    
2017-01-01 17:03:00    5.3567    
2017-01-01 17:04:00    5.1512 
....
It is a minute-by-minute dataset.

In my calculation, one day (24 hours) is defined as running from 17:00:00 on Sunday to 16:59:00 on Monday, and likewise for the other days.

What I want is to find the 24-hour AVG, STD, and so on for the period from 17:00:00 Sunday to 16:59:00 Monday.

What have I done?

I used rolling to find the mean, but it is not computed within the fixed time range:

# day avg
# 7 day rolling avg

df = (
    df.assign(
        DAY_AVG=df['Open'].rolling(window=1 * 24 * 60).mean(),
        # '7DAY_AVG' is not a valid Python identifier, so it is passed via a dict
        **{'7DAY_AVG': df['Open'].rolling(window=7 * 24 * 60).mean()},
    )
    .groupby(df['DateTimeVal'].dt.date)
    .last()
)

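I need help with the following two things:

  • How do I find the AVG and STD over a fixed time period?
  • How do I find the 7-day rolling and 14-day rolling AVG and STD over fixed time periods?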

Use resample:

import pandas as pd

# Create an empty DataFrame covering 2 days at hourly frequency
df = pd.DataFrame(index = pd.date_range('2017-07-01', periods=48, freq='1H'))

# Set Value to 1 from 17:00 until 16:59 the next day
df.loc['2017-07-01 17:00:00': '2017-07-02 16:59:59', 'Value'] = 1

print(df)
Output:

                     Value
2017-07-01 00:00:00    NaN
2017-07-01 01:00:00    NaN
2017-07-01 02:00:00    NaN
2017-07-01 03:00:00    NaN
2017-07-01 04:00:00    NaN
2017-07-01 05:00:00    NaN
2017-07-01 06:00:00    NaN
2017-07-01 07:00:00    NaN
2017-07-01 08:00:00    NaN
2017-07-01 09:00:00    NaN
2017-07-01 10:00:00    NaN
2017-07-01 11:00:00    NaN
2017-07-01 12:00:00    NaN
2017-07-01 13:00:00    NaN
2017-07-01 14:00:00    NaN
2017-07-01 15:00:00    NaN
2017-07-01 16:00:00    NaN
2017-07-01 17:00:00    1.0
2017-07-01 18:00:00    1.0
2017-07-01 19:00:00    1.0
2017-07-01 20:00:00    1.0
2017-07-01 21:00:00    1.0
2017-07-01 22:00:00    1.0
2017-07-01 23:00:00    1.0
2017-07-02 00:00:00    1.0
2017-07-02 01:00:00    1.0
2017-07-02 02:00:00    1.0
2017-07-02 03:00:00    1.0
2017-07-02 04:00:00    1.0
2017-07-02 05:00:00    1.0
2017-07-02 06:00:00    1.0
2017-07-02 07:00:00    1.0
2017-07-02 08:00:00    1.0
2017-07-02 09:00:00    1.0
2017-07-02 10:00:00    1.0
2017-07-02 11:00:00    1.0
2017-07-02 12:00:00    1.0
2017-07-02 13:00:00    1.0
2017-07-02 14:00:00    1.0
2017-07-02 15:00:00    1.0
2017-07-02 16:00:00    1.0
2017-07-02 17:00:00    NaN
2017-07-02 18:00:00    NaN
2017-07-02 19:00:00    NaN
2017-07-02 20:00:00    NaN
2017-07-02 21:00:00    NaN
2017-07-02 22:00:00    NaN
2017-07-02 23:00:00    NaN
Now use resample with base=17:

df.resample('24H', base=17).sum()
Output:

                     Value
2017-06-30 17:00:00    0.0
2017-07-01 17:00:00   24.0
2017-07-02 17:00:00    0.0
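
The base argument was deprecated in pandas 1.1 and removed in 2.0. On newer versions, the offset argument should give the same 17:00-anchored bins. A minimal sketch, assuming a two-day hourly frame like the one above (here the Value column is created explicitly as NaN, and lowercase frequency aliases are used):

```python
import pandas as pd

# Two-day hourly frame; Value starts as NaN everywhere.
df = pd.DataFrame(
    {'Value': float('nan')},
    index=pd.date_range('2017-07-01', periods=48, freq='60min'),
)

# Set Value to 1 from 17:00 until 16:59 the next day.
df.loc['2017-07-01 17:00:00':'2017-07-02 16:59:59', 'Value'] = 1

# offset='17h' shifts the 24-hour bin edges from midnight to 17:00.
daily = df.resample('24h', offset='17h').sum()
print(daily)
```

Each row label is the start of a 24-hour bin, so the row labelled 2017-07-01 17:00:00 covers everything up to, but not including, 2017-07-02 17:00:00.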

Update for minute sampling:

df = pd.DataFrame({'Value': 0}, index = pd.date_range('2018-10-01', '2018-10-03', freq='1T'))

df.loc['2018-10-01 15:00:00':'2018-10-02 18:59:50', 'Value'] = 1

df.resample('24H', base=17).agg(['sum','mean'])
Output:

                    Value          
                      sum      mean
2018-09-30 17:00:00   120  0.117647
2018-10-01 17:00:00  1440  1.000000
2018-10-02 17:00:00   120  0.285036
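
Applied to data shaped like the question's df, the same idea could look roughly like the sketch below. The Open values are synthetic, the helper name trailing is hypothetical, and offset='17h' is used in place of base=17 on the assumption of a recent pandas version. For the 7-day and 14-day figures, this sketch takes a time-based rolling window over the minute rows and keeps the last value of each 17:00-to-16:59 day, which is one possible reading of the question, not the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical minute data shaped like the question's df (values are synthetic).
idx = pd.date_range('2017-01-01 17:00', periods=15 * 24 * 60, freq='min')
rng = np.random.default_rng(0)
df = pd.DataFrame({'DateTimeVal': idx, 'Open': 5 + rng.normal(0, 0.1, len(idx))})

# Index by timestamp so resample and time-based rolling can be used.
s = df.set_index('DateTimeVal')['Open']

# Mean and standard deviation per 24-hour day starting at 17:00.
daily = s.resample('24h', offset='17h').agg(['mean', 'std'])


def trailing(window):
    """Trailing mean/std over `window`, sampled at the end of each 17:00 day."""
    r = s.rolling(window)
    out = pd.DataFrame({'mean': r.mean(), 'std': r.std()})
    return out.resample('24h', offset='17h').last()


roll7 = trailing('7D')
roll14 = trailing('14D')

print(daily.head())
print(roll7.head(8))
```

Time-based rolling windows default to min_periods=1, so the first rows of roll7 and roll14 cover fewer than 7 or 14 full days and may need to be dropped.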

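IIUC, you can use resample with the base parameter.
Thanks for writing an answer. One clarification: my dataset is minute-based.
@floss Did this answer help you? Would you mind accepting it?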