Python: apply per day, excluding some hours

Tags: python, pandas, datetime, aggregate, apply

I have this DataFrame:

dates,rr.price,ax.price,be.price
2018-01-01 00:00:00,45.73,45.83,47.63
2018-01-01 01:00:00,44.16,44.59,44.42
2018-01-01 02:00:00,42.24,40.22,42.34
2018-01-01 03:00:00,39.29,37.31,38.36
2018-01-01 04:00:00,36.0,32.88,36.87
2018-01-01 05:00:00,41.99,39.27,39.79
2018-01-01 06:00:00,42.25,43.62,42.08
2018-01-01 07:00:00,44.97,49.69,51.19
2018-01-01 08:00:00,45.0,49.98,59.69
2018-01-01 09:00:00,44.94,48.04,56.67
2018-01-01 10:00:00,45.04,46.85,53.54
2018-01-01 11:00:00,46.67,47.95,52.6
2018-01-01 12:00:00,46.99,46.6,50.77
2018-01-01 13:00:00,44.16,43.02,50.27
2018-01-01 14:00:00,45.26,44.2,50.64
2018-01-01 15:00:00,47.84,47.1,54.79
2018-01-01 16:00:00,50.1,50.83,60.17
2018-01-01 17:00:00,54.3,58.31,59.47
2018-01-01 18:00:00,51.91,63.5,60.16
2018-01-01 19:00:00,51.38,61.9,70.81
2018-01-01 20:00:00,49.2,59.62,62.65
2018-01-01 21:00:00,45.73,52.84,59.71
2018-01-01 22:00:00,44.84,51.43,50.96
2018-01-01 23:00:00,38.11,45.35,46.52
2018-01-02 00:00:00,19.19,41.61,49.62
2018-01-02 01:00:00,14.99,40.78,45.05
2018-01-02 02:00:00,11.0,39.59,45.18
2018-01-02 03:00:00,10.0,36.95,37.12
2018-01-02 04:00:00,11.83,31.38,38.03
2018-01-02 05:00:00,14.99,34.02,46.17
2018-01-02 06:00:00,40.6,41.27,51.71
2018-01-02 07:00:00,46.99,48.25,54.37
2018-01-02 08:00:00,47.95,43.57,75.3
2018-01-02 09:00:00,49.9,48.34,68.48
2018-01-02 10:00:00,50.0,48.01,61.94
2018-01-02 11:00:00,49.7,52.22,63.26
2018-01-02 12:00:00,48.16,47.47,59.41
2018-01-02 13:00:00,47.24,47.61,60.0
2018-01-02 14:00:00,46.1,49.12,67.44
2018-01-02 15:00:00,47.6,52.38,66.82
2018-01-02 16:00:00,50.45,58.35,72.17
2018-01-02 17:00:00,54.9,61.4,70.28
2018-01-02 18:00:00,57.18,54.58,62.63
2018-01-02 19:00:00,54.9,53.66,63.78
2018-01-02 20:00:00,51.2,54.15,63.08
2018-01-02 21:00:00,48.82,48.67,56.42
2018-01-02 22:00:00,45.14,47.46,49.85
2018-01-02 23:00:00,40.09,42.46,43.87
2018-01-03 00:00:00,42.75,34.72,25.51
2018-01-03 01:00:00,35.02,30.31,21.07
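I want to use ".apply" with a user-defined function together with ".groupby", and at the same time exclude some hours inside the ".apply" call.

This is what I have done so far: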

import pandas as pd
import numpy as np

def rmse(group,s1,s2):
   if len(group) == 0:
       return np.nan
   s = (group[s1] - group[s2]).pow(2).sum()
   rmseO = np.sqrt(s / len(group)) 
   return rmseO 


dfr=pd.read_csv('./test.csv',header = 0, index_col=0, parse_dates=True,
              usecols=['dates','rr.price','ax.price','be.price'])

dfr.to_csv('./test_shot.csv',index=True)

dfr = dfr.assign(date=lambda x: x.index.date).groupby('date')

dfrM = pd.DataFrame()
dfrM['ax.rmse'] = dfr.apply(rmse,   s1='rr.price',s2='ax.price')
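As you can see, I use ".groupby('date')" to create one group per date. After that, I apply the function "rmse" to each group.

The problem is that "rmse" takes all 24 hours of each day into account.

For example, I would like to exclude hours 1, 2, 3 and 20, 21, 22, 24 of each day. In other words, the rmse for each day should use only the remaining 24 - 7 hours of data.

I hope I have made myself clear.

Thanks in advance.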

You can write the function like this:

dfr = dfr.reset_index()
dfr['dates'] = pd.to_datetime(dfr['dates'])

def rmse(df, group, s1, s2):
    df = df[~df['dates'].dt.hour.isin(group)]
    g = df.groupby(df['dates'].dt.date)
    return (np.sqrt((g[s1].sum() - g[s2].sum()).pow(2).div(g.size()))
            .rename('ax.rmse').reset_index())


rmse(dfr, [1,2,3,20,21,22,24], 'ax.price', 'rr.price')

         dates    ax.rmse
0   2018-01-01   9.965492
1   2018-01-02  18.368277
2   2018-01-03   8.030000
As per the comments, after running the code above you can finish with the following to move the dates back into the index:

rmse(dfr, [1,2,3,20,21,22,24], 'ax.price', 'rr.price').set_index('dates')
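A note on the formula, offered as a sketch rather than as the answer's intended method: the expression in the answer's "rmse" subtracts the two per-day sums and then squares, so it computes the square of the summed differences divided by the row count. A root-mean-square error in the usual sense squares each hourly difference first and then averages. The two agree only when a day has a single remaining row, as for 2018-01-03 in the output above. The helper name rmse_per_day and the shortened sample below are assumptions for illustration; the rows are copied from the question's data.

```python
import io

import numpy as np
import pandas as pd

# A few rows copied from the question's data, kept short for illustration.
csv_text = """dates,rr.price,ax.price,be.price
2018-01-01 00:00:00,45.73,45.83,47.63
2018-01-01 01:00:00,44.16,44.59,44.42
2018-01-01 04:00:00,36.0,32.88,36.87
2018-01-03 00:00:00,42.75,34.72,25.51
2018-01-03 01:00:00,35.02,30.31,21.07
"""
dfr = pd.read_csv(io.StringIO(csv_text), parse_dates=["dates"])


def rmse_per_day(df, exclude_hours, s1, s2):
    """Hypothetical helper: per-day RMSE between columns s1 and s2."""
    # Drop the rows whose hour of day is in the exclusion list.
    kept = df[~df["dates"].dt.hour.isin(exclude_hours)]
    # Square each hourly difference first ...
    sq_err = (kept[s1] - kept[s2]).pow(2)
    # ... then average per calendar day and take the square root.
    return np.sqrt(sq_err.groupby(kept["dates"].dt.date).mean()).rename("rmse")


# dt.hour only takes values 0..23, so the 24 in the question's list would
# match no rows; it is left out here.
result = rmse_per_day(dfr, [1, 2, 3, 20, 21, 22], "ax.price", "rr.price")
print(result)
```

The result is a Series indexed by date, so no extra set_index call is needed in this variant.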

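Comment: I have "type(dfr.index)" equal to "pandas.core.indexes.datetimes.DatetimeIndex". Can I apply your suggestion with a DatetimeIndex? Thanks.

Comment: @diedro I added this line at the beginning: dfr = dfr.reset_index(). Try again now. Finally, if you want the dates back in the index, you can also do dfr = dfr.set_index('dates').

Comment: I used: dfr = dfr[~dfr.index.hour.isin(exclude)]. What do you think? And if I may ask, what does "~" mean?

Comment: @diedro "~" is negation, so it selects all hours except [1,2,3,20,21,22,24]. If what you did on the index works for your use case, that is fine too. When I read in your DataFrame, I did not read "date" into the index.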
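The DatetimeIndex variant discussed in the comments can be sketched as follows. This is a minimal illustration under stated assumptions: the data is a shortened copy of the question's rows, "exclude" is the variable name used in the comment, and "mask", "kept" and "daily" are names introduced here. Hour values on a DatetimeIndex run from 0 to 23, so the 24 in the original list would match nothing and is omitted.

```python
import io

import numpy as np
import pandas as pd

# A few rows copied from the question's data, with the dates as the index.
csv_text = """dates,rr.price,ax.price
2018-01-01 00:00:00,45.73,45.83
2018-01-01 01:00:00,44.16,44.59
2018-01-01 04:00:00,36.0,32.88
2018-01-03 00:00:00,42.75,34.72
2018-01-03 01:00:00,35.02,30.31
"""
dfr = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

exclude = [1, 2, 3, 20, 21, 22]

# isin() gives True for rows whose hour is in the list;
# "~" flips every True/False, so ~mask selects the rows to keep.
mask = dfr.index.hour.isin(exclude)
kept = dfr[~mask]


def rmse(group, s1, s2):
    # Same function as in the question.
    if len(group) == 0:
        return np.nan
    s = (group[s1] - group[s2]).pow(2).sum()
    return np.sqrt(s / len(group))


# Group the remaining rows by calendar day and apply the function per group.
daily = kept.groupby(kept.index.date).apply(rmse, s1="rr.price", s2="ax.price")
print(daily)
```

Filtering before the groupby keeps "rmse" itself unchanged, which is the main appeal of this approach over passing the excluded hours into the function.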