Python: grouping and resampling of time-series data (python, pandas)


Data:

ohlc_dict = {
    'Open':   'first',
    'High':   'max',
    'Low':    'min',
    'Last':   'last',
    'Volume': 'sum'}

data['hod'] = [r.hour for r in data.index]

data.head(10)
Out[61]:

                    Open    High    Low    Last    Volume   hod dow
Timestamp                           
2014-05-08 08:00:00 136.230 136.290 136.190 136.290 7077    8   Thursday
2014-05-08 08:15:00 136.290 136.300 136.240 136.250 3881    8   Thursday
2014-05-08 08:30:00 136.240 136.270 136.230 136.230 2540    8   Thursday
2014-05-08 08:45:00 136.230 136.260 136.230 136.250 2293    8   Thursday
2014-05-08 09:00:00 136.250 136.360 136.240 136.360 15014   9   Thursday
2014-05-08 09:15:00 136.350 136.360 136.260 136.270 11697   9   Thursday
2014-05-08 09:30:00 136.270 136.270 136.190 136.200 15600   9   Thursday
2014-05-08 09:45:00 136.200 136.270 136.200 136.240 9025    9   Thursday
2014-05-08 10:00:00 136.240 136.270 136.240 136.260 7128    10  Thursday
2014-05-08 10:15:00 136.250 136.260 136.200 136.200 6100    10  Thursday
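As a side note, the hod (and dow) columns can be built with vectorized index accessors rather than a list comprehension. A minimal sketch on synthetic 15-minute bars (the frame below is made up to mirror the question's layout; only the first row's values come from the table above):

```python
import pandas as pd
import numpy as np

# Synthetic 15-minute bars standing in for the question's data
idx = pd.date_range('2014-05-08 08:00', periods=10, freq='15min', name='Timestamp')
data = pd.DataFrame({
    'Open':   np.linspace(136.23, 136.25, 10),
    'High':   np.linspace(136.29, 136.26, 10),
    'Low':    np.linspace(136.19, 136.20, 10),
    'Last':   np.linspace(136.29, 136.20, 10),
    'Volume': [7077, 3881, 2540, 2293, 15014, 11697, 15600, 9025, 7128, 6100],
}, index=idx)

# Vectorized equivalents of the list comprehension in the question
data['hod'] = data.index.hour        # hour of day: 8, 8, ..., 9, ..., 10
data['dow'] = data.index.day_name()  # day of week, e.g. 'Thursday'
```

`DatetimeIndex.hour` and `DatetimeIndex.day_name()` avoid the Python-level loop and scale better on large frames.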
Question:

Both of the following change the timeframe from 15-minute to 1-hour intervals:

Method 1:

data['2016'].groupby('hod').Volume.mean().head()

hod
8     8452.597
9    16485.398
10   15619.626
11   14132.666
12   11470.058
Name: Volume, dtype: float64
Method 2:

df_h1 = data.resample('1h').agg(ohlc_dict).dropna()
df_h1['hod'] = [r.hour for r in df_h1.index]
df_h1['2016'].groupby('hod')['Volume'].mean()

Timestamp
2014-05-08 08:00:00   15791.000
2014-05-08 09:00:00   51336.000
2014-05-08 10:00:00   28855.000
2014-05-08 11:00:00   56543.000
2014-05-08 12:00:00   25249.000
Name: Volume, dtype: float64
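The `resample(...).agg(ohlc_dict)` step of Method 2 can be illustrated in isolation. A small sketch using the first four 15-minute rows from the question's table, which together make up the 08:00 hour:

```python
import pandas as pd

ohlc_dict = {'Open': 'first', 'High': 'max', 'Low': 'min',
             'Last': 'last', 'Volume': 'sum'}

# The four 15-minute bars of the 08:00 hour, taken from the question's data
idx = pd.date_range('2014-05-08 08:00', periods=4, freq='15min', name='Timestamp')
bars = pd.DataFrame({
    'Open':   [136.23, 136.29, 136.24, 136.23],
    'High':   [136.29, 136.30, 136.27, 136.26],
    'Low':    [136.19, 136.24, 136.23, 136.23],
    'Last':   [136.29, 136.25, 136.23, 136.25],
    'Volume': [7077, 3881, 2540, 2293],
}, index=idx)

# One hourly bar: Open of the first 15-min bar, max High, min Low,
# Last of the last bar, and the summed Volume
h1 = bars.resample('1h').agg(ohlc_dict)
```

The summed Volume for this hour is 15791, matching the first row of the Method 2 output above.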
Only Method 2 gives the correct Volume output.

How can I change Method 1 so that it gives the same Volume output as Method 2, but using groupby rather than resample? I don't know how to use ohlc_dict in Method 1, and I suspect it is required.
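One possible way to close the gap (a sketch, not taken from the original thread; the variable names are my own): reproduce Method 2's numbers while staying inside groupby by letting pd.Grouper do the hourly binning — first sum the 15-minute volumes into hourly totals, then average those totals by hour of day. On synthetic data:

```python
import pandas as pd
import numpy as np

# Four hours of synthetic 15-minute volume bars (values are made up)
idx = pd.date_range('2016-01-04 08:00', periods=16, freq='15min', name='Timestamp')
data = pd.DataFrame({'Volume': np.arange(1, 17) * 100}, index=idx)

# Method 2 from the question: resample to hourly sums, then mean by hour of day
hourly = data['Volume'].resample('1h').sum().dropna()
method2 = hourly.groupby(hourly.index.hour).mean()

# groupby-only equivalent: pd.Grouper performs the hourly binning inside groupby
hourly_g = data['Volume'].groupby(pd.Grouper(freq='1h')).sum()
method1_fixed = hourly_g.groupby(hourly_g.index.hour).mean()

assert method1_fixed.equals(method2)
```

If the full OHLC columns are needed rather than Volume alone, the same Grouper can feed `.agg(ohlc_dict)` in place of the `.sum()`, mirroring what resample does in Method 2.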

Comments:

In Method 1 you take a straight mean over every observation that falls in a given hour. Method 2 first sums the volume within each hour and then averages those hourly totals. I would expect Method 2's result to be a multiple of Method 1's, where the multiple is the number of observations per hour. – piRSquared

Hi @piRSquared, thanks for the input. I suspected as much, which is why I asked how to use ohlc_dict in Method 1. Is it possible?

@piRSquared (or anyone else who is interested): I have cross-referenced this data against a commercial charting package, and Method 2 appears to give the correct output. It would be useful to see how the groupby syntax of Method 1 needs to be modified to produce the same result as Method 2.
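The point made in the first comment can be checked numerically: with a constant four 15-minute bars per hour, Method 2's sum-then-mean is exactly four times Method 1's straight mean. A small sketch on made-up volumes:

```python
import pandas as pd
import numpy as np

# Four full hours of 15-minute bars (4 bars per hour, synthetic volumes)
idx = pd.date_range('2016-01-04 08:00', periods=16, freq='15min')
vol = pd.Series(np.arange(1, 17) * 100, index=idx)

# Method 1: mean of the individual 15-minute volumes, per hour of day
m1 = vol.groupby(idx.hour).mean()

# Method 2: hourly sums first, then mean per hour of day
hourly = vol.resample('1h').sum()
m2 = hourly.groupby(hourly.index.hour).mean()

# With a constant 4 bars per hour, Method 2 is exactly 4x Method 1
assert (m2 == 4 * m1).all()
```

With real market data the multiple will drift wherever an hour has missing or extra bars, which is why the two methods do not differ by a clean constant factor in practice.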