Using .asof with a MultiIndex in pandas

I've seen this question asked several times, but never answered. Short version:

I have a pandas DataFrame with a two-level MultiIndex; both levels are integers. How can I use .asof() on this DataFrame?

Long version:

I have a DataFrame with some time-series data:

>>> df
                            A
2016-01-01 00:00:00  1.560878
2016-01-01 01:00:00 -1.029380
...                       ...
2016-01-30 20:00:00  0.429422
2016-01-30 21:00:00 -0.182349
2016-01-30 22:00:00 -0.939461
2016-01-30 23:00:00  0.009930
2016-01-31 00:00:00 -0.854283

[721 rows x 1 columns]
I then build a weekly model of that data:

>>> df['weekday'] = df.index.weekday
>>> df['hour_of_day'] = df.index.hour
>>> weekly_model = df.groupby(['weekday', 'hour_of_day']).mean()
>>> weekly_model
                            A
weekday hour_of_day          
0       0            0.260597
        1            0.333094
...                       ...
        20           0.388932
        21          -0.082020
        22          -0.346888
        23           1.525928
[168 rows x 1 columns]
This gives me a DataFrame with the (weekday, hour_of_day) MultiIndex described above.
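To reproduce the setup without the original data, here is a minimal sketch that rebuilds a frame of the same shape (721 hourly rows, 2016-01-01 00:00 to 2016-01-31 00:00) from random numbers and fits the same weekly model. The random values and the seed are assumptions standing in for the asker's real series.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: 721 hourly rows of random
# values from 2016-01-01 00:00 to 2016-01-31 00:00 (the real values are unknown).
rng = np.random.default_rng(0)
index = pd.date_range('2016-01-01', '2016-01-31', freq='h')
df = pd.DataFrame({'A': rng.standard_normal(len(index))}, index=index)

# Same weekly model as in the question: mean of A per (weekday, hour_of_day)
df['weekday'] = df.index.weekday
df['hour_of_day'] = df.index.hour
weekly_model = df.groupby(['weekday', 'hour_of_day']).mean()

print(len(df), weekly_model.shape)
```

Because a month of hourly data hits every weekday at least four times, the model ends up fully populated with 7 × 24 = 168 rows, matching the output shown above.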

Now I try to extrapolate that model onto a year-long time series:

>>> dates = pd.date_range('2015/1/1', '2015/12/31 23:59', freq='H')
>>> annual_series = weekly_model.A.asof((dates.weekday, dates.hour))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tkcook/azimuth-web/lib/python3.5/site-packages/pandas/core/series.py", line 2657, in asof
    locs = self.index.asof_locs(where, notnull(values))
  File "/home/tkcook/azimuth-web/lib/python3.5/site-packages/pandas/indexes/base.py", line 1553, in asof_locs
    locs = self.values[mask].searchsorted(where.values, side='right')
ValueError: operands could not be broadcast together with shapes (8760,) (2,) 

The only version I have working first converts a zip iterator to a list, which isn't exactly memory-friendly. Is there any way to avoid this?
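If the model has an entry for every (weekday, hour) pair, as the 168-row groupby above does, an exact-match lookup is enough and asof isn't strictly needed. A minimal sketch, assuming such a fully populated model (filled here with hypothetical values weekday * 100 + hour so the lookups are easy to check): pd.MultiIndex.from_arrays builds the keys directly from dates.weekday and dates.hour without a Python list of tuples in your own code, and reindex aligns the model to those keys.

```python
import pandas as pd

# Hypothetical weekly model: one value per (weekday, hour_of_day) pair,
# here simply weekday * 100 + hour so lookups are easy to check.
model_index = pd.MultiIndex.from_product([range(7), range(24)],
                                         names=['weekday', 'hour_of_day'])
weekly_model = pd.DataFrame({'A': [w * 100.0 + h for w, h in model_index]},
                            index=model_index)

dates = pd.date_range('2015-01-01', '2015-12-31 23:00', freq='h')

# Build the lookup keys straight from the two integer arrays.
keys = pd.MultiIndex.from_arrays([dates.weekday, dates.hour],
                                 names=['weekday', 'hour_of_day'])

# Exact-match lookup (not asof): a key missing from the model would give NaN.
values = weekly_model['A'].reindex(keys).to_numpy()
annual_series = pd.Series(values, index=dates, name='A')
```

Unlike asof, a (weekday, hour) pair absent from the model comes back as NaN instead of falling back to the previous pair.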

I've read your post several times and I think I finally understand what you're trying to achieve.

Try this:

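import pandas as pd

# Derive the grouping keys from the DatetimeIndex and average A per (weekday, hour)
df['weekday'] = df.index.weekday
df['hour_of_day'] = df.index.hour
weekly_model = df.groupby(['weekday', 'hour_of_day']).mean()
dates = pd.date_range('2015/1/1', '2015/12/31 23:59', freq='h')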
Then use a merge like this:

annual_series = pd.merge(df.rename_axis('date').reset_index(), weekly_model.reset_index(),
                         on=['weekday', 'hour_of_day'], suffixes=('', '_model')).set_index('date').sort_index()
Now you can use asof, because the dates are the index:

annual_series.asof(dates)

Is this what you're looking for?
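One caveat: in the question, df covers January 2016 while dates spans 2015, so calling asof(dates) on a frame indexed by df's timestamps finds no earlier row and returns NaN throughout. A variant of the same merge idea, sketched under the assumption that the goal is one model value per timestamp in dates, merges the model onto a frame built from dates instead. The weekly_model values here are hypothetical (weekday * 100 + hour).

```python
import pandas as pd

# Hypothetical weekly model: one value per (weekday, hour_of_day) pair.
model_index = pd.MultiIndex.from_product([range(7), range(24)],
                                         names=['weekday', 'hour_of_day'])
weekly_model = pd.DataFrame({'A': [w * 100.0 + h for w, h in model_index]},
                            index=model_index)

dates = pd.date_range('2015-01-01', '2015-12-31 23:00', freq='h')

# One row per target timestamp, carrying the same keys the model is grouped by.
target = pd.DataFrame({'date': dates,
                       'weekday': dates.weekday,
                       'hour_of_day': dates.hour})

# A left merge keeps every timestamp in its original order; then index by date.
annual_series = (target.merge(weekly_model.reset_index(),
                              on=['weekday', 'hour_of_day'], how='left')
                       .set_index('date')['A'])
```

With how='left', a timestamp whose (weekday, hour) is missing from the model keeps its row with a NaN value rather than being dropped.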

Hey Tom, asof needs dates as input; dates.weekday and dates.hour give you integers instead. Pass the dates themselves:
annual_series.asof(dates)
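Series.asof isn't strictly limited to dates, though: it works on any sorted index, and the example in the pandas documentation uses an integer index. That suggests a workaround for the original question that doesn't come from this thread: collapse the two integer levels into one sorted integer key (weekday * 24 + hour_of_day, an encoding chosen for this sketch) and call asof on that. The model below is hypothetical, with Monday 05:00 deliberately left out to show the fallback.

```python
import pandas as pd

# Hypothetical weekly model with a gap: Monday 05:00 (weekday 0, hour 5) is missing.
model_index = pd.MultiIndex.from_product([range(7), range(24)],
                                         names=['weekday', 'hour_of_day'])
model_index = model_index.drop([(0, 5)])
weekly_model = pd.DataFrame({'A': [w * 100.0 + h for w, h in model_index]},
                            index=model_index)

# Collapse (weekday, hour_of_day) into one integer key: hours since Monday 00:00.
weekday = weekly_model.index.get_level_values('weekday')
hour = weekly_model.index.get_level_values('hour_of_day')
flat_model = pd.Series(weekly_model['A'].to_numpy(),
                       index=weekday * 24 + hour).sort_index()

dates = pd.date_range('2015-01-01', '2015-12-31 23:00', freq='h')
keys = dates.weekday * 24 + dates.hour  # same encoding, computed on whole arrays

# asof: for each key, the last non-NaN model value at or before it.
annual_series = pd.Series(flat_model.asof(keys).to_numpy(), index=dates, name='A')
```

The encoding preserves the (weekday, hour) sort order only because hour is always below 24; other integer levels would need their own order-preserving combination.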