Python: OHLC aggregation of data that is already OHLC

python python-2.7 pandas dataframe

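I understand that OHLC resampling of time series data in pandas works well when there is a single column of data, for example on the following dataframe: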

>>df
ctime       openbid
1443654000  1.11700
1443654060  1.11700
...

df['ctime']  = pd.to_datetime(df['ctime'], unit='s')
df           = df.set_index('ctime')
df.resample('1H',  how='ohlc', axis=0, fill_method='bfill')


>>>
                     open     high     low       close
ctime                                                   
2015-09-30 23:00:00  1.11700  1.11700  1.11687   1.11697
2015-09-30 24:00:00  1.11700  1.11712  1.11697   1.11697
...

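But what do I do if the data is already in OHLC format? From what I can gather, the API's OHLC method computes an OHLC slice for each column, so if my data is in the following format: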

             ctime  openbid  highbid   lowbid  closebid
0       1443654000  1.11700  1.11700  1.11687   1.11697
1       1443654060  1.11700  1.11712  1.11697   1.11697
2       1443654120  1.11701  1.11708  1.11699   1.11708

When I try to resample, I get an OHLC for every column, like so:

                     openbid                             highbid           \
                        open     high      low    close     open     high   
ctime                                                                       
2015-09-30 23:00:00  1.11700  1.11700  1.11700  1.11700  1.11700  1.11712   
2015-09-30 23:01:00  1.11701  1.11701  1.11701  1.11701  1.11708  1.11708 
...
                                        lowbid                             \
                         low    close     open     high      low    close   
ctime                                                                       
2015-09-30 23:00:00  1.11700  1.11712  1.11687  1.11697  1.11687  1.11697   
2015-09-30 23:01:00  1.11708  1.11708  1.11699  1.11699  1.11699  1.11699  
...

                    closebid                             
                        open     high      low    close  
ctime                                                    
2015-09-30 23:00:00  1.11697  1.11697  1.11697  1.11697  
2015-09-30 23:01:00  1.11708  1.11708  1.11708  1.11708

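Is there a quick(ish) workaround that someone is willing to share, so that I don't have to wade knee-deep into the pandas manual?

Thanks

PS: there is this answer, but it is four years old, so I am hoping there has been some progress.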

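This is similar to the answer you linked, but it is cleaner and faster because it uses the optimized aggregations rather than lambdas.

Note that the resample(...).agg(...) syntax requires pandas version 0.18.0 or later.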

In [101]: df.resample('1H').agg({'openbid': 'first', 
                                 'highbid': 'max', 
                                 'lowbid': 'min', 
                                 'closebid': 'last'})
Out[101]: 
                      lowbid  highbid  closebid  openbid
ctime                                                   
2015-09-30 23:00:00  1.11687  1.11712   1.11708    1.117
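For anyone who wants to try this end to end, here is a minimal sketch built from the three sample rows in the question. The variable names `raw`, `rules` and `bars` are just for illustration, and the 5-minute bin is an arbitrary choice so that all three rows fall into one bar.

```python
import pandas as pd

# The three 1-minute bars shown in the question
raw = pd.DataFrame({
    "ctime": [1443654000, 1443654060, 1443654120],
    "openbid": [1.11700, 1.11700, 1.11701],
    "highbid": [1.11700, 1.11712, 1.11708],
    "lowbid": [1.11687, 1.11697, 1.11699],
    "closebid": [1.11697, 1.11697, 1.11708],
})

# Epoch seconds -> DatetimeIndex, as in the question
raw["ctime"] = pd.to_datetime(raw["ctime"], unit="s")
raw = raw.set_index("ctime")

# One aggregation per column: first open, highest high, lowest low, last close
rules = {"openbid": "first", "highbid": "max", "lowbid": "min", "closebid": "last"}
bars = raw.resample("5min").agg(rules)
print(bars)
```

All three rows land in the 23:00 bin, so the result is a single bar whose values match the output shown above.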

In newer versions of pandas, you need to use an OrderedDict to preserve the column order, as follows:

import pandas as pd
from collections import OrderedDict

df['ctime'] = pd.to_datetime(df['ctime'], unit='s')
df = df.set_index('ctime')
df = df.resample('5Min').agg(
    OrderedDict([
        ('open', 'first'),
        ('high', 'max'),
        ('low', 'min'),
        ('close', 'last'),
        ('volume', 'sum'),
    ])
)
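As a side note, on Python 3.7 and later the built-in dict keeps insertion order, so a plain dict should give the same column order there. The sketch below assumes a recent pandas and uses made-up bars (the values and the gap between 23:01 and 23:10 are hypothetical) to show the column order and what an empty bin looks like.

```python
import pandas as pd

# Hypothetical 1-minute bars with a gap, so one 5-minute bin is empty
idx = pd.to_datetime([
    "2015-09-30 23:00:00",
    "2015-09-30 23:01:00",
    "2015-09-30 23:10:00",
])
bars_1m = pd.DataFrame({
    "open": [1.0, 1.1, 1.3],
    "high": [1.2, 1.4, 1.5],
    "low": [0.9, 1.0, 1.2],
    "close": [1.1, 1.3, 1.4],
    "volume": [10, 20, 5],
}, index=idx)

# Plain dict; key order determines the column order of the result
rules = {"open": "first", "high": "max", "low": "min",
         "close": "last", "volume": "sum"}
bars_5m = bars_1m.resample("5min").agg(rules)
print(bars_5m)
```

The empty 23:05 bin comes back with NaN prices and a volume of 0, which is where something like .ffill() or .bfill() would come in if you need every bin populated.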

Given a dataframe with price and amount columns:

import numpy as np
import pandas as pd

def agg_ohlcv(x):
    # x holds the rows that fall into one resampling bin
    arr = x['price'].values
    names = {
        'low': min(arr) if len(arr) > 0 else np.nan,
        'high': max(arr) if len(arr) > 0 else np.nan,
        'open': arr[0] if len(arr) > 0 else np.nan,
        'close': arr[-1] if len(arr) > 0 else np.nan,
        'volume': sum(x['amount'].values) if len(x['amount'].values) > 0 else 0,
    }
    return pd.Series(names)

df = df.resample('1min').apply(agg_ohlcv)
df = df.ffill()  # carry the previous bar forward into empty bins
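A different route to the same kind of result, not the function above, is to lean on the built-in .ohlc() for the price column and a plain sum for the amount column, then join the two. This is only a sketch: the tick values are invented, and the names `ticks`, `ohlc`, `volume` and `bars` are placeholders.

```python
import pandas as pd

# Hypothetical trade ticks with a price and an amount per trade
ticks = pd.DataFrame(
    {"price": [10.0, 10.5, 9.8, 10.2], "amount": [1.0, 2.0, 0.5, 1.5]},
    index=pd.to_datetime([
        "2015-09-30 23:00:05",
        "2015-09-30 23:00:40",
        "2015-09-30 23:02:10",
        "2015-09-30 23:02:50",
    ]),
)

# Built-in OHLC on the single price column
ohlc = ticks["price"].resample("1min").ohlc()
# Traded amount per bin; empty bins sum to 0
volume = ticks["amount"].resample("1min").sum().rename("volume")

bars = pd.concat([ohlc, volume], axis=1)
print(bars)
```

Unlike the version above, this does not forward-fill anything: the 23:01 bin, which has no trades, keeps NaN prices and a volume of 0 unless you fill it yourself.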

This one seems to work:

def ohlcVolume(x):
    if len(x):  # empty bins fall through and return None
        ohlc = {
            "open": x["open"].iloc[0],
            "high": max(x["high"]),
            "low": min(x["low"]),
            "close": x["close"].iloc[-1],
            "volume": sum(x["volume"]),
        }
        return pd.Series(ohlc)

daily = df.resample('1D').apply(ohlcVolume)

Converting from OHLC to OHLC works like this for me:

df.resample('1H').agg({
    'openbid':'first',
    'highbid':'max',
    'lowbid':'min',
    'closebid':'last'
})

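Yes, I'll go with that one; it means updating pandas, but the way my functions work means it's the better option. Thanks. Do you know whether I can add the fill_method='bfill' argument to that solution to handle NaNs?

Forget that last question, the method has changed to .bfill().

If you get an error with the ctime index above, here is an alternative: df = df.set_index('datetime')

I had to use these parameters to match my charting platform: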

df.resample('1H', closed='right', label='right').agg({'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last'})
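To make the effect of those two parameters concrete, here is a small sketch. It assumes, purely for illustration, that each 1-minute bar is stamped with the end of its interval (23:01 through 00:00); the data is made up, and '60min' is used in place of '1H' only to sidestep differences in the hourly alias between pandas versions.

```python
import pandas as pd

# Hypothetical 1-minute bars stamped 23:01, 23:02, ..., 00:00
idx = pd.date_range("2015-09-30 23:01", periods=60, freq="min")
prices = [1.1170 + i * 1e-5 for i in range(60)]
bars = pd.DataFrame(
    {"open": prices, "high": prices, "low": prices, "close": prices},
    index=idx,
)

rules = {"open": "first", "high": "max", "low": "min", "close": "last"}

# Default: bins are [23:00, 00:00), [00:00, 01:00), labelled by their left edge
left = bars.resample("60min").agg(rules)

# closed/label='right': one bin (23:00, 00:00], labelled by its right edge
right = bars.resample("60min", closed="right", label="right").agg(rules)

print(left.index.tolist())
print(right.index.tolist())
```

With the defaults, the 00:00 bar spills into a second bin of its own; with closed='right' and label='right' all 60 bars end up in a single hourly bar labelled 00:00. Which of the two matches a given charting platform depends on how that platform stamps its bars.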

Please explain. What's with the len() check on the volume?