Pandas: applying a rolling window to a complex function (Hurst exponent)


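In a nutshell: I need to calculate the Hurst exponent (HE) over a rolling window inside a pandas DataFrame and assign the values to their own column.

The HE function I am using was borrowed from elsewhere because it looked more robust. It is posted below for convenience: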

import numpy as np

def HurstEXP( ts = [ None, ] ):
# TESTED: HurstEXP()                Hurst exponent ( Brownian Motion & other observations measure ) 100+ BARs back(!)
        """                                                         __doc__
        USAGE:
                    HurstEXP( ts = [ None, ] )

                    Returns the Hurst Exponent of the time series vector ts[]

        PARAMETERS:
                    ts[,]   a time-series, with 100+ elements
                            ( or [ None, ] that produces a demo run )

        RETURNS:
                    float - a Hurst Exponent approximation,
                            as a real value
                            or
                            an explanatory string on an empty call
        THROWS:
                    n/a
        EXAMPLE:
                    >>> HurstEXP()                                        # actual numbers will vary, as per np.random.randn() generator used
                    HurstEXP( Geometric Brownian Motion ):    0.49447454
                    HurstEXP(     Mean-Reverting Series ):   -0.00016013
                    HurstEXP(           Trending Series ):    0.95748937
                    'SYNTH series demo ( on HurstEXP( ts == [ None, ] ) ) # actual numbers vary, as per np.random.randn() generator'

                    >>> HurstEXP( rolling_window( aDSEG[:,idxC], 100 ) )
        REF.s:
                    >>> www.quantstart.com/articles/Basics-of-Statistical-Mean-Reversion-Testing
        """
        #---------------------------------------------------------------------------------------------------------------------------<self-reflective>
        if ( ts[0] is None ):                                       # DEMO: Create a SYNTH Geometric Brownian Motion, Mean-Reverting and Trending Series:

             gbm = np.log( 1000 + np.cumsum(     np.random.randn( 100000 ) ) )  # a Geometric Brownian Motion[log(1000 + rand), log(1000 + rand + rand ), log(1000 + rand + rand + rand ),... log(  1000 + rand + ... )]
             mr  = np.log( 1000 +                np.random.randn( 100000 )   )  # a Mean-Reverting Series    [log(1000 + rand), log(1000 + rand        ), log(1000 + rand               ),... log(  1000 + rand       )]
             tr  = np.log( 1000 + np.cumsum( 1 + np.random.randn( 100000 ) ) )  # a Trending Series          [log(1001 + rand), log(1002 + rand + rand ), log(1003 + rand + rand + rand ),... log(101000 + rand + ... )]

                                                                    # Output the Hurst Exponent for each of the above SYNTH series
             print ( "HurstEXP( Geometric Brownian Motion ):   {0: > 12.8f}".format( HurstEXP( gbm ) ) )
             print ( "HurstEXP(     Mean-Reverting Series ):   {0: > 12.8f}".format( HurstEXP( mr  ) ) )
             print ( "HurstEXP(           Trending Series ):   {0: > 12.8f}".format( HurstEXP( tr  ) ) )

             return ( "SYNTH series demo ( on HurstEXP( ts == [ None, ] ) ) # actual numbers vary, as per np.random.randn() generator" )
        """                                                         # FIX:
        ===================================================================================================================
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ :1000,QuantFX.idxH].tolist() )
        0.47537688039105963
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ :101,QuantFX.idxH].tolist() )
        -0.31081076640420308
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ :100,QuantFX.idxH].tolist() )
        nan
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ :99,QuantFX.idxH].tolist() )

        Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
        C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
        warnings.warn(msg, RankWarning)
        0.026867491053098096
        """
        pass;     too_short_list = 101 - len( ts )                  # MUST HAVE 101+ ELEMENTS
        if ( 0 <  too_short_list ):                                 # IF NOT:
             ts = too_short_list * ts[:1] + ts                      #    PRE-PEND SUFFICIENT NUMBER of [ts[0],]-as-list REPLICAS TO THE LIST-HEAD
        #---------------------------------------------------------------------------------------------------------------------------
        lags = range( 2, 100 )                                                              # Create the range of lag values
        tau  = [ np.sqrt( np.std( np.subtract( ts[lag:], ts[:-lag] ) ) ) for lag in lags ]  # Calculate the square root of the std. dev. of the lagged differences, per lag
        #oly = np.polyfit( np.log( lags ), np.log( tau ), 1 )                               # Use a linear fit to estimate the Hurst Exponent
        #eturn ( 2.0 * poly[0] )                                                            # Return the Hurst exponent from the polyfit output
        """ ********************************************************************************************************************************************************************* DONE:[MS]:ISSUE / FIXED ABOVE
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH] )
        C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:82: RuntimeWarning: Degrees of freedom <= 0 for slice
          warnings.warn("Degrees of freedom <= 0 for slice", RuntimeWarning)
        C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:94: RuntimeWarning: invalid value encountered in true_divide
          arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
        C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:114: RuntimeWarning: invalid value encountered in true_divide
          ret, rcount, out=ret, casting='unsafe', subok=False)
        QuantFX.py:23034: RuntimeWarning: divide by zero encountered in log
          return ( 2.0 * np.polyfit( np.log( lags ), np.log( tau ), 1 )[0] )                  # Return the Hurst exponent from the polyfit output ( a linear fit to estimate the Hurst Exponent )

        Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
        C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
          warnings.warn(msg, RankWarning)
        0.028471879418359915
        |
        |
        |# DATA:
        |
        |>>> QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH]
        memmap([ 1763.31005859,  1765.01000977,  1765.44995117,  1764.80004883,
                 1765.83996582,  1768.91003418,  1771.04003906,  1769.43994141,
                 1771.4699707 ,  1771.61999512,  1774.76000977,  1769.55004883,
                 1773.4699707 ,  1773.32995605,  1770.08996582,  1770.20996094,
                 1768.34997559,  1768.02001953,  1767.59997559,  1767.23999023,
                 1768.41003418,  1769.06994629,  1769.56994629,  1770.7800293 ,
                 1770.56994629,  1769.7800293 ,  1769.90002441,  1770.44995117,
                 1770.9699707 ,  1771.04003906,  1771.16003418,  1769.81005859,
                 1768.76000977,  1769.39001465,  1773.23999023,  1771.91003418,
                 1766.92004395,  1765.56994629,  1762.65002441,  1760.18005371,
                 1755.        ,  1756.67004395,  1753.48999023,  1753.7199707 ,
                 1751.92004395,  1745.44995117,  1745.44995117,  1744.54003906,
                 1744.54003906,  1744.84997559,  1744.84997559,  1744.34997559,
                 1744.34997559,  1743.75      ,  1743.75      ,  1745.23999023,
                 1745.23999023,  1745.15002441,  1745.31005859,  1745.47998047,
                 1745.47998047,  1749.06994629,  1749.06994629,  1748.29003906,
                 1748.29003906,  1747.42004395,  1747.42004395,  1746.98999023,
                 1747.61999512,  1748.79003906,  1748.79003906,  1748.38000488,
                 1748.38000488,  1744.81005859,  1744.81005859,  1736.80004883,
                 1736.80004883,  1735.43005371,  1735.43005371,  1737.9699707
                 ], dtype=float32
                )
        |
        |
        | # CONVERTED .tolist() to avoid .memmap-type artifacts:
        |
        |>>> QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH].tolist()
        [1763.31005859375, 1765.010009765625, 1765.449951171875, 1764.800048828125, 1765.8399658203125, 1768.9100341796875, 1771.0400390625, 1769.43994140625, 1771.469970703125, 1771.6199951171875, 1774.760
        859375, 1743.75, 1743.75, 1745.239990234375, 1745.239990234375, 1745.1500244140625, 1745.31005859375, 1745.47998046875, 1745.47998046875, 1749.0699462890625, 1749.0699462890625, 1748.2900390625, 174
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH].tolist() )
        C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:116: RuntimeWarning: invalid value encountered in double_scalars
          ret = ret.dtype.type(ret / rcount)

        Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
        C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
          warnings.warn(msg, RankWarning)
        0.028471876494884543
        ===================================================================================================================
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ :1000,QuantFX.idxH].tolist() )
        0.47537688039105963
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ :101,QuantFX.idxH].tolist() )
        -0.31081076640420308
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ :100,QuantFX.idxH].tolist() )
        nan
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ :99,QuantFX.idxH].tolist() )

        Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
        C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
        warnings.warn(msg, RankWarning)
        0.026867491053098096
        """
        return ( 2.0 * np.polyfit( np.log( lags ), np.log( tau ), 1 )[0] )  
Output:

Date        Close
2016-02-16  31.034000
2016-02-17  33.736000
2016-02-18  33.354000
2016-02-19  33.316002
2016-02-22  35.548000
... ...
2021-02-08  863.419983
2021-02-09  849.460022
2021-02-10  804.820007
2021-02-11  811.659973
2021-02-12  816.119995
1259 rows × 1 columns
0.5163981260143369
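So far so good. We have the function and the data to test it with. Now let's run a sanity test, i.e. run the function against a subsample of the data: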

import numpy as np
window = 20

hurst = lambda x: (HurstEXP(ts = df[u'Close'][:-x].to_numpy()))
hurst(window)
Output:

Date        Close
2016-02-16  31.034000
2016-02-17  33.736000
2016-02-18  33.354000
2016-02-19  33.316002
2016-02-22  35.548000
... ...
2021-02-08  863.419983
2021-02-09  849.460022
2021-02-10  804.820007
2021-02-11  811.659973
2021-02-12  816.119995
1259 rows × 1 columns
0.5163981260143369
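Great.

Now for the meaty part: apply the lambda over a rolling window and assign the result to its own column. I have tried almost every trick I could dig up, but could not get it to work.

The general approach:

df.Close.rolling(window).apply(hurst, engine='cython', raw=True)
gives me the following error: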

TypeError: Cannot convert input [[-31.0340004  -33.73600006 -33.35400009 -33.31600189 -35.54800034
 -35.44200134 -35.79999924 -37.48600006 -38.06800079 -38.38600159
 -37.27000046 -37.66799927 -39.14799881 -40.20800018 -41.05799866
 -40.52000046 -41.74399948 -41.0359993  -41.5        -43.02999878]] of type <class 'numpy.ndarray'> to Timestamp
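Other attempts failed just as ingloriously. So at this point, after half a day, I am pretty much stumped. Do any of you hardcore Python folks know a way to do this?

Any insights and suggestions would be greatly appreciated.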

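I think your problem is that your window is too short. The docstring says the window length must be 100+ elements, and the Hurst code does not handle shorter input correctly, which causes the SVD to fail.

Also, your test actually slices all elements except the last 20, so it is really a long array, which is why it did not fail: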

tmp = df[u'Close'][:-20].to_numpy()

print(tmp.shape, HurstEXP(ts = tmp))
(1239,) 0.5163981260143368
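
To make the slicing point concrete, here is a small sketch on a synthetic array. The name `close` is an assumed stand-in for `df['Close'].to_numpy()`, not the asker's actual data. It shows that `[:-20]` keeps everything except the last 20 values, while `[:20]` and `[-20:]` select real 20-element samples.

```python
import numpy as np

# Synthetic stand-in for df['Close'].to_numpy(); 1259 matches the row count shown above.
close = np.arange(1259, dtype=float)

all_but_last_20 = close[:-20]   # everything except the final 20 values
first_20 = close[:20]           # an actual 20-element sample from the start
last_20 = close[-20:]           # the most recent 20 values

print(all_but_last_20.shape, first_20.shape, last_20.shape)
# (1239,) (20,) (20,)
```

So the "sanity test" in the question exercised the function on 1239 points, not on 20.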
If you test a window shorter than 100 elements, it throws a LinAlg exception:

tmp = df[u'Close'][:20].to_numpy()

print(tmp.shape, HurstEXP(ts = tmp))
(fails)
If you increase the rolling window length, or fix the code in the Hurst function so that it pads the array when it is too short, it should work:

window = 500
df.Close.rolling(window).apply(lambda x: HurstEXP(ts = x), raw=True)
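
As a self-contained sketch of this pattern, the example below runs a rolling Hurst estimate on synthetic data and stores it in its own column. The function `hurst_sketch`, the DataFrame `prices`, and the column name `Hurst` are assumptions made for illustration; `hurst_sketch` is a simplified variant of the lagged-difference estimator in HurstEXP, without its demo mode or padding logic. With `raw=True`, the function passed to `apply` receives the window's values as an ndarray, which appears to be why the original `hurst` lambda failed: it used that array as a slice bound on `df`.

```python
import numpy as np
import pandas as pd

def hurst_sketch(ts, max_lag=100):
    """Simplified lagged-difference Hurst estimate; expects more than max_lag points."""
    ts = np.asarray(ts, dtype=float)
    lags = np.arange(2, max_lag)
    # Square root of the standard deviation of the lagged differences, per lag
    tau = [np.sqrt(np.std(ts[lag:] - ts[:-lag])) for lag in lags]
    # Slope of the log-log fit, doubled, approximates the Hurst exponent
    slope = np.polyfit(np.log(lags), np.log(tau), 1)[0]
    return 2.0 * slope

# Synthetic random-walk "prices" (assumption: stands in for the Close column)
rng = np.random.default_rng(0)
prices = pd.DataFrame({"Close": 1000 + np.cumsum(rng.standard_normal(1000))})

window = 500
# Each call receives one window of 500 values as a plain ndarray
prices["Hurst"] = prices["Close"].rolling(window).apply(hurst_sketch, raw=True)

print(prices["Hurst"].dropna().describe())
```

The first `window - 1` rows of the new column are NaN because the rolling window is not yet full there.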
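The code in the HurstEXP function that handles lists with fewer than 100 elements is not valid when ts is an np.ndarray object, such as the ones supplied by .rolling(...).apply(..., raw=True).

You can modify the function to start with the following, and it will then work for windows under 100 elements: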

def HurstEXP( ts= [ None, ] ):   
        if isinstance(ts, np.ndarray):
            ts = ts.tolist()
        
…or, if you always use numpy arrays, you can change the line that pads the array:

     ts = too_short_list * ts[:1] + ts                      #    PRE-PEND SUFFICIENT NUMBER of [ts[0],]-as-list REPLICAS TO THE LIST-HEAD


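Hi Rick, the docstring says the "series" needs 100+ elements, not that the window needs to be that wide. After all, I tested a sample with a window of 20 and it worked fine (again, see my "sanity test" above). Anyway, just to make sure, I increased the window size to 100, but it still fails when applied through the rolling window. So the Hurst function seems to work fine; I just don't know how to apply it through a rolling window.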
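
The window length in rolling determines the length of the series sent to the Hurst function. Your sanity test did not send a 20-element window; it sent 1239 elements. Also, your original attempt applied hurst, not HurstEXP. Did you try the code I wrote above?

Thanks for the quick reply. It turns out you were right; I just ran this: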
window=200
df2=df.Close.rolling(window).apply(lambda x:HurstEXP(ts=x),engine='cython',raw=True)
df2=df2.dropna()
df2
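Sorry, I don't know how to format my code properly in a reply. Anyway, it works, but only when the window length is 101 or greater. It does not work with a window of 100. FYI: I did not write the Hurst function and would have a hard time fixing it.

Great. I have added how to fix the Hurst function. If the answer solved your problem, please click the green check mark to mark it as "accepted". This helps focus attention on older questions that still have no answer. Thanks, and take care.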
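
For numpy arrays, the replacement for the padding line quoted in the answer (ts = too_short_list * ts[:1] + ts) is:

     ts = np.pad(ts, pad_width=(too_short_list,0), mode='edge')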
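
A small sketch of why the original padding line misbehaves on arrays may help here. The three-element `ts` and the shortfall of 2 are made-up values, chosen only to keep the output readable. On a list, `*` and `+` repeat and concatenate; on an ndarray they are elementwise arithmetic, so nothing is prepended and the values are shifted instead.

```python
import numpy as np

ts = np.array([10.0, 11.0, 12.0])
too_short_list = 2  # hypothetical shortfall, kept small for display

# On an ndarray, '*' and '+' are elementwise: the length stays 3 and the values change.
broadcast = too_short_list * ts[:1] + ts
print(broadcast)      # [30. 31. 32.]

# On a list, the same expression prepends copies of the first element.
as_list = ts.tolist()
padded_list = too_short_list * as_list[:1] + as_list
print(padded_list)    # [10.0, 10.0, 10.0, 11.0, 12.0]

# np.pad with mode='edge' gives the list behaviour while staying an ndarray.
padded_array = np.pad(ts, pad_width=(too_short_list, 0), mode="edge")
print(padded_array)   # [10. 10. 10. 11. 12.]
```

Converting with `.tolist()` at the top of the function and switching to `np.pad` both restore the intended padding; which one fits better depends on whether callers pass lists or arrays.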