Python: statsmodels OLS with a rolling window


I want to run a regression over a rolling window, but after fitting I only get a single parameter:

 rolling_beta = sm.OLS(X2, X1, window_type='rolling', window=30).fit()
 rolling_beta.params
The result is:

 X1    5.715089
 dtype: float64
What is going wrong here?

Thanks in advance, Roland

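I think the problem is that the arguments window_type='rolling' and window=30 simply do nothing. First I'll show you why, and at the end I'll provide a setup I have lying around for linear regressions on rolling windows.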


1. The problem with your function:

Since you did not provide any sample data, here is a function that returns a dataframe of the desired size filled with random numbers:

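# Function to build synthetic data
import numpy as np
import pandas as pd
import statsmodels.api as sm
from collections import OrderedDict

def sample(rSeed, periodLength, colNames):

    # Fix the seed so the random numbers are reproducible
    np.random.seed(rSeed)
    # Start date (matches the output below)
    date = pd.to_datetime("2018-12-01")
    cols = OrderedDict()

    # One column of standard normal draws per requested column name
    for col in colNames:
        cols[col] = np.random.normal(loc=0.0, scale=1.0, size=periodLength)
    # Consecutive daily dates starting at `date`
    dates = date+pd.to_timedelta(np.arange(periodLength), 'D')

    df = pd.DataFrame(cols, index = dates)
    return(df)

df = sample(rSeed = 123, colNames = ['X1', 'X2'], periodLength = 50)
print(df)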
Output:

                  X1        X2
2018-12-01 -1.085631 -1.294085
2018-12-02  0.997345 -1.038788
2018-12-03  0.282978  1.743712
2018-12-04 -1.506295 -0.798063
2018-12-05 -0.578600  0.029683
.
.
.
2019-01-17  0.412912 -1.363472
2019-01-18  0.978736  0.379401
2019-01-19  2.238143 -0.379176
Now try:

rolling_beta = sm.OLS(df['X2'], df['X1'], window_type='rolling', window=30).fit()
rolling_beta.params
Output:

X1   -0.075784
dtype: float64
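This at least reproduces the structure of your output: you expect one estimate for each sample window, but you get a single estimate. So I looked online and in the statsmodels documentation for other examples that use the same function, but I could not find any concrete example that actually works. I did find some discussions of how this functionality was deprecated some time ago. Then I tested the same function with bogus values for the arguments: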

rolling_beta = sm.OLS(df['X2'], df['X1'], window_type='amazing', window=3000000).fit()
rolling_beta.params
Output:

X1   -0.075784
dtype: float64
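As you can see, the estimate is identical, and no error message is raised for the bogus input. So I suggest you take a look at the function below. It is what I use for rolling regression estimates.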


2. A regression function for rolling windows over a dataframe

df = sample(rSeed = 123, colNames = ['X1', 'X2', 'X3'], periodLength = 50)

def RegressionRoll(df, subset, dependent, independent, const, win, parameters):
    """
    RegressionRoll takes a dataframe, makes a subset of the data if you like,
    runs a series of OLS regressions with a specified window length, and
    returns a dataframe with the coefficients (BETA) or R^2 for each window.

    Parameters:
    ===========

    df: pandas dataframe
    subset: integer - number of trailing rows to keep (0 keeps all rows)
    dependent: string that specifies the name of the dependent variable
    independent: LIST of strings that specifies the names of the independent variables
    const: boolean - whether or not to include a constant term
    win: integer - window length of each model
    parameters: string that specifies which model parameters to return:
                'beta' or 'R2'

    Example:
    ========
        RegressionRoll(df=df, subset = 50, dependent = 'X1', independent = ['X2'],
                   const = True, parameters = 'beta', win = 30)

    """
    # Fail early instead of silently returning unrelated data
    if parameters not in ('beta', 'R2'):
        raise ValueError("parameters must be 'beta' or 'R2'")

    # Data subset: keep only the last `subset` rows
    if subset != 0:
        df = df.tail(subset)

    # Loop info: each i is the (exclusive) end position of a window.
    # With stop = end the final row never closes a window; use end + 1 to include it.
    end = df.shape[0]
    rng = np.arange(start = win, stop = end, step = 1)

    # Fit one OLS model per window and collect the results
    results = []
    for i in rng:
        # Rolling data frame: rows i - win, ..., i - 1
        dfr = df.iloc[:i].tail(win)
        y = dfr[dependent]
        x = dfr[independent]
        if const:
            x = sm.add_constant(x)
        model = sm.OLS(y, x).fit()

        # Label each window by its last date
        indx = dfr.index[-1]
        if parameters == 'beta':
            df_temp = pd.DataFrame([model.params], index = [indx])
        else:
            df_temp = pd.DataFrame({', '.join(independent): [model.rsquared]}, index = [indx])
        results.append(df_temp)

    df_results = pd.concat(results, axis = 0)
    df_results.index.name = 'Date'
    return(df_results)


df_rolling = RegressionRoll(df=df, subset = 50, dependent = 'X1', independent = ['X2'],
                            const = True, parameters = 'beta', win = 30)
Output: a dataframe with the OLS estimates of X1 regressed on a constant and X2, one row per 30-period window:

               const        X2
Date                          
2018-12-30  0.044042  0.032680
2018-12-31  0.074839 -0.023294
2019-01-01 -0.063200  0.077215
.
.
.
2019-01-16 -0.075938 -0.215108
2019-01-17 -0.143226 -0.215524
2019-01-18 -0.129202 -0.170304
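For the special case used here (one regressor plus a constant), the same kind of rolling estimates can be computed without a Python loop, using the closed-form simple-regression formulas slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x) on pandas rolling windows. This is a swapped-in technique, not part of RegressionRoll, and it only covers a single regressor; the helper name rolling_simple_ols and the random data (not the same draws as sample()) are hypothetical. Like RegressionRoll, each row is labelled by the last date of its window, but pandas also emits the window that ends on the final row, and returns NaN for the first win - 1 rows.

```python
import numpy as np
import pandas as pd


def rolling_simple_ols(df, dependent, regressor, win):
    """Rolling OLS of `dependent` on a constant and one `regressor`.

    Hypothetical helper: applies slope = cov(x, y) / var(x) and
    intercept = mean(y) - slope * mean(x) to each window of `win` rows.
    Rows are labelled by the last date in their window; the first
    win - 1 rows are NaN.
    """
    x = df[regressor]
    y = df[dependent]
    roll_x = x.rolling(win)
    # Sample covariance and variance both use ddof=1, so the factor cancels
    slope = roll_x.cov(y) / roll_x.var()
    intercept = y.rolling(win).mean() - slope * roll_x.mean()
    return pd.DataFrame({"const": intercept, regressor: slope})


# Hypothetical data with the same shape as the answer's sample() output
rng = np.random.default_rng(123)
idx = pd.date_range("2018-12-01", periods=50, freq="D")
df = pd.DataFrame(rng.normal(size=(50, 2)), index=idx, columns=["X1", "X2"])

out = rolling_simple_ols(df, dependent="X1", regressor="X2", win=30)
print(out.dropna().head())
```

With several regressors this shortcut no longer applies, and a per-window fit (as in RegressionRoll) or a dedicated rolling estimator is needed.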

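Six months on, I see no indication that window_type='rolling' is part of the statsmodels interface. Perhaps you checked this and it is evolving? Searching for "window_type" only turns up a private function and window_ols. Moreover, the answers to this and several similar questions are all based on reshaping the data and calling sm.OLS separately at every step. That is an inefficient way to do rolling regression, as statisticians well understand (see references); doing it with updating methods (the whole Givens rotation approach) is very economical. If you want something turnkey, at the moment that means moving to R, which I appreciate may not work in your environment.

@RolandSzarka As Eli S pointed out, the window_type='rolling', window=30 arguments do not seem to exist. So if my suggestion helped you, would you consider marking it as the accepted answer? I know you have asked 7 questions so far and received plenty of good advice, but have not accepted an answer once. Accepting answers is a good thing, because it makes it easier for people looking for unanswered questions to help and advise.

@Roland Szarka How did my suggestion work out for you?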