Panel regression issue in Python pandas


I am running a panel regression in Python this way because I do not have access to statsmodels. My dataframe looks like this:

                                                                          Sum Amt_1  ...  Sum Amt_2
Date      Range_1  Range_2  info_1  info_2   info_3        info_4
01/01/19  >720     >30.0    >5.0    <=70.0   lessthan_12m  <= 0                0.00  ...  631427.36
                                                           1-10                0.00  ...   30420.78
                                                           21-30               0.00  ...   20276.26
                                                           31-40               0.00  ...   76939.48
                                             morethan_12m  > 50                0.00  ...   10288.87

The dataframe in the answer I am following looks like this:

            Intercept      beta     r12to2   r36to13
caldt                                               
1963-07-01  -1.497012 -0.765721   4.379128 -1.918083
1963-08-01  11.144169 -6.506291   5.961584 -2.598048
1963-09-01  -2.330966 -0.741550  10.508617 -4.377293
1963-10-01   0.441941  1.127567   5.478114 -2.057173
1963-11-01   3.380485 -4.792643   3.660940 -1.210426

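I tried to run the same regression with the code below. I essentially want to do the same thing as in that answer, but grouping by every column except Sum Amt_1 and Sum Amt_2, since those columns are all categorical variables.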

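from statsmodels.formula.api import ols  # assumed source of `ols`; the original snippet omits the import

def ols_coef(x, formula):
    # Fit an OLS model on one group and return its coefficients
    return ols(formula, data=x).fit().params

gamma = (df.groupby(['Date', 'Range_1', 'Range_2', 'info_1', 'info_2'])
           .apply(ols_coef, 'Sum_Amt_1 ~ C(Range_1) + C(Range_2) + C(info_1) + C(info_2)'))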

However, when I run print(gamma), I get:

                                                            Intercept
Date      Range_1  Range_2  info_1           info_2
01/01/19  > 30.0   > 5.0    DQ_lessthan_12m  > 50         1994.545600
                                             <= 0            0.000000
                                             1-10            0.000000
                                             11-20           0.000000
                                             21-30        5740.748889
                                             31-40           0.000000
                                             41-50           0.000000

I understand that the regression only runs on the non-index elements, but how can I run it on these index elements, i.e. regress Sum_Amt_1 on 'Range_1', 'Range_2', 'info_1', 'info_2'?

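You cannot group by a named index in pandas simply by passing its name. When you run df.groupby(['Date', 'Range_1', 'Range_2', 'info_1', 'info_2']), it does not actually do anything. Here is an example of what I mean: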

import numpy as np
import pandas as pd

### Creating a multi-index dataframe
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(6, 6), index=index[:6], columns=index[:6])

# Flatten the column MultiIndex into plain string labels such as 'bar one'
df.columns = [' '.join(col).strip() for col in df.columns]

### grouping by named index
df.groupby(['first', 'second']).sum()  ## Nothing happens
df.groupby(level=0).sum()  ## Correct way to groupby index
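As a small illustration of grouping on index levels, the sketch below passes level names (rather than positions) to the level argument. The frame, its values, and the column name "value" are hypothetical and only serve the example. Note that pandas 0.20 and later also resolve index level names passed directly to groupby, so the behaviour described above may depend on the installed version.

```python
import pandas as pd

# Hypothetical toy frame with a two-level named index (illustrative values).
index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0]}, index=index)

# Group on a single index level, referring to it by name.
by_first = df.groupby(level="first").sum()

# Group on several index levels at once; every (first, second) pair is
# unique here, so each group holds exactly one row.
by_both = df.groupby(level=["first", "second"]).sum()

print(by_first)
print(by_both)
```

Because each (first, second) pair is unique in this toy frame, by_both has the same rows as df, which can make a grouping on all levels look as if nothing happened.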

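If you want to run a regression on the index variables, I suggest either flattening the dataframe with df.reset_index() (after which you can refer to the levels by name as ordinary columns) or referring to the levels explicitly. The choice is yours.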
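A minimal sketch of the reset_index route might look like the following. It is not the code from the question: the data are made up, only two categorical levels are used, and the helper dummy_ols is a hypothetical stand-in that fits ordinary least squares with numpy.linalg.lstsq on dummy-encoded columns instead of calling ols, so that it runs with pandas and NumPy alone. It groups by Date only, so the categorical variables still vary within each group.

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question's frame: categorical index
# levels plus one numeric column. All values are invented for illustration.
index = pd.MultiIndex.from_tuples(
    [
        ("01/01/19", ">30.0", "<= 0"),
        ("01/01/19", ">30.0", "1-10"),
        ("01/01/19", "<=30.0", "<= 0"),
        ("01/01/19", "<=30.0", "1-10"),
        ("01/02/19", ">30.0", "<= 0"),
        ("01/02/19", ">30.0", "1-10"),
        ("01/02/19", "<=30.0", "<= 0"),
        ("01/02/19", "<=30.0", "1-10"),
    ],
    names=["Date", "Range_1", "info_1"],
)
df = pd.DataFrame(
    {"Sum_Amt_1": [10.0, 12.0, 5.0, 7.0, 20.0, 23.0, 9.0, 12.0]}, index=index
)

# Turn the index levels into ordinary columns so they can be used by name.
flat = df.reset_index()

def dummy_ols(group, target, regressors):
    """Hypothetical helper: least squares of `target` on dummy-encoded regressors."""
    # One dummy column per category, dropping the first level of each
    # variable so the design matrix is not collinear with the intercept.
    X = pd.get_dummies(group[regressors], drop_first=True).astype(float)
    X.insert(0, "Intercept", 1.0)
    coef, *_ = np.linalg.lstsq(X.to_numpy(), group[target].to_numpy(), rcond=None)
    return pd.Series(coef, index=X.columns)

# One regression per date; the categorical columns vary inside each group.
gamma = flat.groupby("Date").apply(dummy_ols, "Sum_Amt_1", ["Range_1", "info_1"])
print(gamma)
```

The result has one row per date and one column per coefficient, which mirrors the layout of the answer's dataframe shown in the question.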

More information can be found in this post:
