Python Pandas and PanelOLS: Only 2-level MultiIndex are supported


I have a DataFrame like this:

I sorted it by year and fcode:

df = df.sort_values(by=['year', 'fcode'])
I dropped the missing data:

df = df.dropna() # Drop missing
This gives:

     year   fcode         y       x
30  1987  410523 -2.813411      0
48  1987  410538  0.970779      0
75  1987  410563  1.791759      0
81  1987  410565  3.044523      0
84  1987  410566  1.945910      0
87  1987  410567  0.000000      0
96  1987  410577  0.518794      0
105 1987  410592  3.401197      0
108 1987  410593  0.000000      0
111 1987  410596  2.302585      0
120 1987  410606 -0.415515      0
129 1987  410626 -0.139262      0
135 1987  410629  0.182322      0
159 1987  410653  0.058269      0
162 1987  410665 -2.995732      0
171 1987  410685 -1.966113      0
186 1987  418011  2.302585      0
195 1987  418021  0.000000      0
201 1987  418035  1.791759      0
207 1987  418045  0.693147      0
213 1987  418051 -0.798508      0
219 1987  418054  0.223143      0
222 1987  418065  0.262364      0
228 1987  418076  0.058269      0
231 1987  418083  1.098612      0
237 1987  418091  2.101692      0
240 1987  418097  0.512824      0
246 1987  418107 -0.020203      0
252 1987  418118  0.000000      0
258 1987  418125 -0.798508      0
...          ...       ...    ...
233 1989  418083  0.000000      0
239 1989  418091 -0.579819      0
242 1989  418097  0.350657      0
248 1989  418107 -0.798508      0
254 1989  418118 -2.302585      0
260 1989  418125 -0.510826      0
266 1989  418140  0.916291      0
272 1989  418163  1.871802      0
275 1989  418168 -1.609438      0
278 1989  418177  2.890372      0
299 1989  418237 -1.660731      0
311 1989  419198  1.386294      0
314 1989  419201  0.693147      0
317 1989  419242  1.740466      0
320 1989  419268 -0.105360      1
323 1989  419272  2.833213      1
332 1989  419289 -0.051293      1
335 1989  419297 -1.309333      0
350 1989  419307 -0.116534      1
368 1989  419339 -0.798508      0
371 1989  419343  1.098612      1
383 1989  419357 -0.693147      1
392 1989  419378  0.292670      1
401 1989  419381 -0.967584      1
407 1989  419388  1.791759      1
422 1989  419409  0.693147      1
431 1989  419432  1.648659      0
446 1989  419459  0.113329      0
464 1989  419482  1.029619      0
467 1989  419483  3.401197      0
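A minimal sketch of the sorting and cleaning steps on a small hypothetical frame (the values are made up for illustration). It uses sort_values, which replaced the older sort_index(by=...) spelling in later pandas versions; both return a new DataFrame, so the result has to be assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with the question's columns; one row has a missing y
df = pd.DataFrame(
    {
        "year": [1989, 1987, 1987, 1988],
        "fcode": [410523, 410538, 410523, 410538],
        "y": [0.350657, 0.970779, -2.813411, np.nan],
        "x": [1, 0, 0, 0],
    }
)

# Sorting returns a new frame, so assign it back
df = df.sort_values(by=["year", "fcode"])

# Drop every row that contains a missing value
df = df.dropna()

print(df)
```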
I tried running the following command:

model  = pd.stats.plm.PanelOLS(y=df['y'],x=df[['x']],time_effects=True)
I get this error:

raise NotImplementedError('Only 2-level MultiIndex are supported.')
NotImplementedError: Only 2-level MultiIndex are supported.

I don't know what I'm doing wrong. As you can see, my code seems to be the same as
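The error message indicates that PanelOLS expects its inputs to carry a two-level MultiIndex (time and entity). A quick way to see the mismatch, sketched on a hypothetical two-row excerpt of the frame above, is to inspect the index of the Series passed as y: after sorting and dropping rows it is still the flat row index, so year and fcode are ordinary columns rather than index levels.

```python
import pandas as pd

# Hypothetical excerpt of the question's frame, keeping its row labels
df = pd.DataFrame(
    {"year": [1987, 1987], "fcode": [410523, 410538], "y": [-2.813411, 0.970779], "x": [0, 0]},
    index=[30, 48],
)

y = df["y"]
# The Series keeps the frame's flat row index: one level, not (time, entity)
print(y.index.nlevels)                     # 1
print(isinstance(y.index, pd.MultiIndex))  # False
```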

When I add

df=df.set_index('year', append=True)
I get a model whose standard errors, t-statistics, p-values and confidence intervals are all NaN.

You could try:

print(df.head())
    year   fcode         y  x
30  1987  410523 -2.813411  0
48  1987  410538  0.970779  0
75  1987  410563  1.791759  0
81  1987  410565  3.044523  0
84  1987  410566  1.945910  0

#convert year to datetime
df['year'] = pd.to_datetime(df['year'], format='%Y')
#add column year to index
df=df.set_index('year', append=True)
#swap indexes
df.index = df.index.swaplevel(0,1)
print(df.head())
                fcode         y  x
year                              
1987-01-01 30  410523 -2.813411  0
           48  410538  0.970779  0
           75  410563  1.791759  0
           81  410565  3.044523  0
           84  410566  1.945910  0

model  = pd.stats.plm.PanelOLS(y=df['y'],x=df[['x']],time_effects=True)
print(model)
-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <x>

Number of Observations:         60
Number of Degrees of Freedom:   3

R-squared:         0.0013
Adj R-squared:    -0.0338

Rmse:              1.4727

F-stat (1, 57):     0.0364, p-value:     0.8493

Degrees of Freedom: model 2, resid 57

-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x     0.1539     0.5704       0.27     0.7882    -0.9640     1.2719
---------------------------------End of Summary---------------------------------

Maybe you can add the column to the index with df = df.set_index('year', append=True). The result is a DataFrame with a MultiIndex.
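Thanks! The error is gone, but I believe there is still a problem, because I am getting a model in which all the statistics are empty (NaN). Please see the edited question.

Thanks! What does df.index = df.index.swaplevel(0,1) do? Also, it seems strange to me that the degrees of freedom come out as 60 - 3. In a typical fixed-effects panel data model they would be N(T-1) - K.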
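The N(T-1) - K expectation in the last comment follows from the usual count for an entity fixed-effects (within) estimator, written out here assuming a balanced panel of N entities observed over T periods with K regressors: each entity mean removed by the within transformation uses up one degree of freedom. Note that the answer's model sets time_effects=True rather than entity effects, so this count is not directly comparable to its output.

```latex
\mathrm{df}_{\mathrm{resid}}
  = \underbrace{NT}_{\text{observations}}
  - \underbrace{N}_{\text{entity means}}
  - \underbrace{K}_{\text{slopes}}
  = N(T-1) - K
```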
swaplevel swaps the first level of the MultiIndex with the second, because you need year as the first level.
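A minimal sketch of what swaplevel does, on a hypothetical index shaped like the one df.set_index('year', append=True) produces (the original row label first, year second). MultiIndex.swaplevel(0, 1) returns a new index with the two levels exchanged, which is why the answer assigns the result back to df.index.

```python
import pandas as pd

# Hypothetical index: unnamed row-label level first, "year" second
idx = pd.MultiIndex.from_tuples([(30, 1987), (48, 1987)], names=[None, "year"])

# swaplevel returns a new MultiIndex with the two levels exchanged
swapped = idx.swaplevel(0, 1)

print(list(swapped.names))  # year is now the first level
print(list(swapped))        # each tuple now starts with the year
```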
Why is a MultiIndex needed?
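On the MultiIndex question: the answer's index pairs year with the original row label, so every row ends up as its own "entity". If fcode identifies the firm (an assumption based on the column name and on the same codes recurring across years in the data), a (year, fcode) index matches the time/entity layout more directly. The sketch below only builds and checks that index on hypothetical data; it does not call pd.stats.plm.PanelOLS, which exists only in old pandas releases.

```python
import pandas as pd

# Hypothetical panel: two firms (fcode) observed in two years
df = pd.DataFrame(
    {
        "year": [1987, 1987, 1988, 1988],
        "fcode": [410523, 410538, 410523, 410538],
        "y": [-2.813411, 0.970779, 1.791759, 3.044523],
        "x": [0, 0, 1, 0],
    }
)

# Same datetime conversion as the answer (via str so '%Y' parses the text)
df["year"] = pd.to_datetime(df["year"].astype(str), format="%Y")

# Time first, entity second: the two-level layout the error message asks for
panel = df.set_index(["year", "fcode"]).sort_index()

print(panel.index.nlevels)      # 2
print(list(panel.index.names))  # ['year', 'fcode']
print(panel.index.is_unique)    # True: one row per (year, firm)
```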
Output reported in the question's edit, with only df=df.set_index('year', append=True) applied:

Degrees of Freedom: model 161, resid 0

-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x     0.0000        nan        nan        nan        nan        nan