Python Pandas and PanelOLS: Only 2-level MultiIndex are supported
I have a DataFrame, which I have sorted by year and fcode:
df.sort_index(by=['year','fcode'])
I dropped the rows with missing data:
df = df.dropna() # Drop missing
This is what I get:
year fcode y x
30 1987 410523 -2.813411 0
48 1987 410538 0.970779 0
75 1987 410563 1.791759 0
81 1987 410565 3.044523 0
84 1987 410566 1.945910 0
87 1987 410567 0.000000 0
96 1987 410577 0.518794 0
105 1987 410592 3.401197 0
108 1987 410593 0.000000 0
111 1987 410596 2.302585 0
120 1987 410606 -0.415515 0
129 1987 410626 -0.139262 0
135 1987 410629 0.182322 0
159 1987 410653 0.058269 0
162 1987 410665 -2.995732 0
171 1987 410685 -1.966113 0
186 1987 418011 2.302585 0
195 1987 418021 0.000000 0
201 1987 418035 1.791759 0
207 1987 418045 0.693147 0
213 1987 418051 -0.798508 0
219 1987 418054 0.223143 0
222 1987 418065 0.262364 0
228 1987 418076 0.058269 0
231 1987 418083 1.098612 0
237 1987 418091 2.101692 0
240 1987 418097 0.512824 0
246 1987 418107 -0.020203 0
252 1987 418118 0.000000 0
258 1987 418125 -0.798508 0
... ... ... ...
233 1989 418083 0.000000 0
239 1989 418091 -0.579819 0
242 1989 418097 0.350657 0
248 1989 418107 -0.798508 0
254 1989 418118 -2.302585 0
260 1989 418125 -0.510826 0
266 1989 418140 0.916291 0
272 1989 418163 1.871802 0
275 1989 418168 -1.609438 0
278 1989 418177 2.890372 0
299 1989 418237 -1.660731 0
311 1989 419198 1.386294 0
314 1989 419201 0.693147 0
317 1989 419242 1.740466 0
320 1989 419268 -0.105360 1
323 1989 419272 2.833213 1
332 1989 419289 -0.051293 1
335 1989 419297 -1.309333 0
350 1989 419307 -0.116534 1
368 1989 419339 -0.798508 0
371 1989 419343 1.098612 1
383 1989 419357 -0.693147 1
392 1989 419378 0.292670 1
401 1989 419381 -0.967584 1
407 1989 419388 1.791759 1
422 1989 419409 0.693147 1
431 1989 419432 1.648659 0
446 1989 419459 0.113329 0
464 1989 419482 1.029619 0
467 1989 419483 3.401197 0
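A note for readers on current pandas: `sort_index(by=...)` is no longer accepted there, and sorting returns a new frame, so the result has to be assigned. A minimal sketch of the sort-and-drop step, using a hypothetical five-row extract of the table above (the NaN is invented here only to give `dropna()` something to remove):

```python
import numpy as np
import pandas as pd

# Hypothetical extract of the question's data; the NaN in row 239 is invented.
df = pd.DataFrame(
    {
        "year": [1989, 1987, 1987, 1989, 1987],
        "fcode": [418083, 410538, 410523, 418091, 410563],
        "y": [0.000000, 0.970779, -2.813411, np.nan, 1.791759],
        "x": [0, 0, 0, 0, 0],
    },
    index=[233, 48, 30, 239, 75],
)

# sort_values returns a new frame, so assign the result before using it.
clean = df.sort_values(["year", "fcode"]).dropna()
print(clean)
```

The original row labels (30, 48, ...) are kept, which matters later because they end up as one level of the MultiIndex.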
I tried to run the following command:
model = pd.stats.plm.PanelOLS(y=df['y'],x=df[['x']],time_effects=True)
I get this error:
raise NotImplementedError('Only 2-level MultiIndex are supported.')
NotImplementedError: Only 2-level MultiIndex are supported.
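The message suggests that PanelOLS wants the frame's index to be a MultiIndex with exactly two levels, while the frame in the question still has its default single-level index. A quick way to check this, sketched on a hypothetical three-row extract of the data:

```python
import pandas as pd

# Hypothetical three-row extract of the question's frame.
df = pd.DataFrame(
    {
        "year": [1987, 1987, 1989],
        "fcode": [410523, 410538, 418083],
        "y": [-2.813411, 0.970779, 0.0],
        "x": [0, 0, 0],
    },
    index=[30, 48, 233],
)

levels_before = df.index.nlevels              # plain integer index
indexed = df.set_index("year", append=True)   # appends 'year' as a second level
levels_after = indexed.index.nlevels

print(levels_before, levels_after)
print(indexed.index.names)
```

Note that `append=True` puts `year` in the second position, after the row label; the order of the levels turns out to matter.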
I don't know what I'm doing wrong. As you can see, my code seems to match the example I was following.
When I add
df=df.set_index('year', append=True)
I get a model with 0 residual degrees of freedom and nan for every statistic of x.
You can try:
print df.head()
year fcode y x
30 1987 410523 -2.813411 0
48 1987 410538 0.970779 0
75 1987 410563 1.791759 0
81 1987 410565 3.044523 0
84 1987 410566 1.945910 0
#convert year to datetime
df['year'] = pd.to_datetime(df['year'], format='%Y')
#add column year to index
df=df.set_index('year', append=True)
#swap indexes
df.index = df.index.swaplevel(0,1)
print df.head()
                 fcode         y  x
year
1987-01-01 30   410523 -2.813411  0
           48   410538  0.970779  0
           75   410563  1.791759  0
           81   410565  3.044523  0
           84   410566  1.945910  0
model = pd.stats.plm.PanelOLS(y=df['y'],x=df[['x']],time_effects=True)
print model
-------------------------Summary of Regression Analysis-------------------------
Formula: Y ~ <x>
Number of Observations: 60
Number of Degrees of Freedom: 3
R-squared: 0.0013
Adj R-squared: -0.0338
Rmse: 1.4727
F-stat (1, 57): 0.0364, p-value: 0.8493
Degrees of Freedom: model 2, resid 57
-----------------------Summary of Estimated Coefficients------------------------
Variable Coef Std Err t-stat p-value CI 2.5% CI 97.5%
--------------------------------------------------------------------------------
x 0.1539 0.5704 0.27 0.7882 -0.9640 1.2719
---------------------------------End of Summary---------------------------------
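The index manipulation in this answer can be tried on its own, without PanelOLS. The sketch below uses a hypothetical three-row extract and current pandas; converting through `astype(str)` before `to_datetime` is a precaution added for this sketch and is not part of the answer:

```python
import pandas as pd

# Hypothetical three-row extract of the question's frame.
df = pd.DataFrame(
    {
        "year": [1987, 1987, 1989],
        "fcode": [410523, 410538, 418083],
        "y": [-2.813411, 0.970779, 0.0],
        "x": [0, 0, 0],
    },
    index=[30, 48, 233],
)

# Convert the integer year to a timestamp (1987 -> 1987-01-01).
df["year"] = pd.to_datetime(df["year"].astype(str), format="%Y")

stacked = df.set_index("year", append=True)  # levels: (row label, year)
swapped = stacked.swaplevel(0, 1)            # levels: (year, row label)

print(stacked.index.names)
print(swapped.index.names)
print(swapped)
```

`DataFrame.swaplevel(0, 1)` has the same effect on the index as the answer's `df.index = df.index.swaplevel(0,1)`, but returns a new frame instead of mutating `df`.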
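Maybe you can add the column to the index: df = df.set_index('year', append=True). The result is a df with a MultiIndex.
Thanks! The error is gone, but I believe there is still a problem, because I get a model in which all the statistics are empty. Please see the updated question above.
Thanks! What does df.index = df.index.swaplevel(0,1) mean? Degrees of freedom of 60-3 look odd to me. In a typical fixed-effects panel data model, it would be N(T-1)-K.
swaplevel swaps the first level of the MultiIndex with the second one, because you need year as the first level.
Why is it necessary to have a MultiIndex?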
Output reported in the question after adding only set_index('year', append=True):
Degrees of Freedom: model 161, resid 0
-----------------------Summary of Estimated Coefficients------------------------
Variable Coef Std Err t-stat p-value CI 2.5% CI 97.5%
--------------------------------------------------------------------------------
x 0.0000 nan nan nan nan nan
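One plausible reading of the two outputs, assuming PanelOLS takes the first index level as the time axis (an assumption, not checked against the pandas source): with only `set_index('year', append=True)` the first level is the unique row label, so `time_effects=True` would create one dummy per observation and leave nothing for the residual. After `swaplevel` the first level is the year, and the 60 rows shown in the question cover two years, so the model has an intercept, `x`, and one year dummy, hence 60 - 3 = 57. The N(T-1)-K count would apply to a model with firm dummies, which this call does not request. The sketch below reproduces both counts with NumPy on synthetic data; `residual_dof` is a hypothetical helper, and plain NumPy is used because `pd.stats.plm` is not available in current pandas:

```python
import numpy as np
import pandas as pd


def residual_dof(x, time_key):
    """Observations minus the rank of [intercept, x, time dummies]."""
    # One dummy per distinct time value, dropping the first to avoid
    # collinearity with the intercept.
    dummies = pd.get_dummies(pd.Series(list(time_key)), drop_first=True, dtype=float)
    design = np.column_stack(
        [np.ones(len(x)), np.asarray(x, dtype=float), dummies.to_numpy()]
    )
    return len(x) - int(np.linalg.matrix_rank(design))


# Synthetic stand-in for the 60 displayed rows: two years, binary x.
years = [1987] * 30 + [1989] * 30
x = [0] * 30 + [i % 2 for i in range(30)]
row_labels = range(60)  # unique per observation, like the original index

dof_rows = residual_dof(x, row_labels)  # "time" = row label: saturated model
dof_years = residual_dof(x, years)      # "time" = year: intercept, x, one dummy

print(dof_rows, dof_years)
```

If that reading is right, the swap is what keeps the time dummies from absorbing every observation.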