
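Stata did not drop a variable for multicollinearity in a regression, although I think it should have

Tags: stata, linear-regression

I am running a simple regression of race time on temperature, just to develop some basic intuition. My dataset is very large; each observation is the finishing time of one unit in a given race in a given year.

First, I run a very simple regression of race time on temperature bins.

Summary of the variables: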




Variable     |      Obs     Mean   Std. Dev.    Min    Max
avg_temp_scc |  8309434     54.3        9.4       0     89
chiptime     |  8309434    267.5       59.6     122   1262



    egen temp_trial = cut(avg_temp_scc), at(0,10,20,30,40,50,60,70,80,90)
    reg chiptime i.temp_trial

      Source |       SS          df       MS           Number of obs = 8309434
-------------+--------------------------------         F(  8,8309425) =69509.83
       Model |  1.8525e+09        8   231557659        Prob > F      =  0.0000
    Residual |  2.7681e+10  8309425  3331.29368        R-squared     =  0.0627
-------------+--------------------------------         Adj R-squared =  0.0627
       Total |  2.9534e+10  8309433  3554.22521        Root MSE      =  57.717

     chiptime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    temp_trial |
           10  |  -26.63549   2.673903    -9.96   0.000    -31.87625   -21.39474
           20  |   10.23883   1.796236     5.70   0.000      6.71827    13.75939
           30  |   -16.1049   1.678432    -9.60   0.000    -19.39457   -12.81523
           40  |  -13.97918   1.675669    -8.34   0.000    -17.26343   -10.69493
           50  |  -10.18371   1.675546    -6.08   0.000    -13.46772   -6.899695
           60  |  -.6865365   1.675901    -0.41   0.682    -3.971243     2.59817
           70  |   44.42869   1.676883    26.49   0.000     41.14206    47.71532
           80  |   23.63064   1.766566    13.38   0.000     20.16824    27.09305
         _cons |   273.1366   1.675256   163.04   0.000     269.8531      276.42

Next, I create the bin dummies by hand and regress on all nine of them plus the constant. I expected Stata to drop one of them because of collinearity, but it did not:


    gen temp0 = 1 if temp_trial==0
    replace temp0 = 0 if temp_trial!=0

    gen temp1 = 1 if temp_trial == 10
    replace temp1 = 0 if temp_trial != 10

    gen temp2 = 1 if temp_trial==20
    replace temp2 = 0 if temp_trial!=20

    gen temp3 = 1 if temp_trial==30
    replace temp3 = 0 if temp_trial!=30

    gen temp4=1 if temp_trial==40
    replace temp4=0 if temp_trial!=40

    gen temp5=1 if temp_trial==50
    replace temp5=0 if temp_trial!=50

    gen temp6=1 if temp_trial==60
    replace temp6=0 if temp_trial!=60

    gen temp7=1 if temp_trial==70
    replace temp7=0 if temp_trial!=70

    gen temp8=1 if temp_trial==80
    replace temp8=0 if temp_trial!=80

    reg chiptime temp0 temp1 temp2 temp3 temp4 temp5 temp6 temp7 temp8

      Source |       SS          df       MS           Number of obs = 8309434
-------------+--------------------------------         F(  9,8309424) =61786.51
       Model |  1.8525e+09        9   205829030        Prob > F      =  0.0000
    Residual |  2.7681e+10  8309424  3331.29408        R-squared     =  0.0627
-------------+--------------------------------         Adj R-squared =  0.0627
       Total |  2.9534e+10  8309433  3554.22521        Root MSE      =  57.717

chiptime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
   temp0 |  -54.13245   6050.204    -0.01   0.993    -11912.32    11804.05
   temp1 |  -80.76794   6050.204    -0.01   0.989    -11938.95    11777.42
   temp2 |  -43.89362   6050.203    -0.01   0.994    -11902.08    11814.29
   temp3 |  -70.23735   6050.203    -0.01   0.991    -11928.42    11787.94
   temp4 |  -68.11162   6050.203    -0.01   0.991    -11926.29    11790.07
   temp5 |  -64.31615   6050.203    -0.01   0.992     -11922.5    11793.87
   temp6 |  -54.81898   6050.203    -0.01   0.993       -11913    11803.36
   temp7 |  -9.703755   6050.203    -0.00   0.999    -11867.89    11848.48
   temp8 |   -30.5018   6050.203    -0.01   0.996    -11888.68    11827.68
   _cons |    327.269   6050.203     0.05   0.957    -11530.91    12185.45
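
The identical standard errors of about 6050 on every coefficient suggest that the nine dummies plus the constant are (numerically almost) linearly dependent. A minimal sketch of how one might check this directly is given here. It uses a small simulated dataset rather than the 129MB file, so it is not expected to reproduce the problem itself; `dummy_sum` and `n_omitted` are hypothetical names introduced only for this illustration.

```stata
* Simulated stand-in for the real data (assumption: 9,000 obs, uniform temps)
clear all
set obs 9000
set seed 1
gen avg_temp_scc = floor(90*runiform())
egen temp_trial = cut(avg_temp_scc), at(0,10,20,30,40,50,60,70,80,90)
gen chiptime = rnormal()

* One indicator per bin, named temp0 ... temp8
forvalues k = 0/8 {
    gen byte temp`k' = temp_trial == `k'0
}

* Every observation should fall in exactly one bin
egen dummy_sum = rowtotal(temp0-temp8)
assert dummy_sum == 1

* _rmcoll flags variables that are collinear with the others and the constant
_rmcoll temp0-temp8
scalar n_omitted = r(k_omitted)
display "dummies flagged as collinear: " n_omitted
```

If the `assert` passes and `_rmcoll` flags one dummy, the set is exhaustive and mutually exclusive, and `regress` would normally be expected to omit one of them.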

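Edit: Here is a Dropbox link to the data and the do-file: The data file contains only the two variables under consideration and is 129MB. A picture of my output is also available at the link.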



. clear all

. set obs 8309434
number of observations (_N) was 0, now 8,309,434

. set seed 1

. gen avg_temp_scc = floor(90*uniform())

. egen temp_trial = cut(avg_temp_scc), at(0,10,20,30,40,50,60,70,80,90)

. gen chiptime = rnormal()

. reg chiptime i.temp_trial

      Source |       SS           df       MS      Number of obs   = 8,309,434
-------------+----------------------------------   F(8, 8309425)   =      0.88
       Model |  7.07729775         8  .884662219   Prob > F        =    0.5282
    Residual |   8308356.5 8,309,425  .999871411   R-squared       =    0.0000
-------------+----------------------------------   Adj R-squared   =   -0.0000
       Total |  8308363.58 8,309,433    .9998713   Root MSE        =    .99994

    chiptime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
  temp_trial |
         10  |   .0010732   .0014715     0.73   0.466    -.0018109    .0039573
         20  |   .0003255   .0014713     0.22   0.825    -.0025581    .0032092
         30  |   .0017061   .0014713     1.16   0.246    -.0011776    .0045897
         40  |   .0003128   .0014717     0.21   0.832    -.0025718    .0031973
         50  |   .0007142   .0014715     0.49   0.627    -.0021699    .0035983
         60  |   .0021693   .0014716     1.47   0.140    -.0007149    .0050535
         70  |  -.0008265   .0014715    -0.56   0.574    -.0037107    .0020577
         80  |  -.0005001   .0014714    -0.34   0.734    -.0033839    .0023837
       _cons |  -.0006364   .0010403    -0.61   0.541    -.0026753    .0014025

. * "qui tab temp_trial, gen(temp)" is more convenient than "forv ..."
. forv k = 0/8 {
  2. gen temp`k' = temp_trial==`k'0
  3. }

. reg chiptime temp0-temp8
note: temp6 omitted because of collinearity

      Source |       SS           df       MS      Number of obs   = 8,309,434
-------------+----------------------------------   F(8, 8309425)   =      0.88
       Model |  7.07729775         8  .884662219   Prob > F        =    0.5282
    Residual |   8308356.5 8,309,425  .999871411   R-squared       =    0.0000
-------------+----------------------------------   Adj R-squared   =   -0.0000
       Total |  8308363.58 8,309,433    .9998713   Root MSE        =    .99994

    chiptime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
       temp0 |  -.0021693   .0014716    -1.47   0.140    -.0050535    .0007149
       temp1 |  -.0010961   .0014719    -0.74   0.456     -.003981    .0017888
       temp2 |  -.0018438   .0014717    -1.25   0.210    -.0047282    .0010407
       temp3 |  -.0004633   .0014717    -0.31   0.753    -.0033477    .0024211
       temp4 |  -.0018566   .0014721    -1.26   0.207    -.0047419    .0010287
       temp5 |  -.0014551   .0014719    -0.99   0.323      -.00434    .0014298
       temp6 |          0  (omitted)
       temp7 |  -.0029958   .0014719    -2.04   0.042    -.0058808   -.0001108
       temp8 |  -.0026694   .0014718    -1.81   0.070     -.005554    .0002152
       _cons |   .0015329   .0010408     1.47   0.141    -.0005071    .0035729
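
The comment in the log above mentions `tab temp_trial, gen(temp)` as a more convenient way to build the dummies. A rough sketch of that route follows, again on simulated data (the sample size of 9,000 is an assumption made only to keep the example fast). Note that `tabulate, generate()` numbers the new variables from 1, so the dummies are named `temp1` to `temp9` rather than `temp0` to `temp8`.

```stata
* Simulated stand-in for the real data (assumption: 9,000 obs)
clear all
set obs 9000
set seed 1
gen avg_temp_scc = floor(90*runiform())
egen temp_trial = cut(avg_temp_scc), at(0,10,20,30,40,50,60,70,80,90)
gen chiptime = rnormal()

* One indicator per distinct value of temp_trial: temp1 ... temp9
quietly tabulate temp_trial, generate(temp)
describe temp1-temp9

* With all nine dummies plus the constant, one should be omitted
regress chiptime temp1-temp9
display "model df: " e(df_m)
```

On data like this, the regression should report one dummy as omitted because of collinearity, leaving 8 model degrees of freedom as in the log above.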




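Thank you for verifying this. Could you elaborate on what exactly the precision problem is? Does it only cause trouble for detecting multicollinearity? Would it also occur when I use factor variables? Could this precision problem make my estimates wrong?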
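
As a general illustration of what a "precision problem" means (not a diagnosis of this particular case), the sketch below shows that floating-point storage cannot represent every number exactly. Because arithmetic on such values is inexact, software has to test for linear dependence against a tolerance rather than for exact equality, and in borderline cases that test can go either way.

```stata
* A float has a 24-bit significand, so 2^24 + 1 = 16777217 is not representable
display %12.0f float(16777217)          // shows 16777216
display float(16777217) == 16777217     // shows 0 (false)

* 0.1 has no exact binary representation; float and double round it differently
display float(0.1) == 0.1               // shows 0 (false)
```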










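This problem statement would benefit from an MVCE. One thing to check is to assert (temp0+temp1+…+temp8)==1 to confirm that your bins really are exhaustive.

@WilliamLisowski I ran the assert and it produced no errors. By MVCE, do you mean that I should create a dataset and a do-file that reproduce the error and attach them to the post?

@WilliamLisowski I have a simple, cleaned-up do-file and data file, but the data file is 129MB. I really don't know how to share it here. Do you have any idea? The error is reproducible when I use this data and do-file.

I confirmed your results using Stata 15. I noticed that sorting the data by avg_temp_scc before the second regression resolved the problem. I think a precision problem in computing the correlation matrix caused the collinearity test to miss it. I suggest you contact Stata technical support and report it; they will appreciate the MVCE. In the meantime, you have inadvertently provided another argument for using factor variables instead of creating dummy variables by hand. Thank you for posting this question.

Thanks.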
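
The two suggestions in the comments, sorting before the dummy regression and preferring factor variables, can be sketched as follows. This runs on a small simulated dataset (an assumption; the original problem was only observed on the asker's 8.3 million observations), so it shows the syntax rather than the fix in action.

```stata
* Simulated stand-in for the real data (assumption: 9,000 obs)
clear all
set obs 9000
set seed 1
gen avg_temp_scc = floor(90*runiform())
egen temp_trial = cut(avg_temp_scc), at(0,10,20,30,40,50,60,70,80,90)
gen chiptime = rnormal()
forvalues k = 0/8 {
    gen byte temp`k' = temp_trial == `k'0
}

* Workaround reported in the comments: sort first, then run the dummy regression
sort avg_temp_scc
regress chiptime temp0-temp8

* Factor variables: Stata picks the base level and handles collinearity itself
regress chiptime i.temp_trial

* Alternative: no base level and no constant, so each coefficient is a bin mean
regress chiptime ibn.temp_trial, noconstant
```

The last form avoids the dummy-variable trap by construction, since dropping the constant leaves the nine indicators linearly independent.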