Double-clustered standard errors for panel data in R

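I have a panel data set in R (time and cross section) and would like to compute standard errors that are clustered along two dimensions, because my residuals are correlated both ways. Googling around, I found a function that does this. It seems a bit ad hoc, so I wanted to know whether there is a package that has been tested and does this.

I know sandwich does HAC standard errors, but it doesn't do double clustering (i.e. along two dimensions).

Arai's function can be used for clustering standard errors. He has another version for clustering in multiple dimensions: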

mcl <- function(dat,fm, cluster1, cluster2){
          attach(dat, warn.conflicts = F)
          library(sandwich);library(lmtest)
          cluster12 = paste(cluster1,cluster2, sep="")
          M1  <- length(unique(cluster1))
          M2  <- length(unique(cluster2))   
          M12 <- length(unique(cluster12))
          N   <- length(cluster1)          
          K   <- fm$rank             
          dfc1  <- (M1/(M1-1))*((N-1)/(N-K))  
          dfc2  <- (M2/(M2-1))*((N-1)/(N-K))  
          dfc12 <- (M12/(M12-1))*((N-1)/(N-K))  
          u1j   <- apply(estfun(fm), 2, function(x) tapply(x, cluster1,  sum)) 
          u2j   <- apply(estfun(fm), 2, function(x) tapply(x, cluster2,  sum)) 
          u12j  <- apply(estfun(fm), 2, function(x) tapply(x, cluster12, sum)) 
          vc1   <-  dfc1*sandwich(fm, meat=crossprod(u1j)/N )
          vc2   <-  dfc2*sandwich(fm, meat=crossprod(u2j)/N )
          vc12  <- dfc12*sandwich(fm, meat=crossprod(u12j)/N)
          vcovMCL <- vc1 + vc2 - vc12
          coeftest(fm, vcovMCL)}
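The function above builds the two-way estimate as V = V1 + V2 - V12, where V1 and V2 are the one-way clustered covariances and V12 is clustered on the intersection of the two dimensions. A minimal base-R sketch of that same idea follows, under a few stated assumptions: `twoway_vcov` is a hypothetical helper name (not from any package), it applies no small-sample correction (so its numbers will differ slightly from `mcl`, which scales by the `dfc` factors), and the data are simulated. It avoids `attach()` and uses `interaction()` for the pair ids, since `paste(..., sep="")` can merge distinct pairs such as (1, 12) and (11, 2).

```r
# Two-way cluster-robust covariance for an lm fit, base R only.
# `twoway_vcov` is a hypothetical helper, not a function from any package.
twoway_vcov <- function(fit, cluster1, cluster2) {
  X <- model.matrix(fit)
  u <- residuals(fit)
  scores <- X * u                 # row i is x_i * u_i (score contribution)
  bread <- solve(crossprod(X))    # (X'X)^{-1}

  # One-way clustered covariance, no small-sample correction
  one_way <- function(g) {
    s <- rowsum(scores, group = g)          # sum scores within each cluster
    bread %*% crossprod(s) %*% bread
  }

  both <- interaction(cluster1, cluster2, drop = TRUE)  # unambiguous pair ids
  one_way(cluster1) + one_way(cluster2) - one_way(both)
}

# Simulated panel with firm and year effects (assumed data, for illustration)
set.seed(1)
d <- expand.grid(firm = 1:50, year = 1:10)
firm_eff <- rnorm(50)[d$firm]
year_eff <- rnorm(10)[d$year]
d$x <- rnorm(nrow(d)) + firm_eff
d$y <- 1 + 2 * d$x + firm_eff + year_eff + rnorm(nrow(d))

fit <- lm(y ~ x, data = d)
V <- twoway_vcov(fit, d$firm, d$year)
sqrt(diag(V))   # two-way clustered standard errors
```

Note that the subtraction means the result is not guaranteed to be positive semi-definite in every sample, which is a known property of this estimator rather than of the sketch.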

Frank Harrell's package rms (formerly named Design) has a function that I frequently use when clustering: robcov.

See this part of the robcov documentation, for example:
cluster: a variable indicating groupings. ‘cluster’ may be any type of
      vector (factor, character, integer).  NAs are not allowed.
      Unique values of ‘cluster’ indicate possibly correlated
      groupings of observations. Note the data used in the fit and
      stored in ‘fit$x’ and ‘fit$y’ may have had observations
      containing missing values deleted. It is assumed that if any
      NAs were removed during the original model fitting, an
      ‘naresid’ function exists to restore NAs so that the rows of
      the score matrix coincide with ‘cluster’. If ‘cluster’ is
      omitted, it defaults to the integers 1,2,...,n to obtain the
      "sandwich" robust covariance matrix estimate.

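For reference, a small sketch of how robcov is typically called. The data here are simulated and the variable names are assumptions for illustration. Note that robcov takes a single `cluster` vector, so this gives one-way clustering only, and that the fit has to come from an rms fitting function such as `ols` with `x = TRUE, y = TRUE` so that the design matrix and response are stored.

```r
library(rms)

# Simulated data with a firm-level error component (assumed, for illustration)
set.seed(7)
d <- data.frame(firmid = rep(1:40, each = 5))
d$x <- rnorm(nrow(d))
d$y <- 1 + d$x + rnorm(40)[d$firmid] + rnorm(nrow(d))

# robcov() needs the design matrix and response stored in the fit
fit <- ols(y ~ x, data = d, x = TRUE, y = TRUE)

# One-way clustering by firmid; the returned fit carries the adjusted covariance
fit_cl <- robcov(fit, cluster = d$firmid)
fit_cl
```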
For panel regressions, the plm package can estimate clustered SEs along two dimensions.
Once the model is fitted, you can obtain clustered SEs:
##Clustered by *group*
> coeftest(fpm, vcov=function(x) vcovHC(x, cluster="group", type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.066952  0.4433   0.6576    
x           1.034833   0.050550 20.4714   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

##Clustered by *time*
> coeftest(fpm, vcov=function(x) vcovHC(x, cluster="time", type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.022189  1.3376   0.1811    
x           1.034833   0.031679 32.6666   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

##Clustered by *group* and *time*
> coeftest(fpm, vcov=function(x) vcovDC(x, type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.064580  0.4596   0.6458    
x           1.034833   0.052465 19.7243   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
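The `vcovDC` used in the last call is not defined anywhere in the text shown here. One possible definition, assuming the double-clustering formula of Thompson (2011), adds the group-clustered and time-clustered covariances and subtracts the White estimate. The sketch below uses a hypothetical helper name, `vcov_dc_sketch`, and simulated data whose column names mirror the ones in the thread, so the numbers will not match the output above. Recent versions of plm also export their own `vcovDC`.

```r
library(plm)
library(lmtest)

# Simulated balanced panel (assumed data, for illustration only)
set.seed(42)
test_sim <- expand.grid(firmid = 1:100, year = 1:10)
test_sim$x <- rnorm(nrow(test_sim))
test_sim$y <- 0.03 + 1.03 * test_sim$x +
  rnorm(100)[test_sim$firmid] + rnorm(nrow(test_sim))

# Pooled OLS with both panel indices declared
fpm_sim <- plm(y ~ x, data = test_sim, model = "pooling",
               index = c("firmid", "year"))

# Hypothetical helper: group-clustered + time-clustered - White
vcov_dc_sketch <- function(x, ...) {
  vcovHC(x, cluster = "group", ...) +
    vcovHC(x, cluster = "time", ...) -
    vcovHC(x, method = "white1", ...)
}

coeftest(fpm_sim, vcov = function(x) vcov_dc_sketch(x, type = "HC1"))
```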

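This is an old question. But seeing as people still seem to be landing on it, I thought I'd provide some modern approaches to multiway clustering in R:
Option 1 (fastest):
library(fixest)
nlswork = haven::read_dta("http://www.stata-press.com/data/r14/nlswork.dta")
est_feols = feols(ln_wage ~ age | race + year, cluster = ~race+year, data = nlswork)
est_feols
## An important feature of fixest: we can compute other VCOV matrices / SEs
## on the fly with summary.fixest(). No need to re-run
## the model!
summary(est_feols, se = 'standard') ## standard SEs
summary(est_feols, se = 'hetero') ## robust SEs
summary(est_feols, se = 'twoway') ## diff syntax, but same as original model
summary(est_feols, cluster = c('race', 'year')) ## ditto
summary(est_feols, cluster = ~race^year) ## interacted cluster vars
summary(est_feols, cluster = ~race + year + idcode) ## add third cluster var (not in original model call)
## etc.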

Option 2 (fast):
library(lfe)
## Note the third "| 0" slot means we're not using IV
est_felm = felm(ln_wage ~ age | race + year | 0 | race + year, data = nlswork)
summary(est_felm)

Option 3 (slower, but flexible):
library(sandwich)
library(lmtest)
est_sandwich = lm(ln_wage ~ age + factor(race) + factor(year), data = nlswork)
coeftest(est_sandwich, vcov = vcovCL, cluster = ~race + year)

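Benchmark
Oh, and just to make the point about speed, here is a benchmark of the three approaches (using two fixed effects and two-way clustering).
library(microbenchmark)
est_feols = function() feols(ln_wage ~ age | race + year, cluster = ~race+year, data = nlswork)
est_felm = function() felm(ln_wage ~ age | race + year | 0 | race + year, data = nlswork)
est_standwich = function() {coeftest(lm(ln_wage ~ age + factor(race) + factor(year), data = nlswork),
                                     vcov = vcovCL, cluster = ~race + year)}
microbenchmark(est_feols(), est_felm(), est_standwich(), times = 3)
#> Unit: milliseconds
#>             expr       min        lq      mean    median        uq       max neval cld
#>      est_feols()  11.94122  11.96158  12.55835  11.98193  12.86692  13.75191     3 a
#>       est_felm()  87.18064  95.89905 100.69589 104.61746 107.45352 110.28957     3  b
#>  est_standwich() 176.43502 183.95964 188.50271 191.48425 194.53656 197.58886     3   c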
Unfortunately, robcov only works with ols objects, not with lm objects. Do you know of a similar function that works with the more mainstream lm?
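
## With a single index, plm treats the data as grouped by firmid only,
## so clustering is possible along that one dimension
fpm.tr <- plm(y ~ x, test, model='pooling', index=c('firmid'))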

##Clustered by *group*
> coeftest(fpm.tr, vcov=function(x) vcovHC(x, cluster="group", type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.066952  0.4433   0.6576    
x           1.034833   0.050550 20.4714   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

library("lmtest")
library("multiwayvcov")

data(petersen)
m1 <- lm(y ~ x, data = petersen)

coeftest(m1, vcov=function(x) cluster.vcov(x, petersen[ , c("firmid", "year")]))
## 
## t test of coefficients:
## 
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.029680   0.065066  0.4561   0.6483    
## x           1.034833   0.053561 19.3206   <2e-16 ***
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1