Formula with interaction terms in event study designs using R

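I am estimating an "event study" specification of a difference-in-differences model in R. We observe treated and control units over time and estimate a two-way fixed effects model. The model has one treatment "effect" parameter for each time period, and one period is omitted as the reference period (usually the period just before treatment). I am struggling to specify this model concisely with an R formula.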
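Written out, the specification is the two-way fixed effects regression below. It uses the notation of the example that follows: unit i, period t, one covariate x, and period 4 as the omitted reference period. The symbols alpha_i, lambda_t, beta and delta_k are labels introduced here for illustration. In the simulated data below, beta = 1, delta_5 = -1 and the other delta_k are 0. The time effects are lambda_t = t, and there are no unit effects.

```latex
y_{it} = \alpha_i + \lambda_t + \beta\, x_{it}
       + \sum_{k \in \{1,2,3,5\}} \delta_k \,\mathrm{treat}_i\, \mathbf{1}[t = k]
       + \varepsilon_{it}
```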

For example, here is the model:

library(lfe)
library(tidyverse)
library(dummies)

N <- 100

df <- tibble(
    id = rep(1:N, 5),
    treat = id >= ceiling(N / 2),
    time = rep(1:5, each=N),
    x = rnorm(5 * N)
)

# produce an outcome variable
df <- df %>% mutate(
    y = x - treat * (time == 5) + time + rnorm(5*N)
)

head(df)

# easily recover the parameters with the true model...
summary(felm(
    y ~ x + I(treat * (time == 5)) | id + time, data = df
))

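The natural approach is to interact treat with a factor version of time (timefac). That looks fine, but it produces lots of NAs, because several coefficients are absorbed by the unit and time fixed effects. Ideally, I could specify the model without these coefficients. At the moment I build the dummies by hand: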

# create dummy for each time period for treated units
tdum <- dummy(df$time)
df <- bind_cols(df, as.data.frame(tdum))
df <- df %>% mutate_at(vars(time1:time5), ~ . * treat)

# estimate model, manually omitting one dummy
summary(felm(
    y ~ x + time1 + time2 + time3 + time5 | id + time, data = df
))
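If you would rather avoid the extra dummies dependency, here is a minimal base R sketch. It builds the same treated-by-period dummies in a loop and assembles the formula from the list of periods, so the reference period is dropped in a single place (the variable ref). To keep the sketch self-contained, it uses lm() with factor(id) and factor(time) in place of felm's absorbed fixed effects. With lfe you would paste "| id + time" onto the formula instead. The seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
N <- 100
df <- data.frame(
  id = rep(1:N, 5),
  time = rep(1:5, each = N),
  x = rnorm(5 * N)
)
df$treat <- df$id >= ceiling(N / 2)
df$y <- df$x - df$treat * (df$time == 5) + df$time + rnorm(5 * N)

ref <- 4  # reference period to omit
periods <- setdiff(sort(unique(df$time)), ref)

# One dummy per non-reference period, equal to 1 only for treated units
for (k in periods) {
  df[[paste0("time", k)]] <- as.numeric(df$treat & df$time == k)
}

# Build y ~ x + time1 + time2 + time3 + time5 + factor(id) + factor(time)
rhs <- c("x", paste0("time", periods), "factor(id)", "factor(time)")
fml <- reformulate(rhs, response = "y")
fit <- lm(fml, data = df)
coef(fit)[paste0("time", periods)]
```

Changing ref is then enough to move the base period. The dummies and the formula follow automatically.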

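With the interaction approach (treat:timefac), R does not use period 4 as the reference period, and it sometimes keeps the interaction with the untreated group rather than the treated one: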

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
x                    0.97198    0.05113  19.009  < 2e-16 ***
treatFALSE:timefac4       NA         NA      NA       NA    
treatTRUE:timefac4  -0.19607    0.28410  -0.690  0.49051    
treatFALSE:timefac1       NA         NA      NA       NA    
treatTRUE:timefac1  -0.07690    0.28572  -0.269  0.78796    
treatFALSE:timefac2       NA         NA      NA       NA    
treatTRUE:timefac2        NA         NA      NA       NA    
treatFALSE:timefac3  0.15525    0.28482   0.545  0.58601    
treatTRUE:timefac3        NA         NA      NA       NA    
treatFALSE:timefac5  0.97340    0.28420   3.425  0.00068 ***
treatTRUE:timefac5        NA         NA      NA       NA
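The NAs come from a rank problem, not a bug. Without main effects, R codes treat:timefac with a full set of indicators, one per treat x period cell, for both TRUE and FALSE. The unit and time fixed effects already span part of that space. The treated-group indicator is a sum of unit dummies, and treatFALSE:timefac_k + treatTRUE:timefac_k equals the period-k dummy. The small base R sketch below uses lm() with explicit factor(id) + factor(time) as a stand-in for felm's absorbed effects; the panel size and seed are arbitrary. It counts how many of the 10 interaction columns survive:

```r
set.seed(42)  # arbitrary; y is pure noise, only the design matters here
df <- expand.grid(id = 1:10, time = 1:5)
df$treat <- df$id > 5
df$timefac <- factor(df$time, levels = c(4, 1, 2, 3, 5))
df$y <- rnorm(nrow(df))

fit <- lm(y ~ factor(id) + factor(time) + treat:timefac, data = df)
b <- coef(fit)
inter <- b[grepl("^treat", names(b))]
length(inter)      # 10 interaction columns (2 groups x 5 periods)
sum(is.na(inter))  # 6 are aliased with the fixed effects, 4 remain
```

The 10 group-by-period cells add only 10 - (2 + 5 - 1) = 4 new directions beyond the additive fixed effects. That matches the 4 estimated and 6 NA interaction rows in the output above.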

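Is there a way to specify this model without having to manually generate the dummies and interactions for the treated units in each time period?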

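If you know Stata, I am basically looking for something as simple as:

areg y x i.treat#ib4.time, absorb(id)

(Note how easy it is to tell Stata to treat a variable as categorical (prefix i.) without creating dummies for time, and to indicate that period 4 should be the base period (prefix b4).)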
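For comparison, a rough base R counterpart of Stata's b4 base-level prefix is relevel(), which makes period 4 the first level of the factor. The sketch below uses toy vectors made up for illustration. It builds the period indicators, keeps them only for treated observations, and drops the period-4 column. The result is the set of treated-by-period dummies the question asks for, with period 4 omitted.

```r
# Toy data: one control and one treated unit observed in periods 1-5
time  <- rep(1:5, times = 2)
treat <- rep(c(FALSE, TRUE), each = 5)

# Like Stata's ib4.: make period 4 the reference (first) level
time_f <- relevel(factor(time), ref = "4")
levels(time_f)  # "4" "1" "2" "3" "5"

# One indicator per period, zeroed out for untreated rows
# (the logical vector is recycled down each column)
D <- model.matrix(~ 0 + time_f) * treat

# Drop the reference period
D <- D[, colnames(D) != "time_f4", drop = FALSE]
colnames(D)  # "time_f1" "time_f2" "time_f3" "time_f5"
```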

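You can redefine the time factor so that untreated observations are coded as the omitted time category (period 4). Store it in a new variable so that time itself stays intact for the time fixed effects:

df <- df %>% 
  mutate(time_fac = ifelse(treat == 0, 4, time),
         time_fac = factor(time_fac, levels = c(4, 1, 2, 3, 5)))

You can then use time_fac without any interaction and get a regression table with no NAs:

summary(felm(
  y ~ x + time_fac | id + time, data = df
))

Coefficients:
          Estimate Std. Error t value Pr(>|t|)    
x          0.98548    0.05028  19.599  < 2e-16 ***
time_fac1 -0.01335    0.27553  -0.048    0.961    
time_fac2 -0.10332    0.27661  -0.374    0.709    
time_fac3  0.24169    0.27575   0.876    0.381    
time_fac5 -1.16305    0.27557  -4.221 3.03e-05 ***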
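This works because the recoding puts every untreated observation, and every treated observation in period 4, into the reference level. The dummy for level k of time_fac therefore equals treat x 1[time = k], which is exactly the hand-made dummy from the question. The base R check below uses a toy panel (N = 10 is arbitrary) and base R in place of dplyr. It uses ifelse(treat, time, 4), which is equivalent to the answer's ifelse(treat == 0, 4, time).

```r
N <- 10
df <- data.frame(id = rep(1:N, 5), time = rep(1:5, each = N))
df$treat <- df$id >= ceiling(N / 2)

# Answer's recoding: untreated rows go to the reference level 4
df$time_fac <- factor(ifelse(df$treat, df$time, 4), levels = c(4, 1, 2, 3, 5))

# Dummies R builds from time_fac (treatment contrasts, level 4 dropped)
X_fac <- model.matrix(~ time_fac, data = df)[, -1]

# Hand-made treated-by-period dummies for periods 1, 2, 3, 5
X_man <- sapply(c(1, 2, 3, 5), function(k) as.numeric(df$treat & df$time == k))

all(X_fac == X_man)  # TRUE
```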

This idea comes from:

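The package fixest performs fixed-effects estimation (like lfe) and includes utilities to handle interactions. The function i() (or interact()) is what you are looking for.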

Here is an example in which treatment is interacted with the period, with period 5 dropped as the reference:

library(fixest)
data(base_did)
est_did = feols(y ~ x1 + i(treat, period, 5) | id + period, base_did)
est_did
#> OLS estimation, Dep. Var.: y
#> Observations: 1,080 
#> Fixed-effects: id: 108,  period: 10
#> Standard-errors: Clustered (id) 
#>                   Estimate Std. Error   t value  Pr(>|t|)    
#> x1                0.973490   0.045678 21.312000 < 2.2e-16 ***
#> treat:period::1  -1.403000   1.110300 -1.263700  0.206646    
#> treat:period::2  -1.247500   1.093100 -1.141200  0.254068    
#> treat:period::3  -0.273206   1.106900 -0.246813  0.805106    
#> treat:period::4  -1.795700   1.088000 -1.650500  0.099166 .  
#> treat:period::6   0.784452   1.028400  0.762798  0.445773    
#> treat:period::7   3.598900   1.101600  3.267100  0.001125 ** 
#> treat:period::8   3.811800   1.247500  3.055500  0.002309 ** 
#> treat:period::9   4.731400   1.097100  4.312600   1.8e-05 ***
#> treat:period::10  6.606200   1.120500  5.895800  5.17e-09 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: -2,984.58   Adj. R2: 0.48783
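One caveat about versions, stated here as an assumption: the call above uses the argument order of older fixest releases. As I understand it, from fixest 0.9.0 onward i() takes the factor first and the variable to interact second. In those versions, iplot() draws the event-study plot of the interaction coefficients. Here is a sketch of the same estimation under that newer syntax:

```r
library(fixest)
data(base_did)

# Newer argument order (assumed fixest >= 0.9.0): i(factor_var, var, ref)
# period is the factor, treat the variable, period 5 the reference
est_did <- feols(y ~ x1 + i(period, treat, ref = 5) | id + period, data = base_did)
summary(est_did)

# Event-study plot of the period x treat coefficients
iplot(est_did)
```

If this errors, or the reference period is not dropped, check the installed version with packageVersion("fixest") and fall back to the argument order shown above.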

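If you don't want to use fixest for the estimation, you can still use the function i() to create the interactions. Its syntax is i(var, f, ref, drop, keep): it interacts the variable var with a dummy for each value of f. You choose which values of f to interact with the arguments ref, drop and keep. keep retains values of f and drop removes them. ref works like drop, except that the reference is shown in coefplot, whereas the values in drop do not appear in the graph.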
Here is an example of i():
head(with(base_did, i(treat, period, keep = 3:7)))
#>   treat:period::3 treat:period::4 treat:period::5 treat:period::6 treat:period::7
#> 1               0               0               0               0               0
#> 2               0               0               0               0               0
#> 3               1               0               0               0               0
#> 4               0               1               0               0               0
#> 5               0               0               1               0               0
#> 6               0               0               0               1               0
head(with(base_did, i(treat, period, drop = 3:7)))
#>   treat:period::1 treat:period::2 treat:period::8 treat:period::9 treat:period::10
#> 1               1               0               0               0                0
#> 2               0               1               0               0                0
#> 3               0               0               0               0                0
#> 4               0               0               0               0                0
#> 5               0               0               0               0                0
#> 6               0               0               0               0                0
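The base R helper below makes the keep, drop and ref semantics concrete without depending on fixest. interact_dummies() is hypothetical and written only for this illustration. It is not part of fixest and does not reproduce its full behaviour; for example, there is no plotting, so ref and drop act the same. The column names imitate the var:f::value pattern in the output above.

```r
# Hypothetical helper, NOT part of fixest: interacts `var` with one dummy
# per value of `f`. Here `ref` and `drop` behave identically (no plotting).
interact_dummies <- function(var, f, ref = NULL, drop = NULL, keep = NULL,
                             prefix = "var:f") {
  vals <- sort(unique(f))
  if (!is.null(keep)) vals <- vals[vals %in% keep]
  vals <- vals[!vals %in% c(ref, drop)]
  out <- vapply(vals, function(v) as.numeric(var) * (f == v),
                numeric(length(f)))
  out <- matrix(out, nrow = length(f))
  colnames(out) <- paste0(prefix, "::", vals)
  out
}

# Toy panel: a control unit (treat = 0) and a treated unit over periods 1-10
period <- rep(1:10, times = 2)
treat  <- rep(c(0, 1), each = 10)

head(interact_dummies(treat, period, keep = 3:7, prefix = "treat:period"))
head(interact_dummies(treat, period, drop = 3:7, prefix = "treat:period"))
```

On this toy panel, keep = 3:7 and drop = 3:7 give the same column sets as the two i() calls above.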

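You can find more information in the fixest documentation.

This looks great! I like how explicit the base period and the interactions are, both in the fixed effects and in the coefficients of interest.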
The coefficients from the fixest example can be plotted with:

coefplot(est_did)
