四元变量R上的匹配

四元变量R上的匹配,r,for-loop,dplyr,matching,R,For Loop,Dplyr,Matching,我正在使用您可以使用以下代码生成的数据集: set.seed(922) dat<-data.frame("y" = c(rnorm(500,20,2),rnorm(500, 40,2),rnorm(500,60,2),rnorm(500,80,2)), "quaternary" = sample(LETTERS[1:4],2000,replace = T), "binary" = sample(c("0","1"),2000,

我正在使用您可以使用以下代码生成的数据集:

set.seed(922)
dat<-data.frame("y" = c(rnorm(500,20,2),rnorm(500, 40,2),rnorm(500,60,2),rnorm(500,80,2)),
                "quaternary" = sample(LETTERS[1:4],2000,replace = T),
                "binary" = sample(c("0","1"),2000,replace = T))
但我不确定如何在“y”的值上建立4级因子“第四纪”的匹配

我不确定Matchit软件包中是否有一种优雅的方法可以做到这一点,但我愿意接受任何关于如何堆叠方法以获得良好平衡样本的建议。任何帮助都会很棒

编辑:

好吧,我想我很接近了。您可以在for循环中利用dplyr。它有点低效,我仍然需要考虑使用它来创建一个平衡样本的含义,但是它越来越近了

首先,在dat框架中,创建四个新变量,并填充NAs:

dat$A_match<-NA
dat$B_match<-NA
dat$C_match<-NA
dat$D_match<-NA

dat$A_match请记住,MatchIt只生成适合估计ATT的匹配样本(对治疗对象的平均治疗效果)。通常,MatchIt会选择一个它认为“已治疗”的组,该组通常是标记为“1”的治疗级别。然后将一个或多个控制单元与每个处理单元匹配

对于多项式处理,您还需要确定您感兴趣的估计。同样,如果您对ATT感兴趣,您必须选择一组被视为“治疗组”,其他组被视为“控制组”(我更愿意将其称为“焦点”和“非焦点”)。重要的是,您的治疗效果评估仅适用于与焦点群体组成相似的人群

如果这是您想要的,您需要选择一个组作为焦点,然后执行三个单独的
matchit
调用,其中每个调用将非焦点组中的一个单元与焦点组中的单元进行匹配。协调小组保持不变。下面是一些我可能会使用的代码:

set.seed(922)
library(MatchIt)
dat<-data.frame("y" = c(rnorm(500,20,2),rnorm(500, 40,2),rnorm(500,60,2),rnorm(500,80,2)),
                "quaternary" = sample(LETTERS[1:4],2000,replace = T, prob = c(.1, .3, .3, .3)),
                "binary" = sample(c("0","1"),2000,replace = T))
focal <- "A"

dat$match.weights <- 1
for (lev in levels(dat$quaternary)) {
    if (lev != focal) {
        dat0 <- dat[dat$quaternary %in% c(focal, lev),]
        dat0$treat <- as.numeric(dat0$quaternary == focal)

        m.out <- matchit(treat ~ y, dat = dat0, replace = FALSE)
        dat$match.weights[dat$quaternary == lev] <- m.out$weights[dat0$treat == 0]
    }
}

library(cobalt)
bal.tab(quaternary ~ y, data = dat, weights = dat$match.weights, 
        method = "matching", focal = focal, un = TRUE)
#> Note: estimand and s.d.denom not specified; assuming ATT and treated.
#> Balance summary across all treatment pairs
#>      Type Max.Diff.Un Max.Diff.Adj
#> y Contin.      0.1134       0.0009
#> 
#> Sample sizes
#>             B   C   D   A
#> All       593 597 612 198
#> Matched   198 198 198 198
#> Unmatched 395 399 414   0
由reprex软件包(v0.2.1)于2018年10月13日创建

无论您使用何种策略,您都可以使用估计的匹配权重或ATT或ATE权重对治疗结果进行加权回归


[披露:我是cobalt和WeightIt软件包的作者。]

Wow。感谢您的全面解释和您共享的代码。WeighIt软件包看起来很有希望。
require(dplyr) #haha. Hey that rhymes

for(i in 1:dim(dat)[1]){
  dat_A_index<-dat%>%
    mutate(y = ifelse(quaternary=="A",y,0),
           abs = abs(dat[i,1]-y))%>%
    summarise(A_index = which.min(abs))
  dat$A_match[i]<-dat[dat_A_index$A_index,1]
  rm(dat_A_index)
  dat_B_index<-dat%>%
    mutate(y = ifelse(quaternary=="B",y,0),
           abs = abs(dat[i,1]-y))%>%
    summarise(B_index = which.min(abs))
  dat$B_match[i]<-dat[dat_B_index$B_index,1]
  rm(dat_B_index)
  dat_C_index<-dat%>%
    mutate(y = ifelse(quaternary=="C",y,0),
           abs = abs(dat[i,1]-y))%>%
    summarise(C_index = which.min(abs))
  dat$C_match[i]<-dat[dat_C_index$C_index,1]
  rm(dat_C_index)
  dat_D_index<-dat%>%
    mutate(y = ifelse(quaternary=="D",y,0),
           abs = abs(dat[i,1]-y))%>%
    summarise(D_index = which.min(abs))
  dat$D_match[i]<-dat[dat_D_index$D_index,1]
  rm(dat_D_index)
}
set.seed(922)
library(MatchIt)
dat<-data.frame("y" = c(rnorm(500,20,2),rnorm(500, 40,2),rnorm(500,60,2),rnorm(500,80,2)),
                "quaternary" = sample(LETTERS[1:4],2000,replace = T, prob = c(.1, .3, .3, .3)),
                "binary" = sample(c("0","1"),2000,replace = T))
focal <- "A"

dat$match.weights <- 1
for (lev in levels(dat$quaternary)) {
    if (lev != focal) {
        dat0 <- dat[dat$quaternary %in% c(focal, lev),]
        dat0$treat <- as.numeric(dat0$quaternary == focal)

        m.out <- matchit(treat ~ y, dat = dat0, replace = FALSE)
        dat$match.weights[dat$quaternary == lev] <- m.out$weights[dat0$treat == 0]
    }
}

library(cobalt)
bal.tab(quaternary ~ y, data = dat, weights = dat$match.weights, 
        method = "matching", focal = focal, un = TRUE)
#> Note: estimand and s.d.denom not specified; assuming ATT and treated.
#> Balance summary across all treatment pairs
#>      Type Max.Diff.Un Max.Diff.Adj
#> y Contin.      0.1134       0.0009
#> 
#> Sample sizes
#>             B   C   D   A
#> All       593 597 612 198
#> Matched   198 198 198 198
#> Unmatched 395 399 414   0
library(WeightIt)
#Weighting for the ATT with A as focal:
w.out.att <- weightit(quaternary ~ y, data = dat, estimand = "ATT", focal = "A")
#> Using multinomial logit regression.
dat$w.att <- w.out.att$weights

#Weighting for the ATE:
w.out.ate <- weightit(quaternary ~ y, data = dat, estimand = "ATE")
#> Using multinomial logit regression.
dat$w.ate <- w.out.ate$weights

bal.tab(quaternary ~ y, data = dat, weights = c("w.att", "w.ate"), 
        method = "weighting", estimand = c("ATT", "ATE"), un = TRUE)
#> Balance summary across all treatment pairs
#>      Type Max.Diff.Un Max.Diff.w.att Max.Diff.w.ate
#> y Contin.      0.1092         0.0055         0.0024
#> 
#> Effective sample sizes
#>             A       B       C       D
#> All   198.000 593.000 597.000 612.000
#> w.att 198.000 591.139 593.474 604.162
#> w.ate 196.947 592.822 596.993 611.107