R: Running multiple tests on a data frame using a loop

I would like a general function that runs multiple t-tests on the data in a data frame, using the following sample data:

dat <- data.frame(ID = 1:100,
                  DRUG = rep(c("D1","D2","D2","D3","D3","D3","D5","D1","D4","D2"), 10),
                  ADR = rep(c("A1","A2","A3","A6","A7","A8","A4","A2","A1","A2"), 10),
                  # Sampled without replacement; call set.seed() first for reproducible values.
                  X = sample(1:250, 100, replace = FALSE))

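As with every programming problem, the solution consists of two steps:

  • Abstract out your logic so that it is general.
  • Wrap the abstract solution in a reusable function.

You can then proceed to:

  • Call that function repeatedly on all your data.

First, however: t-tests sometimes fail because of insufficient data, so let's replace the t.test call with a wrapper that returns NA on error: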

    t_test = function (x, y, ...) {
        tryCatch(t.test(x, y, ...)$p.value, error = function (err) NA)
    }
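To see why the wrapper matters, here is a minimal sketch with made-up vectors (they are not taken from the question's data). A plain t.test call stops with an error when a group has fewer than two observations, whereas the wrapper turns that failure into NA:

```r
# Same wrapper as above: return the p-value, or NA if t.test() fails.
t_test = function (x, y, ...) {
    tryCatch(t.test(x, y, ...)$p.value, error = function (err) NA)
}

# Hypothetical toy vectors, for illustration only.
enough = c(5.1, 4.8, 6.2, 5.5)
other = c(6.9, 7.4, 6.1, 7.0)
too_few = numeric(0)  # an empty group, as can happen for rare DRUG/ADR pairs

p_ok = t_test(enough, other)      # a p-value between 0 and 1
p_fail = t_test(enough, too_few)  # NA instead of an error
```

Returning NA keeps the result the same length as the input, so failed tests remain visible in the final table instead of aborting the whole run.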
    
Then, putting it all together, this gives us:

    library(dplyr) # Makes data manipulation easier.
    
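    # Compare the X values of one row's DRUG/ADR combination against
    # (1) other drugs with the same ADR and (2) the same drug with other ADRs.
    # Note: `id` is used as a row index, which works here because ID equals the row number.
    test_combination = function (data, id) {
        drug = data[id, ]$DRUG
        adr = data[id, ]$ADR

        match = filter(data, DRUG == drug, ADR == adr)$X
        mismatch1 = filter(data, DRUG != drug, ADR == adr)$X
        mismatch2 = filter(data, DRUG == drug, ADR != adr)$X

        list(pval1 = t_test(match, mismatch1), pval2 = t_test(match, mismatch2))
    }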
    
This tests a single combination. Now we test all of them:

    result = lapply(dat$ID, test_combination, data = dat) %>%
        bind_rows() %>%
        bind_cols(dat, .) %>%
        select(-X)
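For readers without dplyr installed, the same idea can be sketched in base R. This is only an illustrative variant, not the answer's own code: the function name test_combination_base, the variable result_base, and the fixed seed are assumptions made for this example, and it looks rows up by their ID value rather than by row position.

```r
set.seed(42)  # assumption: fixed seed so the sketch is reproducible

dat = data.frame(ID = 1:100,
                 DRUG = rep(c("D1","D2","D2","D3","D3","D3","D5","D1","D4","D2"), 10),
                 ADR = rep(c("A1","A2","A3","A6","A7","A8","A4","A2","A1","A2"), 10),
                 X = sample(1:250, 100, replace = FALSE))

# Same wrapper as in the answer: p-value, or NA if t.test() fails.
t_test = function (x, y, ...) {
    tryCatch(t.test(x, y, ...)$p.value, error = function (err) NA)
}

# Hypothetical base-R counterpart of test_combination().
test_combination_base = function (data, id) {
    row = data[data$ID == id, ]
    same_drug = data$DRUG == row$DRUG
    same_adr = data$ADR == row$ADR

    matched = data$X[same_drug & same_adr]
    data.frame(pval1 = t_test(matched, data$X[!same_drug & same_adr]),
               pval2 = t_test(matched, data$X[same_drug & !same_adr]))
}

# One row of p-values per ID, stacked and attached to the identifying columns.
pvals = do.call(rbind, lapply(dat$ID, test_combination_base, data = dat))
result_base = cbind(dat[, c("ID", "DRUG", "ADR")], pvals)
head(result_base)
```

In this sample data some combinations have no comparison group at all (for example, ADR "A6" only ever occurs with drug "D3"), so the corresponding p-value comes back as NA.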
    
Alternatively, using a more dplyr-like approach (though somewhat obscure, in my opinion):

    result = dat %>%
        rowwise() %>%
        do(bind_rows(test_combination(dat, .$ID))) %>%
        bind_cols(dat, .) %>%
        select(-X)
    
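Note how this code uses no explicit for loops. This is how you work with data in R: apply functions to the items of a table or list rather than iterating manually.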
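As a small illustration of that point, the sketch below computes the same per-group means twice, once with an explicit for loop and once by applying a function over a list. The list `groups` and its values are made up for this example.

```r
# Hypothetical toy data: three named groups of measurements.
groups = list(a = c(1, 2, 3), b = c(10, 20), c = c(5, 5, 5, 5))

# Explicit loop: pre-allocate the result, then fill it element by element.
means_loop = numeric(length(groups))
names(means_loop) = names(groups)
for (name in names(groups)) {
    means_loop[name] = mean(groups[[name]])
}

# Functional style: apply mean() to every list element in one call.
means_apply = vapply(groups, mean, numeric(1))
```

Both versions give the same named vector; the functional one has no bookkeeping for allocation or indexing, which is why it is usually preferred in R.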

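Note that the above is highly dubious from a statistical point of view. At the very least you need to rigorously perform a correction for multiple testing.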
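One common safeguard when many tests are run at once is to adjust the p-values for multiple comparisons. The sketch below uses base R's p.adjust with the Benjamini-Hochberg method. The p-values are invented for illustration, and the choice of method is an assumption, not something the answer prescribes.

```r
# Hypothetical raw p-values; NA marks a test that could not be run.
pvals = c(0.001, 0.02, 0.04, 0.30, NA)

# Benjamini-Hochberg adjustment controls the false discovery rate.
# NA entries are left out of the correction and stay NA in the result.
adjusted = p.adjust(pvals, method = "BH")

data.frame(raw = pvals, adjusted = adjusted)
```

Applied to the results above, this would mean adjusting the pval1 and pval2 columns before interpreting any single value as significant.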
