Converting a summary function to non-standard evaluation in dplyr


Consider the following interactive example, which produces a summary table:

library(dplyr)

tg <- ToothGrowth
ci_int <- 0.95

tg %>%
  group_by(supp, dose) %>%
  summarise(N = n(),
            mean = mean(len, na.rm = T),
            sd = sd(len, na.rm = T),
            se = sd / sqrt(N),
            ci = se * qt(ci_int / 2 + 0.50, N - 1))

#     supp  dose     N  mean       sd        se       ci
#   (fctr) (dbl) (int) (dbl)    (dbl)     (dbl)    (dbl)
# 1     OJ   0.5    10 13.23 4.459709 1.4102837 3.190283
# 2     OJ   1.0    10 22.70 3.910953 1.2367520 2.797727
# 3     OJ   2.0    10 26.06 2.655058 0.8396031 1.899314
# 4     VC   0.5    10  7.98 2.746634 0.8685620 1.964824
# 5     VC   1.0    10 16.77 2.515309 0.7954104 1.799343
# 6     VC   2.0    10 26.14 4.797731 1.5171757 3.432090
This produces:

#     supp  dose     N  mean       sd
#   (fctr) (dbl) (int) (dbl)    (dbl)
# 1     OJ   0.5    10 13.23 4.459709
# 2     OJ   1.0    10 22.70 3.910953
# 3     OJ   2.0    10 26.06 2.655058
# 4     VC   0.5    10  7.98 2.746634
# 5     VC   1.0    10 16.77 2.515309
# 6     VC   2.0    10 26.14 4.797731
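But this doesn't feel very DRY. I also can't see how to implement se and ci without it becoming overly complex or verbose. Perhaps there is a better way, or it should be split into several functions.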


How can I convert the summary table above into a function, so that I can pass it a data.frame with different measures and any combination of groupvars, in the "spirit" of dplyr?

I'm not sure whether this is more in the "spirit", but you could also try using strings to compute mean, sd, and so on:

summarySE <- function(df, measure, groupvars, conf.int = 0.95) {
  df %>% group_by_(.dots = groupvars)%>%
    summarise_(N="n()",
               mean = paste0("mean(",measure,", na.rm = T)"),
               sd = paste0("sd(",measure,", na.rm = T)"),
               se = "sd/sqrt(N)",
               ci = paste0("se * stats::qt(",conf.int," / 2 + 0.50, N - 1)"))
}

summarySE(tg, "len", c("supp", "dose"))

#    supp  dose     N  mean       sd        se       ci
#  (fctr) (dbl) (int) (dbl)    (dbl)     (dbl)    (dbl)
#1     OJ   0.5    10 13.23 4.459709 1.4102837 3.190283
#2     OJ   1.0    10 22.70 3.910953 1.2367520 2.797727
#3     OJ   2.0    10 26.06 2.655058 0.8396031 1.899314
#4     VC   0.5    10  7.98 2.746634 0.8685620 1.964824
#5     VC   1.0    10 16.77 2.515309 0.7954104 1.799343
#6     VC   2.0    10 26.14 4.797731 1.5171757 3.432090
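
To see what summarise_ is given in the function above, the strings can be built on their own, outside the function. This is only a small base R sketch; the names mean_expr and ci_expr are made up for illustration and are not part of the answer.

```r
# Inputs as they would arrive inside summarySE(tg, "len", c("supp", "dose"))
measure  <- "len"
conf.int <- 0.95

# paste0() splices the column name and the confidence level into R code held as text
mean_expr <- paste0("mean(", measure, ", na.rm = T)")
ci_expr   <- paste0("se * stats::qt(", conf.int, " / 2 + 0.50, N - 1)")

print(mean_expr)  # "mean(len, na.rm = T)"
print(ci_expr)    # "se * stats::qt(0.95 / 2 + 0.50, N - 1)"
```

Each string is ordinary R code as text, so se and ci can refer to the sd, N and se columns created earlier in the same summarise_ call.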

I don't really see why the SE and CI calculations need to be any more complicated than what you are already doing.

I used the ... argument to capture the grouping variables, because that seems a little easier to use.

Overall, I ended up with the following function:

summarySE <- function(.data, measure, ..., conf.int = 0.95) {
  dots <- lazyeval::lazy_dots(...)
  measure <- lazyeval::lazy(measure)

  summary_dots <- list(
    N = ~ n(),
    mean = lazyeval::interp(~ mean(var, na.rm = T), var = measure),
    sd = lazyeval::interp(~ sd(var, na.rm = T), var = measure),
    se = ~ sd / sqrt(N),
    ci = ~ se * qt(conf.int / 2 + 0.50, N - 1))

  .data <- dplyr::group_by_(.data, .dots = dots)
  dplyr::summarise_(.data, .dots = summary_dots)
}

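Yes, I was passing supp and dose as a vector, c(supp, dose). I'm always wary of using ... ; is there a quick alternative that passes them as groups = c(...)?

You'd get rid of lazy_dots, then take a list of grouping variables and feed it directly into group_by_. They would have to be quoted, though, for example summarySE(tg, len, list(~supp, ~dose)). I couldn't work out how to lazily capture a list of bare names into a list of lazy objects.
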
summarySE(tg, len, supp, dose)

Source: local data frame [6 x 7]
Groups: supp [?]

    supp  dose     N  mean       sd        se       ci
  (fctr) (dbl) (int) (dbl)    (dbl)     (dbl)    (dbl)
1     OJ   0.5    10 13.23 4.459709 1.4102837 3.190283
2     OJ   1.0    10 22.70 3.910953 1.2367520 2.797727
3     OJ   2.0    10 26.06 2.655058 0.8396031 1.899314
4     VC   0.5    10  7.98 2.746634 0.8685620 1.964824
5     VC   1.0    10 16.77 2.515309 0.7954104 1.799343
6     VC   2.0    10 26.14 4.797731 1.5171757 3.432090
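
The underscore verbs (group_by_, summarise_) and lazyeval were later deprecated in dplyr in favour of tidy evaluation. The following is a rough sketch of how the same function might look there, assuming dplyr 1.0.0 or later. The names summarySE_tidy and summarySE_by are hypothetical, chosen so they do not clash with the answer's summarySE. The second variant is one possible way to accept the grouping columns as a single c(supp, dose) selection, as asked in the comments.

```r
library(dplyr)

# Hypothetical tidy-eval rewrite: grouping columns are passed through `...`,
# and the measure column is embraced with {{ }} so it can be given unquoted.
summarySE_tidy <- function(.data, measure, ..., conf.int = 0.95) {
  .data %>%
    group_by(...) %>%
    summarise(
      N    = n(),
      mean = mean({{ measure }}, na.rm = TRUE),
      sd   = sd({{ measure }}, na.rm = TRUE),
      se   = sd / sqrt(N),
      ci   = se * qt(conf.int / 2 + 0.50, N - 1),
      .groups = "drop"  # return an ungrouped table
    )
}

# Hypothetical variant: grouping columns given as one selection, e.g. c(supp, dose).
# across() turns the selection into grouping columns inside group_by().
summarySE_by <- function(.data, measure, groupvars, conf.int = 0.95) {
  .data %>%
    group_by(across({{ groupvars }})) %>%
    summarise(
      N    = n(),
      mean = mean({{ measure }}, na.rm = TRUE),
      sd   = sd({{ measure }}, na.rm = TRUE),
      se   = sd / sqrt(N),
      ci   = se * qt(conf.int / 2 + 0.50, N - 1),
      .groups = "drop"
    )
}

res_dots <- summarySE_tidy(ToothGrowth, len, supp, dose)
res_vec  <- summarySE_by(ToothGrowth, len, c(supp, dose))
```

Both calls should give the same six-row table as above. Unlike the lazyeval version, the result here is explicitly ungrouped because of .groups = "drop".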