R 自定义函数：允许未知数量的组进行操作在自定义函数中，如何避免在允许未知组数的情况下为每个组重复相同的代码？_R_Function_Loops_Ggplot2_Dry

R 自定义函数：允许未知数量的组进行操作在自定义函数中，如何避免在允许未知组数的情况下为每个组重复相同的代码？

r function loops

R 自定义函数：允许未知数量的组进行操作在自定义函数中，如何避免在允许未知组数的情况下为每个组重复相同的代码？,r,function,loops,ggplot2,dry,R,Function,Loops,Ggplot2,Dry,下面是一个简单的示例，但假设函数有大量的操作，比如为每个组计算不同的统计数据，并将它们粘贴到每个方面。抱歉，我发现很难制作一个更简单的函数来演示这个特定的挑战 test.function <- function(variable, group, data) { if(!require(dplyr)){install.packages("dplyr")} if(!require(ggplot2)){install.packages("ggplot2&qu

下面是一个简单的示例，但假设函数有大量的操作，比如为每个组计算不同的统计数据，并将它们粘贴到每个方面。抱歉，我发现很难制作一个更简单的函数来演示这个特定的挑战

test.function <- function(variable, group, data) {
  if(!require(dplyr)){install.packages("dplyr")}
  if(!require(ggplot2)){install.packages("ggplot2")}
  if(!require(ggrepel)){install.packages("ggrepel")}
  library(dplyr)
  library(ggplot2)
  require(ggrepel)
  data$variable <- data[,variable]
  data$group <- factor(data[,group])

  # Compute individual group stats
  data %>%
    filter(data$group==levels(data$group)[1]) %>%
    select(variable) %>%
    unlist %>%
    shapiro.test() -> shap
  shapiro.1 <- round(shap$p.value,3)
  data %>%
    filter(data$group==levels(data$group)[2]) %>%
    select(variable) %>%
    unlist %>%
    shapiro.test() -> shap
  shapiro.2 <- round(shap$p.value,3)
  data %>%
    filter(data$group==levels(data$group)[3]) %>%
    select(variable) %>%
    unlist %>%
    shapiro.test() -> shap
  shapiro.3 <- round(shap$p.value,3)

  # Make the stats dataframe for ggplot
  dat_text <- data.frame(
    group = levels(data$group),
    text = c(shapiro.1, shapiro.2, shapiro.3))

  # Make the plot
  ggplot(data, aes(x=variable, fill=group)) +
    geom_density() +
    facet_grid(group ~ .) +
    geom_text_repel(data = dat_text,
                    mapping = aes(x = Inf, 
                                  y = Inf, 
                                  label = text))
}

如果有两组，则不起作用

test.function("mpg", "vs", mtcars)

 Error in shapiro.test(.) : sample size must be between 3 and 5000

如果有三个以上的组，则不起作用

test.function("mpg", "cyl", mtcars)

test <- mtcars %>% mutate(new = rep(1:4, 8))
test.function("mpg", "new", test)

 Error in data.frame(group = levels(data$group), text = c(shapiro.1, shapiro.2,  : 
  arguments imply differing number of rows: 4, 3

test%变异（new=rep（1:4,8））
测试功能（“mpg”，“新”，测试）
data.frame（group=levels（data$group）中出错，text=c（shapiro.1，shapiro.2，：
参数表示不同的行数：4，3

程序员通常使用什么技巧来容纳这些函数中的任意数量的组？

我在评论中被要求解释这里的想法，因此我想我将扩展原始答案，它显示在下面的水平规则下面

主要的问题是如何对未知数量的组执行一些操作。有很多不同的方法可以做到这一点。在任何一种方法中，您都需要函数来识别组的数量并适应该数量。例如，您可以执行以下代码。在这里，我识别数据中的唯一组，我初始化所需结果，然后在所有组上循环。我没有使用此策略，因为for循环与

dplyr

代码相比感觉有点笨拙

un_group <- na.omit(unique(data[[group]]))
dat_text <- data.frame(group = un_group, 
                     text = NA)
for(i in 1:length(un_group)){
  tmp <- data[which(data[[group]] == ungroup[i]), ]
  dat_text$text[i] <- as.character(round(shaprio.test(tmp[[variable]])$p.value, 3))
}

在上面的函数中，我让函数返回一个包含多个不同变量的数据帧。对

summary

的单个调用将返回每个组变量的所有结果。这当然可以使用for循环或类似

sapply（）的东西

，但我喜欢

dplyr

代码的可读性更好一些。而且，根据您拥有的组数，

dplyr

代码的伸缩性比一些基本的R代码要好一些

我真的很喜欢在输出中反映输入（即输入变量名）——因此我想找到一种方法，在数据中创建名为

group

和

variable

的变量规范是一种方法，然后使用变量名构建公式是另一种方法。我最近刚刚遇到了

reformate（）

函数，它是一种比我以前使用的

paste（）

和

as.formula（）

组合更健壮的公式构建方法

这些就是我在回答问题时所想的事情

Brilliant…通过在

ggplot

调用中使用

aes_string（）

以及在其他地方使用双方括号，您摆脱了冗余的手动定义的

组

和

变量

call。最后，您使用

summary

将shapiro测试包括在

dplyr

通话中。很好！您认为在回答开始时提出这个问题时，您可以解释您的总体策略/心态吗？我认为从中学习可能会有所帮助！：）谢谢。我在上面的答案开头添加了一个解释。令人惊讶的解释，非常感谢。这是一个高质量的答案。我希望更多的人能看到它。编辑：不太多，但我在Twitter上分享了你的答案！@Rempyc感谢你的好话。很高兴这对我有帮助。

myfun <- function(x){
  s = shapiro.test(x)
  data.frame(p = s$p.value, stat=s$statistic, 
             mean = mean(x, na.rm=TRUE), 
             sd = sd(x, na.rm=TRUE), 
             skew = DescTools::Skew(x, na.rm=TRUE), 
             kurtosis = DescTools::Kurt(x, na.rm=TRUE))
  
}
mtcars %>% group_by(cyl) %>% summarise(myfun(mpg))
# # A tibble: 3 x 7
#     cyl     p  stat  mean    sd   skew kurtosis
# * <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>
# 1     4 0.261 0.912  26.7  4.51  0.259   -1.65 
# 2     6 0.325 0.899  19.7  1.45 -0.158   -1.91 
# 3     8 0.323 0.932  15.1  2.56 -0.363   -0.566

test.function <- function(variable, group, data) {
  if(!require(dplyr)){install.packages("dplyr")}
  if(!require(ggplot2)){install.packages("ggplot2")}
  if(!require(ggrepel)){install.packages("ggrepel")}
  library(dplyr)
  library(ggplot2)
  require(ggrepel)

  # Compute individual group stats
  
  data[[group]] <- as.factor(data[[group]])
  
  dat_text <- data %>% group_by(.data[[group]]) %>% 
    summarise(text=shapiro.test(.data[[variable]])$p.value) %>% 
    mutate(text=as.character(round(text, 3)))
  
  gform <- reformulate(".", response=group)
  # Make the plot
  ggplot(data, aes_string(x=variable, fill=group)) +
    geom_density() +
    facet_grid(gform) +
    geom_text_repel(data = dat_text,
                    mapping = aes(x = Inf, 
                                  y = Inf, 
                                  label = text))
}
test.function("mpg", "vs", mtcars)

test.function("mpg", "cyl", mtcars)