Dplyr function to compute the mean, n, sd and standard error

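I find myself writing this bit of code all the time to produce standard errors for group means (which I then use to plot confidence intervals).

It would be nice, though, to write my own function that does this in one line of code. I have read the dplyr vignette on non-standard evaluation. I sort of get it, but I can't work it out on my own. Can anyone help? Thanks.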

library(dplyr)

var1 <- sample(c('red', 'green'), size = 10, replace = TRUE)
var2 <- rnorm(10, mean = 5, sd = 1)
df <- data.frame(var1, var2)
df %>%
  group_by(var1) %>%
  summarize(avg = mean(var2), n = n(), sd = sd(var2), se = sd / sqrt(n))

You can use the function enquo to explicitly name the variables in the function call:

my_fun <- function(x, cat_var, num_var){
  cat_var <- enquo(cat_var)
  num_var <- enquo(num_var)

  x %>%
    group_by(!!cat_var) %>%
    summarize(avg = mean(!!num_var), n = n(), 
              sd = sd(!!num_var), se = sd/sqrt(n))
}
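
As a point of comparison, here is a minimal sketch of the same idea written with the `{{ }}` (embrace) operator, which quotes and unquotes an argument in one step instead of pairing `enquo` with `!!`. It assumes rlang 0.4.0 or later, which the answer above does not require. The name `my_fun_embrace` and the fixed example values are made up for illustration and are not from the original answer.

```r
library(dplyr)

# Deterministic example data in the same shape as the question's df
# (hypothetical fixed values instead of random draws, so results are reproducible)
df <- data.frame(
  var1 = rep(c("red", "green"), each = 5),
  var2 = c(4, 5, 6, 5, 5, 3, 7, 5, 4, 6)
)

# Hypothetical variant of my_fun: {{ }} replaces the enquo() + !! pair
my_fun_embrace <- function(x, cat_var, num_var) {
  x %>%
    group_by({{ cat_var }}) %>%
    summarize(avg = mean({{ num_var }}), n = n(),
              sd = sd({{ num_var }}), se = sd / sqrt(n))
}

# Column names are passed unquoted, exactly as with my_fun
res <- my_fun_embrace(df, var1, var2)
res
```

The call pattern is the same as for the enquo version: the data frame comes first, then the grouping column, then the numeric column.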
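Another option is to leave the grouping out of the function: call group_by beforehand and pass only the numeric variable. A further approach, shown last as my_fun2, uses quos so that the function can capture multiple arguments for the group_by statement. The code looks like this: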

my_fun <- function(x, num_var){
  num_var <- enquo(num_var)

  x %>%
    summarize(avg = mean(!!num_var), n = n(), 
              sd = sd(!!num_var), se = sd/sqrt(n))
}

df %>%
  group_by(var1) %>%
  my_fun(var2)
#first, build the new dataframe
var1 <- sample(c('red', 'green'), size = 10, replace = TRUE)
var2 <- rnorm(10, mean = 5, sd = 1)
var3 <- sample(c("A", "B"), size = 10, replace = TRUE)
df<-data.frame(var1, var2, var3)

# using the first version `my_fun`, it would look like this
df %>%
  group_by(var1, var3) %>%
  my_fun(var2)

# A tibble: 4 x 6
# Groups:   var1 [?]
    var1   var3      avg     n        sd        se
  <fctr> <fctr>    <dbl> <int>     <dbl>     <dbl>
1  green      A 5.248095     1       NaN       NaN
2  green      B 5.589881     2 0.7252621 0.5128378
3    red      A 5.364265     2 0.5748759 0.4064986
4    red      B 4.908226     5 1.1437186 0.5114865

# Now doing it with a new function `my_fun2`
my_fun2 <- function(x, num_var, ...){
  group_var <- quos(...)
  num_var <- enquo(num_var)

  x %>%
    group_by(!!!group_var) %>%
    summarize(avg = mean(!!num_var), n = n(), 
              sd = sd(!!num_var), se = sd/sqrt(n))
}

df %>%
  my_fun2(var2, var1, var3)

# A tibble: 4 x 6
# Groups:   var1 [?]
    var1   var3      avg     n        sd        se
  <fctr> <fctr>    <dbl> <int>     <dbl>     <dbl>
1  green      A 5.248095     1       NaN       NaN
2  green      B 5.589881     2 0.7252621 0.5128378
3    red      A 5.364265     2 0.5748759 0.4064986
4    red      B 4.908226     5 1.1437186 0.5114865
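
The question mentions using the standard error to plot confidence intervals. A rough sketch of that last step follows, assuming a two-sided t-based interval for each group mean; the question does not say how its intervals are built, so this choice is an assumption. The helper name `add_ci`, its `level` argument, and the example values are hypothetical.

```r
library(dplyr)

# Deterministic example data (hypothetical values, not from the question)
df <- data.frame(
  var1 = rep(c("red", "green"), each = 5),
  var2 = c(4, 5, 6, 5, 5, 3, 7, 5, 4, 6)
)

# Hypothetical helper: expects the columns avg, n and se produced above
# and adds t-based lower and upper bounds for each group mean
add_ci <- function(summary_df, level = 0.95) {
  summary_df %>%
    mutate(
      t_crit = qt(1 - (1 - level) / 2, df = n - 1),
      ci_low = avg - t_crit * se,
      ci_high = avg + t_crit * se
    )
}

ci <- df %>%
  group_by(var1) %>%
  summarize(avg = mean(var2), n = n(), sd = sd(var2), se = sd / sqrt(n)) %>%
  add_ci()
ci
```

Groups with a single observation have an undefined sd, as in the NaN row of the output above, so their interval bounds are undefined as well.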

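Can you show what you have tried? Where did you get stuck? Have a look at some of the questions under the [nse] tag.

Okay, well, I was playing around with this code from a blog post:
mean_mpg = function(data, ..., x) { data %>% group_by_(.dots = lazyeval::lazy_dots(...)) %>% summarize(mean_mpg = ~mean(x)) }
mtcars %>% mean_mpg(cyl, gear, mpg)
It returned an error about not being a vector.

You should probably note that this only works with the development version of dplyr, not the current CRAN version that the OP is most likely using.

I forgot that I had asked this question. But is it possible not to include the categorical grouping variable in the function? Sometimes I group by one variable, sometimes by two. I would like to keep that flexibility outside the custom function, but I don't know whether that is feasible.

I added an edit that lets you do it in two different ways.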