R: How to calculate the mean and sd by group, in data frame format?

My data is currently in the following format, df1:

outcome <- c("success", "failure", "success", "failure", "success", "failure")
basketball <- c(10, 7, 7, 8, 9, 10)
soccer <- c(8, 21, 30,  21, 6, 10)
football <- c(9,  2,  1, 3, 1, 5)

df1 <-  data.frame(outcome, basketball, soccer, football)

We can use pivot_longer to reshape the data to "long" format, then do a group-by operation:

library(dplyr)
library(tidyr)
df1 %>%
  pivot_longer(cols = basketball:football, names_to = 'symptom') %>% 
  group_by(outcome, symptom) %>%
  summarise(mean = mean(value), sd = sd(value), .groups = 'drop')
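
To make the reshaping step easier to follow, here is a minimal sketch of what the intermediate "long" data looks like before grouping. It rebuilds df1 from the question; the object name `long` is only an illustrative choice and is not part of the original answer.

```r
library(tidyr)

# Same data as df1 in the question
df1 <- data.frame(
  outcome    = c("success", "failure", "success", "failure", "success", "failure"),
  basketball = c(10, 7, 7, 8, 9, 10),
  soccer     = c(8, 21, 30, 21, 6, 10),
  football   = c(9, 2, 1, 3, 1, 5)
)

# One row per (original row, sport) pair: 6 rows x 3 sport columns = 18 rows.
# The old column names go into "symptom"; the numbers go into the default "value" column.
long <- pivot_longer(df1, cols = basketball:football, names_to = "symptom")

names(long)  # "outcome" "symptom" "value"
nrow(long)   # 18
```

Grouping by outcome and symptom on this long table then gives one mean and one sd per outcome/sport combination.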
If we also need to plot the result:

library(ggplot2)
df1 %>%
  pivot_longer(cols = basketball:football, names_to = 'symptom') %>% 
  group_by(outcome, symptom) %>% 
  summarise(mean = mean(value), sd = sd(value), .groups = 'drop') %>%
  ggplot(aes(x = outcome, y = mean, fill = symptom)) + 
    geom_bar(position = position_dodge(), stat = 'identity') + 
    geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
            width = .2, position = position_dodge(.9))

The values in "mean" and "sd" are computed from the input data.

Thanks, it runs. So in this scenario you no longer define the values column in pivot_longer. Is it created automatically? Also, what does `.groups = 'drop'` mean? Thanks in advance.

@TarJae What the OP expects is the summarised columns "mean" or "sd"; the name of the "value" column does not matter. So I did not specify values_to, which defaults to "value", while names_to is set to "symptom" (by default that column would be "name").

Thank you.
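
The two points raised in the comments can be checked with a short sketch. It assumes the same df1 as in the question; the column name "score" and the object names `explicit` and `still_grouped` are hypothetical, chosen only for illustration. The first pipeline names the values column explicitly instead of relying on the default "value". The second uses `.groups = "drop_last"`, which is what summarise() falls back to here when `.groups` is omitted, to show what `"drop"` changes.

```r
library(dplyr)
library(tidyr)

# Same data as df1 in the question
df1 <- data.frame(
  outcome    = c("success", "failure", "success", "failure", "success", "failure"),
  basketball = c(10, 7, 7, 8, 9, 10),
  soccer     = c(8, 21, 30, 21, 6, 10),
  football   = c(9, 2, 1, 3, 1, 5)
)

# Explicit values_to: the numbers land in "score" (hypothetical name)
# rather than the default "value"; .groups = "drop" returns an ungrouped tibble.
explicit <- df1 %>%
  pivot_longer(cols = basketball:football,
               names_to = "symptom", values_to = "score") %>%
  group_by(outcome, symptom) %>%
  summarise(mean = mean(score), sd = sd(score), .groups = "drop")

# "drop_last" removes only the last grouping variable (symptom),
# so the result is still grouped by outcome.
still_grouped <- df1 %>%
  pivot_longer(cols = basketball:football, names_to = "symptom") %>%
  group_by(outcome, symptom) %>%
  summarise(mean = mean(value), sd = sd(value), .groups = "drop_last")

group_vars(explicit)       # character(0)
group_vars(still_grouped)  # "outcome"
```

Dropping the groups matters mostly for later steps: a verb such as mutate() applied to `still_grouped` would operate within each outcome, while on `explicit` it operates on the whole table.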