How to count nested categories within different groups and compute their means in R


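How can we use dplyr or other useful libraries to compute the following from the example below:

  • total number of schools in each state
  • total number of students in each school
  • total number of students in each school by Gender
  • total number of students in each school by Gender and type
  • mean of item1 and item3 by Gender
  • mean of item1 and item3 by Gender for each state

  • Thank you very much.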

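    TL;DR: do some research before asking.

    You know, on Stack Overflow we try to help each other. Before asking for help, we usually do thorough research. In your case, you could read the dplyr or tidyverse documentation. I know it is sometimes boring, but it is better than getting answers from random users.

    Study group_by and summarise by pulling up the function help from the RStudio console (e.g., ?group_by).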

    # 1 total number of schools by each state
    
    schools_by_state <- df %>%
      group_by(states) %>%
      summarise(number = n())
    
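    In the example dataset every school name is unique. That is why the results may look confusing and meaningless: each school appears in exactly one row, so counting rows per state and counting schools per state give the same number.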
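
The difference only shows up once school names repeat. The following is a minimal sketch on a hypothetical data frame `toy` (not the data from the question), assuming dplyr is installed: `n()` counts rows, i.e. students, while `n_distinct(schools)` counts distinct schools.

```r
library(dplyr)

# Hypothetical toy data (not from the question): school names repeat,
# so counting rows and counting schools give different answers.
toy <- data.frame(
  states  = c("TS", "TS", "TS", "NE", "NE"),
  schools = c("A",  "A",  "B",  "C",  "C")
)

schools_per_state <- toy %>%
  group_by(states) %>%
  summarise(
    n_students = n(),                 # rows (students) in the state
    n_schools  = n_distinct(schools)  # distinct school names in the state
  )

schools_per_state
```

With real data where several students share a school, `n_distinct()` is probably the summary the first task is asking for.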

    # 2 total number of students
    
    students <- df %>%
      group_by(schools) %>%
      summarise(students= n())
    
    # by gender
    
    students_gender <- df %>%
      group_by(Gender) %>%
      summarise(stud_gend = n())
    
    # by gender and type
    
    stud_gend_type <- df %>%
      group_by(Gender, type) %>%
      summarise(studs = n())
    
    

    As you can see, the principle is very simple, so I leave the last two tasks for you to complete yourself.
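
For readers who want to check their own attempt at those two tasks, here is one possible sketch. It uses a small hypothetical data frame `toy` with made-up scores rather than the question's data, and it assumes dplyr 1.0.0 or later for `across()` and the `.groups` argument.

```r
library(dplyr)

# Hypothetical toy data with fixed scores so the means are easy to verify.
toy <- data.frame(
  states = rep(c("TS", "NE"), each = 4),
  Gender = rep(c("male", "female"), times = 4),
  item1  = c(20, 22, 24, 26, 28, 30, 32, 34),
  item3  = c(10, 12, 14, 16, 18, 20, 22, 24)
)

# Mean of item1 and item3 by Gender
means_gender <- toy %>%
  group_by(Gender) %>%
  summarise(across(c(item1, item3), mean), .groups = "drop")

# Mean of item1 and item3 by Gender within each state
means_state_gender <- toy %>%
  group_by(states, Gender) %>%
  summarise(across(c(item1, item3), mean), .groups = "drop")

means_gender
means_state_gender
```

The only change between the two tasks is the set of grouping columns; the `summarise()` call stays the same.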

    In the code below I have done the basics for some of the tasks. The main functions you need from the dplyr library are group_by() and summarise().

    library(dplyr)
    
    ID = 1:50
    states = rep(c("TS", "NE", "AR", "MO", "WA"),times = c(10, 10, 10, 10, 10))
    schools = randomNames::randomNames(50) ## 50 random person names standing in for school names (needs the randomNames package)
    Gender = rep(c("male", "female"),times = c(18,32))
    type = rep(c("private", "public"),times = c(20,30))
    item1 = rnorm(50, mean=25, sd=5)
    item2 = rnorm(50, mean=30, sd=5)
    item3 = rnorm(50, mean=15, sd=5)
    df = data.frame(ID, states, schools, Gender, type, item1, item2, item3)
    
    head(df)
    
    # total number of schools by each state,
    
    df %>% 
      group_by(states) %>% 
      summarise(number = n())
    
    # total number of students by each school,
    # total number of students by each school by Gender,
    # total number of students by each school by Gender and type,
    
    df %>% 
      group_by(schools,Gender,type) %>% 
      summarise(number = n())
    
    # mean of item1 and item3 by Gender,
    # mean of item1 and item3 by Gender for each state,
    
    df %>% 
      group_by(Gender,states) %>% 
      summarise(item1 = mean(item1),
                item3 = mean(item3))
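
Grouping by schools, Gender and type at once only answers the most detailed of the three counting tasks. As a sketch of how the three tables could be produced separately, dplyr's `count()` (shorthand for `group_by()` followed by `summarise(n = n())`) can be called with progressively more columns. The data frame `toy` below is hypothetical and exists only for illustration.

```r
library(dplyr)

# Hypothetical toy data: two schools, five students.
toy <- data.frame(
  schools = c("A", "A", "A", "B", "B"),
  Gender  = c("male", "female", "female", "male", "male"),
  type    = c("public", "public", "public", "private", "private")
)

# Students per school
per_school <- toy %>% count(schools, name = "students")

# Students per school by Gender
per_school_gender <- toy %>% count(schools, Gender, name = "students")

# Students per school by Gender and type
per_school_gender_type <- toy %>% count(schools, Gender, type, name = "students")

per_school
per_school_gender
per_school_gender_type
```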
    

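    To avoid getting downvotes on questions like this, I would show some attempt of your own. You tagged the question "dplyr", which suggests you know where to look, so this can come across as trying to get someone else to do your exercise.

    Thank you very much!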