How to count nested categories across groups and compute their means in R
How can we use dplyr or another useful library to compute the following from the example below?
1. total number of schools in each state
2. total number of students in each school
3. total number of students in each school, by gender
4. total number of students in each school, by gender and type
5. mean of item1 and item3 by gender
6. mean of item1 and item3 by gender for each state
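Many thanks.
TL/DR: You should do some research before asking.
You know, on Stack Overflow we try to help each other, and before asking for help we usually do thorough research. In your case, you could read the dplyr or tidyverse documentation. I know that is sometimes boring, but it is better than getting an answer from a random user. Look up group_by and summarise by opening their help pages from the RStudio console (e.g., ?group_by).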
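# 1 total number of schools by each state
schools_by_state <- df %>%
  group_by(states) %>%
  summarise(number = n_distinct(schools))  # count unique schools, not rows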
In the example dataset every school name is unique, which is why these per-school results may look confusing and not very meaningful.
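Counting rows with n() and counting unique schools with n_distinct() only give the same number when each school name appears once, as in the example data. A minimal sketch on a hypothetical toy data frame (`toy` is made up for illustration, not the question's `df`) where schools repeat, assuming dplyr is installed:

```r
library(dplyr)

# Hypothetical toy data: 6 students spread over only 3 distinct schools
toy <- data.frame(
  states  = c("TS", "TS", "TS", "NE", "NE", "NE"),
  schools = c("A", "A", "B", "C", "C", "C")
)

# n() counts rows (students); n_distinct() counts unique school names
by_state <- toy %>%
  group_by(states) %>%
  summarise(rows = n(), n_schools = n_distinct(schools))

print(by_state)
```

Both states have three rows, but TS has two schools and NE only one, so only n_distinct() answers "how many schools per state".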
# 2 total number of students per school
students <- df %>%
  group_by(schools) %>%
  summarise(students = n())

# per school, by gender
students_gender <- df %>%
  group_by(schools, Gender) %>%
  summarise(stud_gend = n(), .groups = "drop")

# per school, by gender and type
stud_gend_type <- df %>%
  group_by(schools, Gender, type) %>%
  summarise(studs = n(), .groups = "drop")
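The same student counts can also be written with dplyr's count(), which is shorthand for group_by() followed by summarise(n = n()). A minimal sketch on a hypothetical toy data frame with one row per student (not the question's `df`):

```r
library(dplyr)

# Hypothetical toy data: one row per student
toy <- data.frame(
  schools = c("A", "A", "A", "B", "B"),
  Gender  = c("male", "female", "female", "male", "male"),
  type    = c("public", "public", "private", "private", "private")
)

# Each count() call groups by the listed columns and adds a column `n`
per_school             <- toy %>% count(schools)
per_school_gender      <- toy %>% count(schools, Gender)
per_school_gender_type <- toy %>% count(schools, Gender, type)

print(per_school_gender_type)
```

The count column is named `n` by default; pass `name = "students"` to count() to rename it.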
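As you can see, the principle is very simple, so I'll leave the last two tasks for you to complete.

In the code below I have covered the basics of some of the tasks. The main functions you need from dplyr are group_by() and summarise():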
library(dplyr)
ID = 1:50
states = rep(c("TS", "NE", "AR", "MO", "WA"),times = c(10, 10, 10, 10, 10))
schools = randomNames::randomNames(50) ## 50 random person names (randomNames package), used as school names
Gender = rep(c("male", "female"),times = c(18,32))
type = rep(c("private", "public"),times = c(20,30))
item1 = rnorm(50, mean=25, sd=5)
item2 = rnorm(50, mean=30, sd=5)
item3 = rnorm(50, mean=15, sd=5)
df = data.frame(ID, states, schools, Gender, type, item1, item2, item3)
head(df)
# total number of schools by each state
# (n_distinct() counts unique schools; n() would count rows, i.e. students)
df %>%
  group_by(states) %>%
  summarise(number = n_distinct(schools))
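# total number of students by each school,
# total number of students by each school by Gender,
# total number of students by each school by Gender and type:
# grouping by all three gives the finest breakdown; drop Gender and/or type
# from group_by() to get the coarser counts
df %>%
  group_by(schools, Gender, type) %>%
  summarise(number = n(), .groups = "drop")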
# mean of item1 and item3 by Gender,
# mean of item1 and item3 by Gender for each state:
# drop states from group_by() to get the means by Gender alone
df %>%
  group_by(Gender, states) %>%
  summarise(item1 = mean(item1),
            item3 = mean(item3),
            .groups = "drop")
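When several columns need the same summary, across() (available in dplyr 1.0.0 and later) avoids writing mean() once per item. A minimal sketch on a hypothetical toy data frame (values made up for illustration); `.groups = "drop"` returns an ungrouped result so later steps are not silently grouped by `states`:

```r
library(dplyr)

# Hypothetical toy data: two states, two genders, three item scores
toy <- data.frame(
  states = c("TS", "TS", "NE", "NE"),
  Gender = c("male", "female", "male", "female"),
  item1  = c(20, 30, 10, 40),
  item2  = c(1, 2, 3, 4),
  item3  = c(5, 15, 25, 35)
)

# Mean of item1 and item3 by Gender (item2 is left out on purpose)
means_gender <- toy %>%
  group_by(Gender) %>%
  summarise(across(c(item1, item3), mean), .groups = "drop")

# Mean of item1 and item3 by Gender within each state
means_gender_state <- toy %>%
  group_by(states, Gender) %>%
  summarise(across(c(item1, item3), mean), .groups = "drop")

print(means_gender)
print(means_gender_state)
```

If the items can contain missing values, pass an anonymous function instead, e.g. `across(c(item1, item3), function(x) mean(x, na.rm = TRUE))`.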
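To avoid getting downvotes on questions like this, I would show some attempt first. You tagged "dplyr", which shows you know where to look, so this can come across as trying to get others to do your exercise for you.
Thanks a lot!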