How to create a new column based on a condition in R
I have students in three classes, A, B and C. My reproducible dataset looks like this:
data <- data.frame(Student_ID =c(1,1,1,2,2,3,3,3,3,3,4,4,4,5,6,6,7,7,7,8,8),
Years_Attended = c(1991,1992,1995,1992,1993,1991,1992,1993,1994,1995,1993,1994,1995,1995,1993,1995,1990,1995,2000,1995,1996),
Class = c("A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","C","C","C","C","C"))
Intended_output <- data.frame(Student_ID = c(1,1,1,2,2,3,3,3,3,3,4,4,4,5,6,6,7,7,7,8,8),
Years_Attended = c(1991,1992,1995,1992,1993,1991,1992,1993,1994,1995,1993,1994,1995,1995,1993,1995,1990,1995,2000,1995,1996),
Class = c("A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","C","C","C","C","C"),
New_Student = c("No","No","No","Yes","Yes","No","No","No","No","No","No","No","No","Yes","No","No","No","No","No","Yes","Yes"))
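For each Class, find the minimum of Years_Attended, add it to your dataset as a new column, and then check for each student whether they attended in that year.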
library(dplyr)
data %>%
group_by(Class) %>%
summarise(min_year = min(Years_Attended)) %>%
left_join(data, by = 'Class') %>%
group_by(Class, Student_ID) %>%
mutate(New_Student = if(any(Years_Attended == first(min_year)))'No' else 'Yes')
# Class min_year Student_ID Years_Attended New_Student
# <chr> <dbl> <dbl> <dbl> <chr>
# 1 A 1991 1 1991 No
# 2 A 1991 1 1992 No
# 3 A 1991 1 1995 No
# 4 A 1991 2 1992 Yes
# 5 A 1991 2 1993 Yes
# 6 A 1991 3 1991 No
# 7 A 1991 3 1992 No
# 8 A 1991 3 1993 No
# 9 A 1991 3 1994 No
#10 A 1991 3 1995 No
# … with 11 more rows
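The same idea can be sketched without the summarise/left_join round trip by computing the class minimum with a grouped mutate. This is only an illustrative variant, not part of the original answer; the names `result` and `expected` are introduced here for the example, and it assumes dplyr is installed.

```r
library(dplyr)

data <- data.frame(
  Student_ID = c(1,1,1,2,2,3,3,3,3,3,4,4,4,5,6,6,7,7,7,8,8),
  Years_Attended = c(1991,1992,1995,1992,1993,1991,1992,1993,1994,1995,
                     1993,1994,1995,1995,1993,1995,1990,1995,2000,1995,1996),
  Class = c("A","A","A","A","A","A","A","A","A","A",
            "B","B","B","B","B","B","C","C","C","C","C")
)

result <- data %>%
  group_by(Class) %>%
  # earliest year seen in each class, repeated on every row of that class
  mutate(min_year = min(Years_Attended)) %>%
  group_by(Class, Student_ID) %>%
  # a student is "new" if none of their years equals the class minimum
  mutate(New_Student = if (any(Years_Attended == min_year)) "No" else "Yes") %>%
  ungroup()

# New_Student column of Intended_output from the question
expected <- c("No","No","No","Yes","Yes","No","No","No","No","No",
              "No","No","No","Yes","No","No","No","No","No","Yes","Yes")
stopifnot(identical(result$New_Student, expected))
```

Because mutate keeps the original row order, the resulting column can be compared directly with the New_Student column of Intended_output.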
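Here is a data.table approach:
library(data.table)
dt = as.data.table(data)
dt[, New_Student := {
  class_min = min(Years_Attended)
  ave(Years_Attended, Student_ID, FUN = function(x) ifelse(class_min != min(x), 'Yes', 'No'))
},
by = Class]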
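This is actually a tricky problem. We need to find the minimum year for each class and then, within each class, find the students whose first year is not equal to that minimum. The error is caused by a misplaced parenthesis. It should be … = min(data$Years_Attended), "Yes", "No")
Add the closing parenthesis after the min call. Thank you all very much for helping me.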
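The two steps described in the comment above (class minimum, then each student's first year compared against it) can also be sketched in base R with ave. This is a minimal illustration rather than the code from the question; `class_min`, `student_key` and `student_min` are names made up for the example.

```r
data <- data.frame(
  Student_ID = c(1,1,1,2,2,3,3,3,3,3,4,4,4,5,6,6,7,7,7,8,8),
  Years_Attended = c(1991,1992,1995,1992,1993,1991,1992,1993,1994,1995,
                     1993,1994,1995,1995,1993,1995,1990,1995,2000,1995,1996),
  Class = c("A","A","A","A","A","A","A","A","A","A",
            "B","B","B","B","B","B","C","C","C","C","C")
)

# Step 1: earliest year in each class, repeated on every row of that class
class_min <- ave(data$Years_Attended, data$Class, FUN = min)

# Step 2: earliest year of each student within their class; a single pasted
# key avoids empty Class x Student_ID combinations being passed to min()
student_key <- paste(data$Class, data$Student_ID)
student_min <- ave(data$Years_Attended, student_key, FUN = min)

# A student is "new" when their first year is later than the class's first year
data$New_Student <- ifelse(student_min != class_min, "Yes", "No")
```

Keeping the two minima as separate vectors makes it easy to inspect each step before the comparison.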