R: Remove all rows with missing data based on a column
I have the following sample data frame in R:
Test <- data.frame("Individual"=c("John", "John", "Alice", "Alice", "Alice", "Eve", "Eve","Eve","Jack"), "ExamNumber"=c("Test1", "Test2", "Test1", "Test2", "Test3", "Test1", "Test2", "Test3", "Test3"))
However, I want to remove the individuals who have not taken all three tests, so that the result looks like this:
Individual ExamNumber
1 Alice Test1
2 Alice Test2
3 Alice Test3
4 Eve Test1
5 Eve Test2
6 Eve Test3
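You can use ave to group by Individual and count the rows in each group with NROW, then keep the groups that have exactly three rows: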
Test[ave(1:nrow(Test), Test$Individual, FUN = NROW)==3,]
# Individual ExamNumber
#3 Alice Test1
#4 Alice Test2
#5 Alice Test3
#6 Eve Test1
#7 Eve Test2
#8 Eve Test3
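To see why this works, it can help to look at the intermediate vector that ave returns. The following is a minimal sketch that rebuilds the sample data from the question; the names group_size and result are introduced here for illustration only.

```r
# Recreate the sample data from the question.
Test <- data.frame(
  Individual = c("John", "John", "Alice", "Alice", "Alice",
                 "Eve", "Eve", "Eve", "Jack"),
  ExamNumber = c("Test1", "Test2", "Test1", "Test2", "Test3",
                 "Test1", "Test2", "Test3", "Test3")
)

# ave() applies FUN within each group and returns a vector as long as
# its input, so every row is labelled with the size of its group.
group_size <- ave(seq_len(nrow(Test)), Test$Individual, FUN = NROW)
print(group_size)

# Keep only the rows whose group has exactly three rows.
result <- Test[group_size == 3, ]
print(result)
```

Note that this check counts rows, not distinct tests, so it relies on each individual having at most one row per test.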
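Here is a slightly more robust approach. It uses the same idea, but splits the data with split and checks that every test appearing in the data is present for each individual, rather than counting rows: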
Test[order(Test$Individual),][unlist(lapply(split(Test, Test$Individual), function(a)
rep(all(unique(Test$ExamNumber) %in% a$ExamNumber), NROW(a)))),]
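The difference between counting rows and checking which tests are present shows up when an individual has a repeated test. The sketch below uses a hypothetical data frame Dup (not part of the question) in which Bob has three rows but only two distinct tests. It uses tapply instead of split for brevity; the names required, complete, by_count and by_set are illustrative.

```r
# Hypothetical data: Bob has three rows, but Test1 appears twice.
Dup <- data.frame(
  Individual = c("Alice", "Alice", "Alice", "Bob", "Bob", "Bob"),
  ExamNumber = c("Test1", "Test2", "Test3", "Test1", "Test1", "Test2")
)

# Row-count check: Bob is kept because his group has three rows.
by_count <- Dup[ave(seq_len(nrow(Dup)), Dup$Individual, FUN = NROW) == 3, ]

# Set-based check: an individual is complete only if every test that
# occurs anywhere in the data also occurs in that individual's rows.
required <- unique(Dup$ExamNumber)
complete <- tapply(Dup$ExamNumber, Dup$Individual,
                   function(x) all(required %in% x))

# Look up each row's individual by name to get a row-level logical index.
keep <- as.vector(complete[as.character(Dup$Individual)])
by_set <- Dup[keep, ]

print(by_count)
print(by_set)
```

With this data, the row-count check keeps all six rows, while the set-based check keeps only Alice.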
Here is another approach, using dplyr, that checks whether all three tests are present within each group:
library(dplyr)
Test %>%
group_by(Individual) %>%
filter(all(c("Test1", "Test2", "Test3") %in% ExamNumber)) %>%
ungroup()
# A tibble: 6 × 2
Individual ExamNumber
<fctr> <fctr>
1 Alice Test1
2 Alice Test2
3 Alice Test3
4 Eve Test1
5 Eve Test2
6 Eve Test3
Using base R:
ind_eq3 <- names( which( with( Test, by( Test,
INDICES = list(Individual),
FUN = function(x) length(unique(x$ExamNumber)) == 3) ) ) )
with(Test, Test[ Individual %in% ind_eq3, ] )
# Individual ExamNumber
# 3 Alice Test1
# 4 Alice Test2
# 5 Alice Test3
# 6 Eve Test1
# 7 Eve Test2
# 8 Eve Test3
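The same check can also be written with a contingency table. This is a different technique from the by() call above, offered only as a sketch; the names tab, complete and result are illustrative, and the sample data is rebuilt so the block runs on its own.

```r
# Recreate the sample data from the question.
Test <- data.frame(
  Individual = c("John", "John", "Alice", "Alice", "Alice",
                 "Eve", "Eve", "Eve", "Jack"),
  ExamNumber = c("Test1", "Test2", "Test1", "Test2", "Test3",
                 "Test1", "Test2", "Test3", "Test3")
)

# Rows are individuals, columns are tests, cells are row counts.
tab <- table(Test$Individual, Test$ExamNumber)

# An individual is complete when every test column is non-zero.
complete <- rownames(tab)[rowSums(tab > 0) == ncol(tab)]

# Keep only the rows belonging to complete individuals.
result <- Test[Test$Individual %in% complete, ]
print(result)
```

Because the number of required tests comes from ncol(tab), this version does not hard-code the value 3, but it does assume that every required test appears at least once somewhere in the data.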
Using data.table:
library('data.table')
setDT(Test)[ ,
j = .SD[length( unique(ExamNumber) ) == 3, ],
by = 'Individual']