R: Remove all rows for individuals with missing data based on a column

r dataframe

I have the following example data frame in R:

Test <- data.frame("Individual"=c("John", "John", "Alice", "Alice", "Alice", "Eve", "Eve","Eve","Jack"), "ExamNumber"=c("Test1", "Test2", "Test1", "Test2", "Test3", "Test1", "Test2", "Test3",  "Test3"))

However, I would like to remove the individuals who do not have all three tests, so that only these rows remain:

  Individual ExamNumber
1      Alice      Test1
2      Alice      Test2
3      Alice      Test3
4        Eve      Test1
5        Eve      Test2
6        Eve      Test3

You can use ave to group by Individual and count the rows in each group with NROW:

Test[ave(1:nrow(Test), Test$Individual, FUN = NROW)==3,]
#  Individual ExamNumber
#3      Alice      Test1
#4      Alice      Test2
#5      Alice      Test3
#6        Eve      Test1
#7        Eve      Test2
#8        Eve      Test3
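
To see what the ave call is doing, here is a small sketch that prints the per-row group sizes. It also shows one limitation, assuming the data could contain repeated tests: the extra row for John below is hypothetical and not part of the question's data.

```r
# Example data from the question
Test <- data.frame(
  Individual = c("John", "John", "Alice", "Alice", "Alice", "Eve", "Eve", "Eve", "Jack"),
  ExamNumber = c("Test1", "Test2", "Test1", "Test2", "Test3", "Test1", "Test2", "Test3", "Test3")
)

# ave() returns one value per row: the number of rows in that row's group
group_size <- ave(seq_len(nrow(Test)), Test$Individual, FUN = length)
print(group_size)

# Hypothetical extra row: John repeats Test1, so he now has three rows
# but still only two distinct tests
Test_dup <- rbind(Test, data.frame(Individual = "John", ExamNumber = "Test1"))
size_dup <- ave(seq_len(nrow(Test_dup)), Test_dup$Individual, FUN = length)

# Counting rows alone would now keep John as well
kept <- unique(as.character(Test_dup$Individual[size_dup == 3]))
print(kept)
```

Because the filter counts rows rather than distinct tests, it is only safe when each individual has at most one row per test.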

Here is a slightly more robust approach that uses the same idea, but with split:

Test[order(Test$Individual),][unlist(lapply(split(Test, Test$Individual), function(a)
          rep(all(unique(Test$ExamNumber) %in% a$ExamNumber), NROW(a)))),]
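
The one-liner above is dense, so here is the same idea unpacked into steps. This is only a sketch: the intermediate names (groups, all_tests, has_all) are made up for illustration, and it filters with %in% instead of reordering the data frame.

```r
# Example data from the question
Test <- data.frame(
  Individual = c("John", "John", "Alice", "Alice", "Alice", "Eve", "Eve", "Eve", "Jack"),
  ExamNumber = c("Test1", "Test2", "Test1", "Test2", "Test3", "Test1", "Test2", "Test3", "Test3")
)

# One data frame per individual
groups <- split(Test, Test$Individual)

# Every test that appears anywhere in the data
all_tests <- unique(Test$ExamNumber)

# For each individual: does the group contain every test?
has_all <- vapply(groups, function(g) all(all_tests %in% g$ExamNumber), logical(1))
print(has_all)

# Keep only rows belonging to individuals with a complete set
result <- Test[Test$Individual %in% names(has_all)[has_all], ]
print(result)
```

Since this checks for the presence of each test rather than counting rows, duplicate rows do not affect the result, and the original row order is preserved.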

Here is another approach using dplyr that checks whether all three tests are present within each group:

library(dplyr)
Test %>% 
  group_by(Individual) %>%
  filter(all(c("Test1", "Test2", "Test3") %in% ExamNumber)) %>%
  ungroup()

# A tibble: 6 × 2
  Individual ExamNumber
      <fctr>     <fctr>
1      Alice      Test1
2      Alice      Test2
3      Alice      Test3
4        Eve      Test1
5        Eve      Test2
6        Eve      Test3

Using base R:

ind_eq3 <- names( which( with( Test, by( Test, 
                                         INDICES = list(Individual), 
                                         FUN = function(x) length(unique(x$ExamNumber)) == 3) ) ) )
with(Test, Test[ Individual %in% ind_eq3, ] )

#   Individual ExamNumber
# 3      Alice      Test1
# 4      Alice      Test2
# 5      Alice      Test3
# 6        Eve      Test1
# 7        Eve      Test2
# 8        Eve      Test3
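
If you would rather not hard-code the 3, one alternative (a different technique from the by() call above, shown only as a sketch) is to cross-tabulate individuals against tests with table() and keep the individuals whose row has no zero counts. The names tab and complete are illustrative.

```r
# Example data from the question
Test <- data.frame(
  Individual = c("John", "John", "Alice", "Alice", "Alice", "Eve", "Eve", "Eve", "Jack"),
  ExamNumber = c("Test1", "Test2", "Test1", "Test2", "Test3", "Test1", "Test2", "Test3", "Test3")
)

# Contingency table: one row per individual, one column per test
tab <- table(Test$Individual, Test$ExamNumber)

# An individual is complete when every test column has a non-zero count
complete <- rownames(tab)[rowSums(tab > 0) == ncol(tab)]

result <- Test[Test$Individual %in% complete, ]
print(result)
```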

Using data.table:

library('data.table')
setDT(Test)[ , 
             j  = .SD[length( unique(ExamNumber) ) == 3, ],
             by = 'Individual']