R: Delete rows based on the values of specific columns

I have the following dataset:

ID <- c(1,2,3,4,5,6,7,8,9,10)
x1 <- c(1.3,    1.4,    NA, NA, 1.4,    -1.0,   NA, 0.3,    0.7,    NA)
x2 <- c(4.6,    2.6,    NA, 4.3,    NA, 5.6,    NA, 3.7,    5.3,    NA)
x3 <- c(-0.9,   5.6,    NA, -1.3,   NA, -3.4,   NA, 0.3,    -2.6,   NA)
x4 <- c(10.5,   NA, NA, 0.1,    -0.5,   NA, NA, 21.5,   2.0,    NA)
x5 <- c(9.5,    -5.0,   NA, -0.7,   3.6,    3.8,    -7.8,   9.8,    -12.2,  NA)
x6 <- c(-10.3,  NA, -4.4,   NA, 12.2,   NA, NA, -4.1,   3.3,    NA)

alldata <- data.frame(ID,x1,x2,x3,x4,x5,x6)

ID    x1    x2    x3    x4     x5     x6
 1   1.3   4.6  -0.9  10.5    9.5  -10.3
 2   1.4   2.6   5.6  "NA"   -5.0   "NA"
 3  "NA"  "NA"  "NA"  "NA"   "NA"   -4.4
 4  "NA"   4.3  -1.3   0.1   -0.7   "NA"
 5   1.4  "NA"  "NA"  -0.5    3.6   12.2
 6  -1.0   5.6  -3.4  "NA"    3.8   "NA"
 7  "NA"  "NA"  "NA"  "NA"   -7.8   "NA"
 8   0.3   3.7   0.3  21.5    9.8   -4.1
 9   0.7   5.3  -2.6   2.0  -12.2    3.3
10  "NA"  "NA"  "NA"  "NA"   "NA"   "NA"

A base R solution based on your data (please read my comment). With real NAs, I think this solution would not work:

alldata[!rowSums(alldata[2:6] == "NA") == 5, ]
  ID  x1  x2   x3   x4    x5    x6
1  1 1.3 4.6 -0.9 10.5   9.5 -10.3
2  2 1.4 2.6  5.6   NA    -5    NA
4  4  NA 4.3 -1.3  0.1  -0.7    NA
5  5 1.4  NA   NA -0.5   3.6  12.2
6  6  -1 5.6 -3.4   NA   3.8    NA
7  7  NA  NA   NA   NA  -7.8    NA
8  8 0.3 3.7  0.3 21.5   9.8  -4.1
9  9 0.7 5.3 -2.6    2 -12.2   3.3
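
To illustrate why the string comparison stops working once the columns hold real NA values, here is a minimal sketch. It rebuilds the data with numeric columns, as in the question's code, and compares the two tests. The names real_na, string_test and na_test are illustrative only and do not come from the answer.

```r
# Rebuild the data with real (numeric) NA values, as in the question's code.
ID <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
x1 <- c(1.3, 1.4, NA, NA, 1.4, -1.0, NA, 0.3, 0.7, NA)
x2 <- c(4.6, 2.6, NA, 4.3, NA, 5.6, NA, 3.7, 5.3, NA)
x3 <- c(-0.9, 5.6, NA, -1.3, NA, -3.4, NA, 0.3, -2.6, NA)
x4 <- c(10.5, NA, NA, 0.1, -0.5, NA, NA, 21.5, 2.0, NA)
x5 <- c(9.5, -5.0, NA, -0.7, 3.6, 3.8, -7.8, 9.8, -12.2, NA)
x6 <- c(-10.3, NA, -4.4, NA, 12.2, NA, NA, -4.1, 3.3, NA)
real_na <- data.frame(ID, x1, x2, x3, x4, x5, x6)

# Comparing a real NA with the string "NA" yields NA rather than TRUE,
# so the row sum is NA for every row that contains a missing value.
string_test <- rowSums(real_na[2:6] == "NA")
anyNA(string_test)   # TRUE: the test is undefined for those rows

# is.na() detects real missing values, so this test is always TRUE or FALSE.
na_test <- rowSums(is.na(real_na[2:6])) == 5
real_na[!na_test, ]  # drops rows 3 and 10, where x1 to x5 are all NA
```

With real NAs, indexing by the NA-valued result of the string test would return rows filled with NA, which is why the is.na() form is the safer choice.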

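You can do this with:

alldata_filtered <- alldata[rowSums(!is.na(alldata[2:6])) > 0, ]

Here is how it works, step by step.

alldata[2:6]
gets the x1 to x5 columns that you care about. (It might be better practice to use
subset(alldata, select = x1:x5)
so that you do not have to rely on exact column indices.) Then

!is.na(alldata[2:6])
gives a TRUE/FALSE matrix showing which of those values are not NA,

rowSums(!is.na(alldata[2:6]))
tells you how many items in each row are not NA,

rowSums(!is.na(alldata[2:6])) > 0
tells you which rows have at least one non-NA item, and

alldata[rowSums(!is.na(alldata[2:6])) > 0, ]
filters for only those rows.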
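
The steps described above can be sketched with each intermediate result stored in its own variable, selecting the columns by name instead of by position. The variable names key, not_na, n_present and keep are assumptions made for this illustration.

```r
# The question's data, with real NA values.
ID <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
x1 <- c(1.3, 1.4, NA, NA, 1.4, -1.0, NA, 0.3, 0.7, NA)
x2 <- c(4.6, 2.6, NA, 4.3, NA, 5.6, NA, 3.7, 5.3, NA)
x3 <- c(-0.9, 5.6, NA, -1.3, NA, -3.4, NA, 0.3, -2.6, NA)
x4 <- c(10.5, NA, NA, 0.1, -0.5, NA, NA, 21.5, 2.0, NA)
x5 <- c(9.5, -5.0, NA, -0.7, 3.6, 3.8, -7.8, 9.8, -12.2, NA)
x6 <- c(-10.3, NA, -4.4, NA, 12.2, NA, NA, -4.1, 3.3, NA)
alldata <- data.frame(ID, x1, x2, x3, x4, x5, x6)

# Step 1: select the columns of interest by name.
key <- alldata[, c("x1", "x2", "x3", "x4", "x5")]

# Step 2: logical matrix, TRUE where a value is present.
not_na <- !is.na(key)

# Step 3: number of non-NA values in each row.
n_present <- rowSums(not_na)
n_present

# Step 4: keep rows that have at least one non-NA value.
keep <- n_present > 0
alldata_filtered <- alldata[keep, ]
alldata_filtered
```

Selecting by name keeps the filter correct even if columns are later reordered or new ones are added in front of x1.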

Here is an approach using rowSums.

First, I convert your factor "NA" values to real NAs:

str(alldata)
'data.frame':   10 obs. of  7 variables:
 $ ID: num  1 2 3 4 5 6 7 8 9 10
 $ x1: Factor w/ 6 levels "-1","0.3","0.7",..: 4 5 NA NA 5 1 NA 2 3 NA
 $ x2: Factor w/ 7 levels "2.6","3.7","4.3",..: 4 1 NA 3 NA 6 NA 2 5 NA
 $ x3: Factor w/ 7 levels "-0.9","-1.3",..: 1 6 NA 2 NA 4 NA 5 3 NA
 $ x4: Factor w/ 6 levels "-0.5","0.1","10.5",..: 3 NA NA 2 1 NA NA 5 4 NA
 $ x5: Factor w/ 9 levels "-0.7","-12.2",..: 7 3 NA 1 5 6 4 8 2 NA
 $ x6: Factor w/ 6 levels "-10.3","-4.1",..: 1 NA 3 NA 4 NA NA 2 5 NA

alldata[alldata == "NA"] <- NA

sum(is.na(alldata))
[1] 24

Next, I show how to find the rows that have NA in all of the meaningful variables:

which(rowSums(is.na(alldata[,c("x1","x2","x3","x4","x5")]))==5)
[1]  3 10

Finally, we extract the desired rows (those that are not NA in all of the key variables):

alldata[-which(rowSums(is.na(alldata[,c("x1","x2","x3","x4","x5")]))==5),]
  ID   x1   x2   x3   x4    x5    x6
1  1  1.3  4.6 -0.9 10.5   9.5 -10.3
2  2  1.4  2.6  5.6 <NA>    -5  <NA>
4  4 <NA>  4.3 -1.3  0.1  -0.7  <NA>
5  5  1.4 <NA> <NA> -0.5   3.6  12.2
6  6   -1  5.6 -3.4 <NA>   3.8  <NA>
7  7 <NA> <NA> <NA> <NA>  -7.8  <NA>
8  8  0.3  3.7  0.3 21.5   9.8  -4.1
9  9  0.7  5.3 -2.6    2 -12.2   3.3
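
One caveat with the -which(...) form is worth sketching: when no row matches the condition, which() returns an empty vector, and negative indexing with an empty vector selects no rows at all. The small data frame df below is a hypothetical example, not the question's data, and logical indexing is shown as one possible alternative.

```r
# Hypothetical example data: row 2 is NA in both key columns.
df <- data.frame(ID = 1:3, a = c(1, NA, 3), b = c(2, NA, NA))

all_missing <- rowSums(is.na(df[, c("a", "b")])) == 2
df[-which(all_missing), ]        # works here: drops row 2

# Now take a data frame in which no row is entirely NA.
complete <- df[c(1, 3), ]
none_missing <- rowSums(is.na(complete[, c("a", "b")])) == 2

# which() returns integer(0), and complete[-integer(0), ] selects nothing.
nrow(complete[-which(none_missing), ])   # 0: every row is lost

# Logical indexing keeps the rows as intended.
nrow(complete[!none_missing, ])          # 2
```

For the question's data both forms give the same result, because rows 3 and 10 do match the condition.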

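Take into account that your NAs are not real NAs but factors: because you typed them in quotation marks, R read them as strings, and since stringsAsFactors defaults to TRUE they became factors.

Yes, you need to remove the quotes for R to interpret those "NA" as NA.

Then have a look at the anyNA() function; it can be used together with apply to get the rows to delete.

Just fixed it. Thanks.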
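
As a rough sketch of the point made in these comments, the following shows one way to turn columns that were typed with quoted "NA" into numeric columns with real NAs, and then to flag rows with apply. The data frame raw and the names cols, all_na and cleaned are hypothetical. stringsAsFactors = TRUE is set explicitly because the default changed to FALSE in R 4.0.0.

```r
# Hypothetical data: values were typed with quotes, so the columns are text.
raw <- data.frame(ID = 1:4,
                  x1 = c("1.3", "NA", "0.7", "NA"),
                  x2 = c("4.6", "NA", "NA", "2.6"),
                  stringsAsFactors = TRUE)
sapply(raw, class)   # x1 and x2 are factors, not numeric

cols <- c("x1", "x2")
raw[cols] <- lapply(raw[cols], function(col) {
  # as.character() first, so the labels are converted, not the integer codes.
  txt <- as.character(col)
  txt[txt %in% "NA"] <- NA   # replace the string "NA" with a real NA
  as.numeric(txt)
})
sapply(raw, class)   # x1 and x2 are now numeric

# Row-wise check with apply: TRUE where every key column is NA.
all_na <- apply(raw[cols], 1, function(row) all(is.na(row)))
cleaned <- raw[!all_na, ]
cleaned
```

Note that anyNA() would flag rows with at least one NA; all(is.na(...)) is used here because the goal in this thread is to drop only rows where every key column is missing.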