R: remove rows based on the values of specific columns
I have the following dataset:
ID <- c(1,2,3,4,5,6,7,8,9,10)
x1 <- c(1.3, 1.4, NA, NA, 1.4, -1.0, NA, 0.3, 0.7, NA)
x2 <- c(4.6, 2.6, NA, 4.3, NA, 5.6, NA, 3.7, 5.3, NA)
x3 <- c(-0.9, 5.6, NA, -1.3, NA, -3.4, NA, 0.3, -2.6, NA)
x4 <- c(10.5, NA, NA, 0.1, -0.5, NA, NA, 21.5, 2.0, NA)
x5 <- c(9.5, -5.0, NA, -0.7, 3.6, 3.8, -7.8, 9.8, -12.2, NA)
x6 <- c(-10.3, NA, -4.4, NA, 12.2, NA, NA, -4.1, 3.3, NA)
alldata <- data.frame(ID,x1,x2,x3,x4,x5,x6)
ID x1 x2 x3 x4 x5 x6
1 1.3 4.6 -0.9 10.5 9.5 -10.3
2 1.4 2.6 5.6 "NA" -5.0 "NA"
3 "NA" "NA" "NA" "NA" "NA" -4.4
4 "NA" 4.3 -1.3 0.1 -0.7 "NA"
5 1.4 "NA" "NA" -0.5 3.6 12.2
6 -1.0 5.6 -3.4 "NA" 3.8 "NA"
7 "NA" "NA" "NA" "NA" -7.8 "NA"
8 0.3 3.7 0.3 21.5 9.8 -4.1
9 0.7 5.3 -2.6 2.0 -12.2 3.3
10 "NA" "NA" "NA" "NA" "NA" "NA"
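The printed data shows the missing values as quoted "NA". It may help to see how R treats the two-character string "NA" differently from a real NA. Here is a minimal sketch; the vector names are illustrative only:

```r
# A real missing value versus the text "NA"
real_na   <- c(1.3, NA)    # numeric vector containing a true NA
quoted_na <- c(1.3, "NA")  # quoting forces the whole vector to character

print(class(real_na))      # "numeric"
print(class(quoted_na))    # "character"
print(is.na(real_na))      # FALSE TRUE
print(is.na(quoted_na))    # FALSE FALSE: "NA" is just text here

# Replace the text with a real NA, then convert back to numbers
cleaned <- quoted_na
cleaned[cleaned == "NA"] <- NA
fixed <- as.numeric(cleaned)
print(is.na(fixed))        # FALSE TRUE
```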
A base R solution based on your data (please read my comment). With real NAs, I suppose this solution would not work:
alldata[!rowSums(alldata[2:6] == "NA") == 5, ]
ID x1 x2 x3 x4 x5 x6
1 1 1.3 4.6 -0.9 10.5 9.5 -10.3
2 2 1.4 2.6 5.6 NA -5 NA
4 4 NA 4.3 -1.3 0.1 -0.7 NA
5 5 1.4 NA NA -0.5 3.6 12.2
6 6 -1 5.6 -3.4 NA 3.8 NA
7 7 NA NA NA NA -7.8 NA
8 8 0.3 3.7 0.3 21.5 9.8 -4.1
9 9 0.7 5.3 -2.6 2 -12.2 3.3
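The rowSums() idea above can be made to work whether the missing values are real NAs or the literal string "NA": test for both. This is a minimal sketch. It assumes, as the answers do, that columns 2 to 6 (x1 to x5) are the ones that matter. The helper drop_all_missing() is a hypothetical name, not an existing function:

```r
alldata <- data.frame(
  ID = 1:10,
  x1 = c(1.3, 1.4, NA, NA, 1.4, -1.0, NA, 0.3, 0.7, NA),
  x2 = c(4.6, 2.6, NA, 4.3, NA, 5.6, NA, 3.7, 5.3, NA),
  x3 = c(-0.9, 5.6, NA, -1.3, NA, -3.4, NA, 0.3, -2.6, NA),
  x4 = c(10.5, NA, NA, 0.1, -0.5, NA, NA, 21.5, 2.0, NA),
  x5 = c(9.5, -5.0, NA, -0.7, 3.6, 3.8, -7.8, 9.8, -12.2, NA),
  x6 = c(-10.3, NA, -4.4, NA, 12.2, NA, NA, -4.1, 3.3, NA)
)

# Hypothetical helper: drop rows where every column in `cols` is missing.
# A cell counts as missing if it is a real NA or the text "NA".
# Comparing a real NA with "NA" gives NA, and TRUE | NA is TRUE,
# so both kinds of missing value end up TRUE in is_missing.
drop_all_missing <- function(df, cols) {
  sub <- df[cols]
  is_missing <- is.na(sub) | sub == "NA"
  df[rowSums(is_missing) < ncol(sub), , drop = FALSE]
}

# Same data, but with every missing value stored as the string "NA"
as_text <- alldata
as_text[is.na(as_text)] <- "NA"

print(drop_all_missing(alldata, 2:6)$ID)  # 1 2 4 5 6 7 8 9
print(drop_all_missing(as_text, 2:6)$ID)  # 1 2 4 5 6 7 8 9
```

Counting missing cells with `< ncol(sub)` rather than a hard-coded 5 keeps the filter correct if you change the set of columns.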
You can do this with:
alldata_filtered <- alldata[rowSums(!is.na(alldata[2:6])) > 0, ]
alldata[2:6] gets the columns x1 to x5 that you care about. (It might be better to use subset(alldata, select = x1:x5), so that you don't rely on exact column indices.) Then
!is.na(alldata[2:6])
gives a TRUE/FALSE matrix showing which of those entries are not NA,
rowSums(!is.na(alldata[2:6]))
tells you how many entries in each row are not NA,
rowSums(!is.na(alldata[2:6])) > 0
tells you which rows have at least one non-NA entry, and
alldata[rowSums(!is.na(alldata[2:6])) > 0, ]
filters for just those rows.
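The steps described above can be run one at a time to inspect each intermediate result. This minimal sketch uses the suggested subset(alldata, select = x1:x5), so the filter does not depend on column positions. The intermediate variable names are illustrative:

```r
alldata <- data.frame(
  ID = 1:10,
  x1 = c(1.3, 1.4, NA, NA, 1.4, -1.0, NA, 0.3, 0.7, NA),
  x2 = c(4.6, 2.6, NA, 4.3, NA, 5.6, NA, 3.7, 5.3, NA),
  x3 = c(-0.9, 5.6, NA, -1.3, NA, -3.4, NA, 0.3, -2.6, NA),
  x4 = c(10.5, NA, NA, 0.1, -0.5, NA, NA, 21.5, 2.0, NA),
  x5 = c(9.5, -5.0, NA, -0.7, 3.6, 3.8, -7.8, 9.8, -12.2, NA),
  x6 = c(-10.3, NA, -4.4, NA, 12.2, NA, NA, -4.1, 3.3, NA)
)

vals     <- subset(alldata, select = x1:x5)  # columns of interest, by name
not_na   <- !is.na(vals)                     # TRUE/FALSE matrix
n_values <- rowSums(not_na)                  # non-NA entries per row
has_any  <- n_values > 0                     # at least one non-NA entry
alldata_filtered <- alldata[has_any, ]

print(unname(n_values))      # 5 4 0 4 3 4 1 5 5 0
print(alldata_filtered$ID)   # 1 2 4 5 6 7 8 9
```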
Here is an approach using rowSums(). First, I convert your factor "NA" values into real NAs:
str(alldata)
'data.frame': 10 obs. of 7 variables:
$ ID: num 1 2 3 4 5 6 7 8 9 10
$ x1: Factor w/ 6 levels "-1","0.3","0.7",..: 4 5 NA NA 5 1 NA 2 3 NA
$ x2: Factor w/ 7 levels "2.6","3.7","4.3",..: 4 1 NA 3 NA 6 NA 2 5 NA
$ x3: Factor w/ 7 levels "-0.9","-1.3",..: 1 6 NA 2 NA 4 NA 5 3 NA
$ x4: Factor w/ 6 levels "-0.5","0.1","10.5",..: 3 NA NA 2 1 NA NA 5 4 NA
$ x5: Factor w/ 9 levels "-0.7","-12.2",..: 7 3 NA 1 5 6 4 8 2 NA
$ x6: Factor w/ 6 levels "-10.3","-4.1",..: 1 NA 3 NA 4 NA NA 2 5 NA
alldata[alldata=="NA"]=NA
sum(is.na(alldata))
24
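The replacement fixes the missing values, but the columns are still factors, so arithmetic on them will not work yet. The sketch below converts them to numeric. It assumes the factor columns look like the str() output, that is, numbers with the missing entries stored as the text "NA", and it rebuilds such data from the question's vectors. The helper to_factor() is a hypothetical name. as.numeric() on a factor returns the internal level codes, which is why the as.character() step is needed:

```r
alldata <- data.frame(
  ID = 1:10,
  x1 = c(1.3, 1.4, NA, NA, 1.4, -1.0, NA, 0.3, 0.7, NA),
  x2 = c(4.6, 2.6, NA, 4.3, NA, 5.6, NA, 3.7, 5.3, NA),
  x3 = c(-0.9, 5.6, NA, -1.3, NA, -3.4, NA, 0.3, -2.6, NA),
  x4 = c(10.5, NA, NA, 0.1, -0.5, NA, NA, 21.5, 2.0, NA),
  x5 = c(9.5, -5.0, NA, -0.7, 3.6, 3.8, -7.8, 9.8, -12.2, NA),
  x6 = c(-10.3, NA, -4.4, NA, 12.2, NA, NA, -4.1, 3.3, NA)
)

# Hypothetical helper: simulate the factor columns, with missing entries as "NA"
to_factor <- function(col) factor(replace(as.character(col), is.na(col), "NA"))
alldata[-1] <- lapply(alldata[-1], to_factor)

# Step from the answer: turn the "NA" strings into real NAs
alldata[alldata == "NA"] <- NA

# Go through as.character() so the printed values, not the codes, are kept
alldata[-1] <- lapply(alldata[-1], function(col) as.numeric(as.character(col)))

print(sapply(alldata, class))  # every column is now numeric or integer
print(sum(is.na(alldata)))     # 24
```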
Next, I show how to find the rows that are NA in all of the variables of interest:
which(rowSums(is.na(alldata[,c("x1","x2","x3","x4","x5")]))==5)
[1] 3 10
Finally, we extract the rows we want (those that are not NA in all of the key variables):
alldata[-which(rowSums(is.na(alldata[,c("x1","x2","x3","x4","x5")]))==5),]
ID x1 x2 x3 x4 x5 x6
1 1 1.3 4.6 -0.9 10.5 9.5 -10.3
2 2 1.4 2.6 5.6 <NA> -5 <NA>
4 4 <NA> 4.3 -1.3 0.1 -0.7 <NA>
5 5 1.4 <NA> <NA> -0.5 3.6 12.2
6 6 -1 5.6 -3.4 <NA> 3.8 <NA>
7 7 <NA> <NA> <NA> <NA> -7.8 <NA>
8 8 0.3 3.7 0.3 21.5 9.8 -4.1
9 9 0.7 5.3 -2.6 2 -12.2 3.3
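The `alldata[-which(...), ]` form has one caveat. If no row has all five variables missing, which() returns integer(0), and indexing with -integer(0) selects no rows at all instead of every row. Here is a minimal sketch of the same filter using a logical index, which avoids that edge case. The names `key`, `all_missing`, `bad` and `good` are illustrative:

```r
alldata <- data.frame(
  ID = 1:10,
  x1 = c(1.3, 1.4, NA, NA, 1.4, -1.0, NA, 0.3, 0.7, NA),
  x2 = c(4.6, 2.6, NA, 4.3, NA, 5.6, NA, 3.7, 5.3, NA),
  x3 = c(-0.9, 5.6, NA, -1.3, NA, -3.4, NA, 0.3, -2.6, NA),
  x4 = c(10.5, NA, NA, 0.1, -0.5, NA, NA, 21.5, 2.0, NA),
  x5 = c(9.5, -5.0, NA, -0.7, 3.6, 3.8, -7.8, 9.8, -12.2, NA),
  x6 = c(-10.3, NA, -4.4, NA, 12.2, NA, NA, -4.1, 3.3, NA)
)
key <- c("x1", "x2", "x3", "x4", "x5")

# Logical index: TRUE for rows where every key variable is NA
all_missing <- rowSums(is.na(alldata[, key])) == length(key)
kept <- alldata[!all_missing, ]
print(kept$ID)  # 1 2 4 5 6 7 8 9

# Edge case: run both forms again on data that has no all-NA rows left
bad  <- kept[-which(rowSums(is.na(kept[, key])) == length(key)), ]
good <- kept[!(rowSums(is.na(kept[, key])) == length(key)), ]
print(nrow(bad))   # 0, because -integer(0) dropped every row
print(nrow(good))  # 8
```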
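Consider that your NAs are not NA but factors. You typed them with quotes ("), so R reads them as strings, and because stringsAsFactors defaults to TRUE (the default before R 4.0.0), they become factors.
Yes, you need to remove the quotes so that R interprets those "NA"s as missing values. Then have a look at the anyNA() function; you can use it with apply() to get the rows to remove.
Just fixed it. Thanks.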
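The apply() route mentioned in the comments can be sketched as follows. Note that anyNA() flags rows with any missing value among x1 to x5. That is a stricter filter than the answers', which drop only rows where all of them are missing. This minimal sketch shows both for comparison; the variable names are illustrative:

```r
alldata <- data.frame(
  ID = 1:10,
  x1 = c(1.3, 1.4, NA, NA, 1.4, -1.0, NA, 0.3, 0.7, NA),
  x2 = c(4.6, 2.6, NA, 4.3, NA, 5.6, NA, 3.7, 5.3, NA),
  x3 = c(-0.9, 5.6, NA, -1.3, NA, -3.4, NA, 0.3, -2.6, NA),
  x4 = c(10.5, NA, NA, 0.1, -0.5, NA, NA, 21.5, 2.0, NA),
  x5 = c(9.5, -5.0, NA, -0.7, 3.6, 3.8, -7.8, 9.8, -12.2, NA),
  x6 = c(-10.3, NA, -4.4, NA, 12.2, NA, NA, -4.1, 3.3, NA)
)
key <- c("x1", "x2", "x3", "x4", "x5")

# Rows with at least one NA in x1..x5 (what anyNA() detects)
has_any_na <- apply(alldata[key], 1, anyNA)
# Rows where every one of x1..x5 is NA (the rows the answers remove)
all_na <- apply(alldata[key], 1, function(r) all(is.na(r)))

print(alldata$ID[has_any_na])  # 2 3 4 5 6 7 10
print(alldata$ID[all_na])      # 3 10

only_complete <- alldata[!has_any_na, ]  # keeps rows 1, 8, 9
without_empty <- alldata[!all_na, ]      # keeps 8 rows
```

For large data frames, the vectorised rowSums() versions in the answers are usually faster than apply(), which loops over rows.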