Difference between using nearZeroVar and apply with unique length on columns in R
I have a matrix of observations and I want to remove all of its all-zero columns. I tried nearZeroVar(dataset) to find the columns to remove, and I also tried apply with length(unique(x)) on the columns. What is the difference between the two?
The second example (apply with length(unique(x)) == 1) only removes columns that contain a single unique value. Consider this:
mdat <- matrix(c(100,100,100,100,100,100, 0,0,0,0, 0,0,3,0,0,0,0,0,0,0,1,2,3,0), nrow = 6, ncol = 4, byrow = FALSE)
mdat
# [,1] [,2] [,3] [,4]
#[1,] 100 0 3 0
#[2,] 100 0 0 0
#[3,] 100 0 0 1
#[4,] 100 0 0 2
#[5,] 100 0 0 3
#[6,] 100 0 0 0
mdat[ , !apply(mdat, 2, function(x) length(unique(x)) == 1) ]
# [,1] [,2]
#[1,] 3 0
#[2,] 0 0
#[3,] 0 1
#[4,] 0 2
#[5,] 0 3
#[6,] 0 0
Running nearZeroVar(mdat, freqCut = 4, uniqueCut = 40) on the same matrix returns columns 1, 2 and 3. Columns 1 and 2 are selected because each contains only one value. Column 3 is selected because the ratio of its most common value (0) to its next most common value (3) is 5:1, which is greater than the freqCut of 4, and because its number of unique values (2 values, 0 and 3) as a percentage of the total number of observations (6 rows) is 2/6*100 = 33%, which is below the 40% we specified for uniqueCut.
Read the help for nearZeroVar; it explains this clearly.
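To make the two cutoffs concrete, here is a minimal base-R sketch of the rule described above. The function name nzv_sketch is hypothetical and this is only an approximation of what nearZeroVar does (it ignores NA handling and the saveMetrics output), so treat it as an illustration rather than caret's actual implementation.

```r
# Hypothetical helper: a base-R approximation of the nearZeroVar rule.
# A column is flagged if it is constant, or if BOTH conditions hold:
#   (most common count / second most common count) > freqCut
#   (unique values / observations * 100)          <= uniqueCut
nzv_sketch <- function(x, freqCut = 95 / 5, uniqueCut = 10) {
  flagged <- apply(x, 2, function(col) {
    counts <- sort(table(col), decreasing = TRUE)
    n_unique <- length(counts)
    if (n_unique == 1) return(TRUE)            # zero variance: always flagged
    freq_ratio <- counts[[1]] / counts[[2]]    # e.g. 5 / 1 for column 3
    pct_unique <- 100 * n_unique / length(col) # e.g. 2 / 6 * 100 for column 3
    freq_ratio > freqCut && pct_unique <= uniqueCut
  })
  which(flagged)
}

mdat <- matrix(c(100,100,100,100,100,100, 0,0,0,0,0,0, 3,0,0,0,0,0, 0,0,1,2,3,0),
               nrow = 6, ncol = 4, byrow = FALSE)

nzv_sketch(mdat, freqCut = 4, uniqueCut = 40)  # columns 1, 2 and 3
nzv_sketch(mdat)                               # default cutoffs: only the constant columns 1 and 2
```

With the looser default cutoffs, column 3 is no longer flagged because its 5:1 ratio does not exceed 95/5 = 19, so the result matches the apply/unique approach on this matrix.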
# Example from the question: column 4 of this matrix is all zeros
library(caret)
mdat <- matrix(c(1,2,3,0, 4,5,0,0, 0,0,0,0, 3,0,0,0, 0,0,0,0, 1,2,3,0), nrow = 6, ncol = 4, byrow = TRUE)
mdat
# [,1] [,2] [,3] [,4]
#[1,] 1 2 3 0
#[2,] 4 5 0 0
#[3,] 0 0 0 0
#[4,] 3 0 0 0
#[5,] 0 0 0 0
#[6,] 1 2 3 0
# First approach: nearZeroVar with its default cutoffs
cols_mdat <- nearZeroVar(mdat)
cols_mdat
#[1] 4
mdat_remove <- mdat[, -cols_mdat]
mdat_remove
# [,1] [,2] [,3]
#[1,] 1 2 3
#[2,] 4 5 0
#[3,] 0 0 0
#[4,] 3 0 0
#[5,] 0 0 0
#[6,] 1 2 3
# Second approach: drop columns that hold a single unique value
mdatzv <- apply(mdat, 2, function(x) length(unique(x)) == 1)
mdat_nzv <- mdat[, !mdatzv]
mdat_nzv
# [,1] [,2] [,3]
#[1,] 1 2 3
#[2,] 4 5 0
#[3,] 0 0 0
#[4,] 3 0 0
#[5,] 0 0 0
#[6,] 1 2 3
# Back to the answer's matrix (columns 1 and 2 are constant), because the question's code above redefines mdat
library(caret)
mdat <- matrix(c(100,100,100,100,100,100, 0,0,0,0, 0,0,3,0,0,0,0,0,0,0,1,2,3,0), nrow = 6, ncol = 4, byrow = FALSE)
nearZeroVar(mdat, freqCut = 4, uniqueCut = 40)
#[1] 1 2 3
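Neither approach is exactly "remove the all-zero columns": the unique-length test also drops a constant non-zero column (the column of 100s), and nearZeroVar can drop columns that merely vary very little. If the goal really is only the all-zero columns, one possible base-R sketch is the following; the variable names all_zero and mdat_nonzero are illustrative, and the code assumes the matrix contains no NA values.

```r
mdat <- matrix(c(100,100,100,100,100,100, 0,0,0,0,0,0, 3,0,0,0,0,0, 0,0,1,2,3,0),
               nrow = 6, ncol = 4, byrow = FALSE)

# A column is all-zero when it has no non-zero entries
all_zero <- colSums(mdat != 0) == 0
which(all_zero)  # only column 2; the constant column of 100s is kept

# drop = FALSE keeps the result a matrix even if a single column survives
mdat_nonzero <- mdat[, !all_zero, drop = FALSE]
dim(mdat_nonzero)
```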