R: replace values below each column's mean in a matrix
I have a huge matrix, and I want to replace every value that is below its column's mean (or median) with NA. This is what I currently do on a small example matrix ex, together with the result I want:
for (i in 1:ncol(ex)) {
  ex[, i][ex[, i] < colMeans(ex)[i]] <- NA
}
ex
     [,1] [,2] [,3]
[1,]   NA   NA  0.6
[2,]   NA  0.9   NA
[3,]  0.6  0.9   NA
[4,]  0.9  0.7   NA
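The code above uses a for loop, and I would like a faster, vectorized version.
We can use sweep, which compares each column against its own mean in a single vectorized call.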
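As a rough illustration of the sweep idea, here is a minimal sketch on a small made-up matrix. The values and the names `ex_small` and `below` are hypothetical, chosen only so the column means are easy to check by hand; they are not the asker's data.

```r
# Hypothetical 4 x 3 example; matrix() fills column by column
ex_small <- matrix(c(1, 2, 3, 6,
                     4, 8, 2, 10,
                     5, 1, 9, 1), nrow = 4)

# Column means here are 3, 6 and 4
colMeans(ex_small)

# sweep() with MARGIN = 2 compares every column with its own mean,
# returning a logical matrix of the same shape as ex_small
below <- sweep(ex_small, 2, colMeans(ex_small), "<")

# Use the logical matrix as an index to set those cells to NA
ex_small[below] <- NA
ex_small
```

Because the comparison happens on the whole matrix at once, no explicit loop over columns is needed.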
Here is another base R solution:
ex <- replace(ex, ex < t(replicate(nrow(ex),colMeans(ex))),NA)
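The question also mentions the median. The same pattern should carry over by swapping the column means for column medians. Base R has no colMedians, so this sketch computes them with apply; the matrix values and the names `ex_small`, `col_medians` and `ex_med` are assumptions made up for illustration.

```r
# Hypothetical 4 x 3 example; matrix() fills column by column
ex_small <- matrix(c(1, 2, 3, 6,
                     4, 8, 2, 10,
                     5, 1, 9, 1), nrow = 4)

# Per-column medians (2.5, 6 and 3 for this matrix)
col_medians <- apply(ex_small, 2, median)

# Repeat each median nrow times so it lines up with the
# column-major layout of the matrix, then blank out smaller values
ex_med <- replace(ex_small,
                  ex_small < rep(col_medians, each = nrow(ex_small)),
                  NA)
ex_med
```

Using rep(..., each = nrow(...)) is just one way to line the thresholds up with the matrix; sweep with the medians as STATS would be an equivalent choice.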
Comparing the solutions proposed by @Ronak Shah and @ThomasIsCoding with a microbenchmark on a larger matrix gives the following results:
# Generate matrix
set.seed(1)
ex <- matrix(data = round(runif(100000), 1), nrow = 1000, ncol = 100)
ex
colMeans(ex)
# for-loop solution
ex2 <- ex
for(i in 1:ncol(ex2)){
ex2[, i][ex2[, i] < colMeans(ex2)[i]] <- NA
}
ex2
# Solution with sweep
ex3 <- ex
ex3[sweep(ex3, 2, colMeans(ex3), "<")] <- NA
ex3
# Solution with replace
ex4 <- ex
ex4 <- replace(ex4, ex4 < t(replicate(nrow(ex4), colMeans(ex4))), NA)
ex4
# Transposing solution
ex5 <- ex
ex5[t(t(ex5) < colMeans(ex5))] <- NA
ex5
# Apply solution
ex6 <- ex
# apply() returns a new matrix, so the result must be assigned back
ex6 <- apply(ex6, 2, function(x) replace(x, x < mean(x), NA))
ex6
# Identical
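# all.equal() compares only two objects, so check each result against ex2
all(sapply(list(ex3, ex4, ex5, ex6), function(m) isTRUE(all.equal(ex2, m))))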
# Microbenchmark
library(microbenchmark)
comp <- microbenchmark(
  for_loop = {
    ex2 <- ex
    for (i in 1:ncol(ex2)) {
      ex2[, i][ex2[, i] < colMeans(ex2)[i]] <- NA
    }
  },
  sweep = {
    ex3 <- ex
    ex3[sweep(ex3, 2, colMeans(ex3), "<")] <- NA
  },
  replace = {
    ex4 <- ex
    ex4 <- replace(ex4, ex4 < t(replicate(nrow(ex4), colMeans(ex4))), NA)
  },
  transpose = {
    ex5 <- ex
    ex5[t(t(ex5) < colMeans(ex5))] <- NA
  },
  apply = {
    ex6 <- ex
    # assign the result back, since apply() returns a new matrix
    ex6 <- apply(ex6, 2, function(x) replace(x, x < mean(x), NA))
  }
)
library(ggplot2)
autoplot(comp)
They all give the same result, but the sweep approach appears to be the fastest.