How can I remove the trailing NAs from a vector in R?
I have a vector:
a <- c(NA,1:5,NA,NA,1:3, rep(NA,round(runif(1,0,100))))
One option is
a[rev(cumprod(rev(is.na(a)))) == 0]
# [1] NA 1 2 3 4 5 NA NA 1 2 3
Here are the steps:
(a <- c(NA, 1:5, NA, NA, 1:3, NA, NA))
# [1] NA 1 2 3 4 5 NA NA 1 2 3 NA NA
is.na(a)
# [1] TRUE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE TRUE TRUE
rev(is.na(a))
# [1] TRUE TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE
cumprod(rev(is.na(a)))
# [1] 1 1 0 0 0 0 0 0 0 0 0 0 0
rev(cumprod(rev(is.na(a))))
# [1] 0 0 0 0 0 0 0 0 0 0 0 1 1
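The steps above can be wrapped in a small helper. This is only a minimal sketch: the name trim_na_cumprod is made up for illustration and does not appear in the original answer.

```r
# Hypothetical helper (the name is an assumption, not from the answer):
# drop trailing NAs using the reversed cumulative product.
trim_na_cumprod <- function(x) {
  # Seen from the end, cumprod() stays 1 across the trailing NAs and
  # drops to 0 at the first non-NA value, staying 0 from there on.
  trailing <- rev(cumprod(rev(is.na(x)))) == 1
  x[!trailing]
}

a <- c(NA, 1:5, NA, NA, 1:3, NA, NA)
trimmed <- trim_na_cumprod(a)
trimmed
# [1] NA  1  2  3  4  5 NA NA  1  2  3
```

Leading and interior NAs are kept, and a vector that consists only of NA comes back with length zero.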
I think this works:
rm_NA_tail <- function(a) {
  if (is.na(a[length(a)])) {
    return(a[is.na(match(data.table::rleid(a), max(data.table::rleid(a))))])
  } else {
    return(a)
  }
}
You can do
a[1:max(which(!is.na(a)))]
# [1] NA 1 2 3 4 5 NA NA 1 2 3
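We subset the vector from position 1 to the last non-NA value.
You can find the largest position that holds a non-NA value and subset accordingly: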
> a[1:max(which(!is.na(a)))]
[1] NA 1 2 3 4 5 NA NA 1 2 3
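One edge case may be worth guarding: for a vector with no non-NA values, which() returns an empty vector and max() of that is -Inf (with a warning), so the 1:max(...) index cannot be built. A minimal sketch of a guarded variant follows; the name trim_na_index is hypothetical.

```r
# Hypothetical helper (name is an assumption): keep everything up to the
# last non-NA position, returning an empty vector when there is none.
trim_na_index <- function(x) {
  keep <- which(!is.na(x))
  if (length(keep) == 0) {
    return(x[0])  # all NA (or empty input): nothing to keep
  }
  x[seq_len(max(keep))]
}

a <- c(NA, 1:5, NA, NA, 1:3, NA, NA)
trimmed <- trim_na_index(a)
trimmed
# [1] NA  1  2  3  4  5 NA NA  1  2  3
```

seq_len() is used instead of 1:n so that the index is built explicitly from a length rather than from a range that could run backwards.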
Another possibility:
a[cumsum(!is.na(a)) != max(cumsum(!is.na(a))) * is.na(a)]
[1] NA 1 2 3 4 5 NA NA 1 2 3
In individual steps (only the first five elements of each result are shown):
is.na(a)
[1] TRUE FALSE FALSE FALSE FALSE
cumsum(!is.na(a))
[1] 0 1 2 3 4
cumsum(!is.na(a)) != max(cumsum(!is.na(a)))
[1] TRUE TRUE TRUE TRUE TRUE
cumsum(!is.na(a)) != max(cumsum(!is.na(a))) * is.na(a)
[1] TRUE TRUE TRUE TRUE TRUE
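For reference, here is the same walk-through on the fixed 13-element vector used earlier, with full-length results. The intermediate names seen, target and keep are introduced only for this sketch.

```r
a <- c(NA, 1:5, NA, NA, 1:3, NA, NA)

# Running count of non-NA values seen so far
seen <- cumsum(!is.na(a))
seen
# [1] 0 1 2 3 4 5 5 5 6 7 8 8 8

# `*` binds tighter than `!=`, so the right-hand side is
# max(seen) where a is NA and 0 everywhere else
target <- max(seen) * is.na(a)
target
# [1] 8 0 0 0 0 0 8 8 0 0 0 8 8

# Only the trailing NAs have seen == max(seen) while also being NA
keep <- seen != target
trimmed <- a[keep]
trimmed
# [1] NA  1  2  3  4  5 NA NA  1  2  3
```

The last non-NA value itself is kept because its target is 0, not max(seen).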
Just for fun, a little benchmarking:
library(microbenchmark)
a <- rep(a, 1e5)
microbenchmark(
  markus = a[1:max(which(!is.na(a)))],
  Julius_Vainora = a[rev(cumprod(rev(is.na(a)))) == 0],
  Kim = rm_NA_tail(a),
  tmfmnk = a[cumsum(!is.na(a)) != max(cumsum(!is.na(a))) * is.na(a)],
  nsinghs = a[1:(length(a) - rle(is.na(rev(a)))$lengths[1])],
  times = 5
)
Unit: milliseconds
           expr      min       lq     mean   median       uq       max neval cld
         markus 150.7346 153.0674 156.4194 153.3031 159.4718  165.5201     5 a
 Julius_Vainora 393.8520 418.8186 616.3269 703.4022 749.6600  815.9018     5  bc
            Kim 370.7680 382.1826 536.0828 632.0031 642.1882  653.2720     5  bc
         tmfmnk 390.2626 415.2378 466.4245 415.8310 423.3828  687.4082     5  b
        nsinghs 537.0404 781.1403 798.6929 793.1027 842.6777 1039.5033     5   c
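Before reading much into timings, it can help to confirm that the candidates return the same result. The check below is a sketch on the small fixed vector; the Kim entry is left out only so that the snippet runs without data.table installed.

```r
a <- c(NA, 1:5, NA, NA, 1:3, NA, NA)

# Same expressions as in the benchmark, evaluated once each
results <- list(
  markus         = a[1:max(which(!is.na(a)))],
  Julius_Vainora = a[rev(cumprod(rev(is.na(a)))) == 0],
  tmfmnk         = a[cumsum(!is.na(a)) != max(cumsum(!is.na(a))) * is.na(a)],
  nsinghs        = a[1:(length(a) - rle(is.na(rev(a)))$lengths[1])]
)

# Compare every result against the first one
agree <- vapply(results, identical, logical(1), results[[1]])
all(agree)
# [1] TRUE
```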
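This can be done using rle().
rle(is.na(rev(a)))$lengths[1] gets the count of trailing NAs in the vector; subtracting it from the total length of the vector gives the last index to keep. This fails when a contains only NA (probably very unlikely), and it also trims real values when a does not end in NA, because the first run then counts non-NA values.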
a[1:(length(a) - rle(is.na(rev(a)))$lengths[1])]
# [1] NA 1 2 3 4 5 NA NA 1 2 3
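A guarded variant of the rle() idea might look like the sketch below. The name trim_na_rle is hypothetical; the guards cover an empty input and a vector that does not end in NA.

```r
# Hypothetical helper (name is an assumption): count the trailing NA run
# with rle() and trim it, leaving the vector alone if it does not end in NA.
trim_na_rle <- function(x) {
  if (length(x) == 0) {
    return(x)
  }
  runs <- rle(rev(is.na(x)))
  # The first reversed run describes the tail: TRUE means trailing NAs.
  if (!runs$values[1]) {
    return(x)
  }
  x[seq_len(length(x) - runs$lengths[1])]
}

a <- c(NA, 1:5, NA, NA, 1:3, NA, NA)
trimmed <- trim_na_rle(a)
trimmed
# [1] NA  1  2  3  4  5 NA NA  1  2  3
```

With seq_len(), an all-NA input yields a zero-length index and therefore an empty vector.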