R: Compare adjacent elements of the same vector (avoiding loops)
I managed to write a for loop to compare the letters in the following vector:
bases <- c("G","C","A","T")
test <- sample(bases, replace=TRUE, 20)
With the function Comp() I can check whether a letter matches the next one:
Comp <- function(data)
{
  output <- vector()   # starts empty and grows by one element per iteration
  for(i in 1:(length(data)-1))
  {
    if(data[i]==data[i+1])
    {
      output[i] <- 1
    }
    else
    {
      output[i] <- 0
    }
  }
  return(output)
}
This works, but it is very slow on large vectors. Therefore I tried sapply()
Just "lag" test and use ==, which is vectorized:
bases <- c("G","C","A","T")
set.seed(21)
test <- sample(bases, replace=TRUE, 20)
lag.test <- c(tail(test,-1),NA)
#lag.test <- c(NA,head(test,-1))
test == lag.test
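The lagged comparison above returns a logical vector whose last entry is NA, because the final element has no successor. As a minimal sketch, assuming the goal is the same 0/1 output that Comp() produces, both vectors can be trimmed instead of padded. The name comp_vec is hypothetical and does not appear in the thread.

```r
# Hypothetical helper (not from the thread): compare each element with its successor.
comp_vec <- function(data) {
  n <- length(data)
  if (n < 2L) return(integer(0))   # fewer than two elements: nothing to compare
  # data[-n] drops the last element and data[-1L] drops the first,
  # so position i holds data[i] == data[i + 1]
  as.integer(data[-n] == data[-1L])
}

x <- c("T", "G", "T", "G", "G", "T", "T", "T")
comp_vec(x)
# [1] 0 0 0 1 0 1 1
```

The result has length(data) - 1 entries and contains no NA, so it can be compared directly against the loop version.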
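As @Joshua wrote, you should certainly use vectorization; it is the more efficient way.
...But just for reference, your Comp function can still be optimized a bit.
The result of the comparison is TRUE/FALSE, which is a glorified version of 1/0. Also, making the result integer rather than numeric takes half the memory: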
Comp.opt <- function(data)
{
  output <- integer(length(data)-1L)   # preallocated integer result
  for(i in seq_along(output))
  {
    # the logical result is coerced to 1L/0L on assignment
    output[[i]] <- (data[[i]]==data[[i+1L]])
  }
  return(output)
}
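The claim about memory can be checked with utils::object.size(). The snippet below is only an illustration; the vector length of 1e6 and the variable names are arbitrary choices, not taken from the thread.

```r
n <- 1e6
size_int <- as.numeric(utils::object.size(integer(n)))   # 4 bytes per element
size_dbl <- as.numeric(utils::object.size(numeric(n)))   # 8 bytes per element

# Close to 2; the small fixed header on each vector keeps it from being exactly 2
size_dbl / size_int
```

Logical vectors also use 4 bytes per element in R, so the saving comes from avoiding double storage, not from choosing integer over logical.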
Have a look at this:
> x = c("T", "G", "T", "G", "G","T","T","T")
>
> res = sequence(rle(x)$lengths)-1
>
> dt = data.frame(x,res)
>
> dt
  x res
1 T   0
2 G   0
3 T   0
4 G   0
5 G   1
6 T   0
7 T   1
8 T   2
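In the rle() output above, res counts the position inside each run starting from 0, so a non-zero entry means the element equals the one before it. A short sketch of how that could be turned into the same 0/1 indicator as Comp(); the name same_as_next is an assumption made for this illustration.

```r
x <- c("T", "G", "T", "G", "G", "T", "T", "T")
res <- sequence(rle(x)$lengths) - 1   # position within each run, counted from 0

# res[i] > 0 means x[i] equals x[i - 1]; dropping the first entry
# aligns the result with the pairs (x[i], x[i + 1])
same_as_next <- as.integer(res[-1] > 0)
same_as_next
# [1] 0 0 0 1 0 1 1
```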
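… might be faster.
I reworded the title to describe the problem better, for future reference. You should also know that sapply/lapply etc. are all loops, just in a different form.
Thanks, I am new to R and to programming and not very familiar with the terminology.
Thank you for the suggestions!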
set.seed(21)
test <- sample(bases, replace=TRUE, 1e5)
system.time(orig <- Comp(test))
#    user  system elapsed
#  34.760   0.010  34.884
system.time(prealloc <- Comp.prealloc(test))
#    user  system elapsed
#    1.18    0.00    1.19
identical(orig, prealloc)
# [1] TRUE
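The timing above calls Comp.prealloc, whose definition does not appear in this text. The following is a hypothetical reconstruction, assuming it differs from Comp() only in allocating the result at full length before the loop. It is named comp_prealloc here to make clear it is not the original.

```r
# Hypothetical reconstruction; the thread's own Comp.prealloc is not shown here.
comp_prealloc <- function(data) {
  n <- length(data) - 1
  if (n < 1) return(numeric(0))   # fewer than two elements: nothing to compare
  output <- numeric(n)            # allocate the full result once, filled with 0
  for (i in seq_len(n)) {
    if (data[i] == data[i + 1]) {
      output[i] <- 1              # overwrite in place instead of growing the vector
    }
  }
  output
}

comp_prealloc(c("T", "G", "T", "G", "G", "T", "T", "T"))
# [1] 0 0 0 1 0 1 1
```

Returning a double vector, as Comp() does, is what would let identical() report TRUE in a comparison like the one above.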
> system.time(orig <- Comp(test))
   user  system elapsed
  21.10    0.00   21.11
> system.time(prealloc <- Comp.prealloc(test))
   user  system elapsed
   0.49    0.00    0.49
> system.time(opt <- Comp.opt(test))
   user  system elapsed
   0.41    0.00    0.40
> all.equal(opt, orig) # opt is integer, orig is double
[1] TRUE