R: count 1s from right to left, stopping at the first 0
I want to count the number of 1s in each row, reading the columns from right to left and stopping when the first 0 is encountered. Example DF:
df <- data.frame(replicate(7, sample(0:1, 30, replace = TRUE)))
colnames(df) <- seq(1950, 2010, 10)
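To pin down the rule before comparing answers, here is a minimal reference sketch that walks a single row from the right and stops at the first 0. The function name `count_trailing_ones` is hypothetical and not part of any answer; it is slow but easy to check against.

```r
# Reference version: walk one row from the right and stop at the first 0
count_trailing_ones <- function(x) {
  n <- 0L
  for (v in rev(x)) {
    if (v != 1) break  # first 0 seen from the right: stop counting
    n <- n + 1L
  }
  n
}

count_trailing_ones(c(0, 1, 0, 1, 1))  # 2
count_trailing_ones(c(1, 1, 1))        # 3
count_trailing_ones(c(1, 1, 0))        # 0
```

Applied to the example data it would be `apply(df, 1, count_trailing_ones)`.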
We can loop over the rows and use rle:
df$condition <- apply(df, 1, function(x) {
  x1 <- rle(x)
  # length of the last run, kept only if that run consists of 1s
  x2 <- tail(x1$lengths, 1)[tail(x1$values, 1) == 1]
  if (length(x2) == 0) 0 else x2
})
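The rle idea on a single row: `rle()` splits a vector into runs of equal values, and the count we want is the length of the last run when that run is made of 1s. A small sketch with a hypothetical helper `trailing_ones_rle` (it also guards against an empty vector, which the original does not need to handle):

```r
# rle() splits a vector into runs of equal values
r <- rle(c(1, 0, 1, 1))
r$values   # 1 0 1
r$lengths  # 1 1 2

# The answer is the length of the last run if that run is made of 1s, else 0
trailing_ones_rle <- function(x) {
  r <- rle(x)
  k <- length(r$lengths)
  if (k > 0 && r$values[k] == 1) r$lengths[k] else 0L
}

trailing_ones_rle(c(1, 0, 1, 1))  # 2
trailing_ones_rle(c(1, 1, 0))     # 0
```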
Or, more efficiently, with stringi:
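library(stringi)
# paste each row into one string such as "0100111", extract the trailing run of 1s,
# then count its characters
v1 <- stri_count(stri_extract(do.call(paste0, df), regex = "1+$"), regex = ".")
v1[is.na(v1)] <- 0  # rows ending in 0 have no match, giving NA
df$condition <- v1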
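The same string idea also works in base R without stringi: strip the trailing run of 1s with `sub()` and take the drop in string length as the count. This is a sketch of that swapped-in technique, not part of the original answer; the seed is arbitrary and only makes the example reproducible.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
df <- data.frame(replicate(7, sample(0:1, 30, replace = TRUE)))

# One string per row, e.g. "0100111"
s <- do.call(paste0, df)
# Strip the trailing run of 1s; the drop in length is the count
trailing <- nchar(s) - nchar(sub("1+$", "", s))
```

A row string with no trailing 1s is left unchanged by `sub()`, so it gives 0 without any NA handling.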
[Edit: now working]
Try this:
df$condition <- apply(df, 1, function(x) {
  x <- rev(x)           # read the row from right to left
  m <- match(0, x)[1]   # position of the first 0, NA if there is none
  if (is.na(m)) sum(x) else sum(x[1:m])
})
Here is a fully vectorized attempt:
n <- ncol(df)                 # number of year columns, taken before adding "condition"
indx <- rowSums(df) == n      # rows that are all 1s (per Jaap's comment)
df$condition <- n - max.col(-df, ties.method = "last")
df$condition[indx] <- n
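Why the all-1 rows need special treatment can be seen on a small hypothetical 3-column matrix: `max.col(-m)` looks for the largest entry of `-m`, i.e. a 0, and `ties.method = "last"` picks the rightmost one.

```r
# Hypothetical 3-column example
m <- rbind(c(0, 1, 1),   # last 0 in column 1
           c(1, 1, 0),   # last 0 in column 3
           c(1, 1, 1))   # no 0 at all
n <- ncol(m)

# max.col(-m) looks for the largest value of -m, i.e. a 0 (since -0 > -1);
# ties.method = "last" picks the rightmost 0 in each row
last_zero <- max.col(-m, ties.method = "last")  # 1 3 3
res <- n - last_zero                              # 2 0 0

# For an all-1 row every column ties, so "last" returns n and the count
# comes out as 0; those rows have to be set to n explicitly
res[rowSums(m) == n] <- n
res  # 2 0 3
```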
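Use set.seed(…) to make your example reproducible.
Thanks @jogo, will do next time, although it's not strictly necessary here.
By the way, when you have a binary data set like this, using matrices instead of data.frames will be much faster.
Thanks @DavidArenburg, I'll keep that in mind next time I set up an example data set!
Your second solution is the fastest so far.
@Moody_Mudskipper I seriously doubt that. Did you do proper benchmarking? Text processing can't be faster than the vectorized numeric operations David provided.
@Roland I think David's solution is the slowest, together with akrun's first one.
@Moody_Mudskipper Add the benchmark to your answer. I suspect you benchmarked on the OP's small example, where speed differences are not really relevant.
See my answer: you're right for a larger df; the larger the df, the faster it is.
apply(df[length(df):1], 1, function(x) sum(cummin(x) == 1))
I think you need to add df$condition[rowSums(df==1)==7]
Good catch @Jaap, I had to add some extra steps there.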
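Following the suggestion to use a matrix for binary data, one option is to loop over the few columns (right to left) while staying vectorized over the rows. This is a sketch under that assumption; `count_trailing_ones_mat` is a hypothetical name and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed for reproducibility
m <- matrix(sample(0:1, 30 * 7, replace = TRUE), ncol = 7)

# Hypothetical helper: loop over the columns right to left, vectorized over rows
count_trailing_ones_mat <- function(m) {
  running <- rep(TRUE, nrow(m))   # TRUE while no 0 has been met in that row
  count <- integer(nrow(m))
  for (j in rev(seq_len(ncol(m)))) {
    running <- running & m[, j] == 1
    if (!any(running)) break      # every row has already hit a 0
    count <- count + running      # a TRUE adds 1 to that row's count
  }
  count
}

count_trailing_ones_mat(rbind(c(0, 1, 1), c(1, 1, 0), c(1, 1, 1)))  # 2 0 3
```

The loop runs at most `ncol(m)` times (7 here), so its cost grows with the number of rows only through vectorized operations.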
Benchmark:
library(microbenchmark)
library(stringr)
microbenchmark(
  Moody_Mudskipper = apply(df, 1, function(x) {
    x <- rev(x); m <- match(0, x)[1]
    if (is.na(m)) sum(x) else sum(x[1:m])
  }),
  akrun = apply(df, 1, function(x) {
    x1 <- rle(x)
    x2 <- tail(x1$lengths, 1)[tail(x1$values, 1) == 1]
    if (length(x2) == 0) 0 else x2
  }),
  # note: str_count() counts regex matches, so this gives 0 or 1 per row,
  # not the length of the trailing run of 1s
  akrun2 = str_count(do.call(paste0, df), "[1]+$"),
  roland = apply(df, 1, function(x) { y <- rev(x); sum(y * cumprod(y != 0L)) }),
  David_Arenburg = ncol(df) - max.col(-df, ties = "last"),
  times = 10
)
# Unit: microseconds
#                     expr      min       lq      mean   median       uq      max neval
#         Moody_Mudskipper 1437.948 1480.417 1677.1929 1536.159 1597.209 3009.320    10
#                    akrun 6985.174 7121.078 7718.2696 7691.053 7856.862 9289.146    10
#                   akrun2 1101.731 1188.793 1290.8971 1226.486 1343.099 1790.091    10
#                   akrun3  693.315  791.703  830.3507  820.371  884.782 1030.240    10
#                   roland 1197.995 1270.901 1708.5143 1332.305 1727.802 4568.660    10
#           David_Arenburg 2845.459 3060.638 3406.3747 3167.519 3495.950 5408.494    10
# David_Arenburg_corrected 3243.964 3341.644 3757.6330 3384.645 4195.635 4943.099    10
With 1000 rows:
df <- data.frame(replicate(7, sample(0:1, 1000, replace = TRUE)))
# Unit: milliseconds
#                     expr        min         lq       mean     median         uq        max neval
#         Moody_Mudskipper  31.324456  32.155089  34.168533  32.827345  33.848560  44.952570    10
#                    akrun 225.592061 229.055097 238.307506 234.761584 241.266853 271.000470    10
#                   akrun2  28.779824  29.261499  33.316700  30.118144  38.026145  46.711869    10
#                   akrun3  14.184466  14.334879  15.528201  14.633227  17.237317  18.763742    10
#                   roland  27.946005  28.341680  29.328530  28.497224  29.760516  33.692485    10
#           David_Arenburg   3.149823   3.282187   3.630118   3.455427   3.727762   5.240031    10
# David_Arenburg_corrected   3.464098   3.534527   4.103335   3.833937   4.187141   6.165159    10
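Timings are only comparable between expressions that return the same answer (for instance, `str_count()` counts regex matches, so an expression built on it yields 0 or 1 per row). A minimal sketch of a correctness check to run before timing, wrapping several approaches from this thread as functions; the list names are hypothetical labels and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed
df <- data.frame(replicate(7, sample(0:1, 1000, replace = TRUE)))

# Approaches from this thread, wrapped as functions of a 0/1 data frame
candidates <- list(
  match_rev = function(d) apply(d, 1, function(x) {
    x <- rev(x); m <- match(0, x)
    if (is.na(m)) sum(x) else sum(x[1:m])
  }),
  cumprod_rev = function(d) apply(d, 1, function(x) {
    y <- rev(x); sum(y * cumprod(y != 0L))
  }),
  cummin_rev = function(d) apply(d[length(d):1], 1, function(x) sum(cummin(x) == 1)),
  max_col = function(d) {
    n <- ncol(d)
    res <- n - max.col(-d, ties.method = "last")
    res[rowSums(d) == n] <- n  # all-1 rows
    res
  }
)

# Compare every result with the first one (names dropped, all as double)
results <- lapply(candidates, function(f) as.numeric(unname(f(df))))
agree <- vapply(results, function(r) isTRUE(all.equal(r, results[[1]])), logical(1))
agree
```

Once every entry of `agree` is `TRUE`, the same calls (e.g. `candidates$max_col(df)`) can be passed to `microbenchmark()`.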
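Alternatively, reverse each row and sum its cumulative product, which stays at 1 through the trailing 1s and drops to 0 for good at the first 0:
df$condition <- apply(df, 1, function(x) {
  y <- rev(x)        # read the row from right to left
  sum(cumprod(y))    # 1s up to the first 0, then 0s
})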
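A worked example of the cumulative-product trick on one row. It relies on the data being strictly 0/1; `cummin()` produces the same mask and is the idea behind the `cummin` one-liner from the comments.

```r
y <- rev(c(1, 0, 1, 1))  # the row read right to left: 1 1 0 1
cumprod(y)               # 1 1 0 0  (once a 0 appears, the product stays 0)
sum(cumprod(y))          # 2
cummin(y)                # 1 1 0 0  (same mask, via a running minimum)
```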