R: count 1's from right to left, stopping at the first 0


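I want to count the number of 1's that occur across multiple columns, reading each row from right to left and stopping at the first 0.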

Example DF:

df<-data.frame(replicate(7,sample(0:1,30,rep=T)))
colnames(df)<-seq(1950,2010,10)
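
To make the expected result concrete, here is a small hand-built example with a plain loop as a reference. The data frame `toy`, its values, and the helper `count_trailing_ones` are made up for illustration and are not part of the question.

```r
# Hypothetical 4-row example; the values are chosen by hand.
toy <- data.frame(
  `1990` = c(0, 1, 1, 1),
  `2000` = c(1, 1, 0, 1),
  `2010` = c(1, 0, 1, 1),
  check.names = FALSE
)

# Reference implementation: walk each row from the right and
# stop as soon as a 0 is seen.
count_trailing_ones <- function(x) {
  n <- 0
  for (v in rev(x)) {
    if (v == 0) break
    n <- n + 1
  }
  n
}

result <- apply(toy, 1, count_trailing_ones)
# Row by row: (0,1,1) gives 2, (1,1,0) gives 0, (1,0,1) gives 1, (1,1,1) gives 3
```

A loop like this is slow on large data, but it is handy as a baseline for checking faster versions.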

We can loop over the rows, using rle:

df$condition <- apply(df, 1, function(x) {x1 <- rle(x)
      x2 <- tail(x1$lengths, 1)[tail(x1$values, 1)==1]
      if(length(x2)==0) 0 else x2})
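
As a rough illustration of why looking at the last run is enough, this sketch applies rle to a single hypothetical row. The vector `x` and the helper name `trailing_ones_rle` are assumptions made for this example.

```r
# One hypothetical row, oldest column first.
x <- c(0, 1, 1, 0, 1, 1, 1)

runs <- rle(x)
runs$lengths  # 1 2 1 3
runs$values   # 0 1 0 1

# Only the last run touches the right edge of the row. If its value
# is 1, its length is the answer; otherwise the row ends in 0.
trailing_ones_rle <- function(x) {
  runs <- rle(x)
  if (tail(runs$values, 1) == 1) tail(runs$lengths, 1) else 0
}

trailing_ones_rle(x)           # 3
trailing_ones_rle(c(1, 1, 0))  # 0
```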

Or use stringi, which is more efficient:

library(stringi)
v1 <- stri_count(stri_extract(do.call(paste0, df), regex = "1+$"), regex = ".")
v1[is.na(v1)] <- 0
df$condition <- v1
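
The same string idea can be sketched in base R, without stringi. This is a swapped-in variant rather than the code above: it pastes each row into one string, strips the trailing run of 1's with sub, and compares lengths. The data frame `toy` and its column names are hypothetical.

```r
# Hypothetical 3-row example.
toy <- data.frame(y1 = c(0, 1, 1), y2 = c(1, 1, 0), y3 = c(1, 1, 0))

# Collapse each row into a single string of 0/1 characters.
s <- do.call(paste0, toy)

# Drop the trailing run of 1's (if any); the number of characters
# removed is the count we are after. Rows ending in 0 lose nothing.
trailing <- nchar(s) - nchar(sub("1+$", "", s))
# s is "011", "111", "100"; trailing is 2, 3, 0
```

Because rows without a trailing 1 simply keep their full length, no NA handling is needed in this variant.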
[Edit: works now]

Try this:

df$condition <-  apply(df,1,function(x){x<- rev(x);m <- match(0,x)[1]; if (is.na(m)) sum(x) else sum(x[1:m])})

Here is a fully vectorized attempt:

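indx <- rowSums(df) == ncol(df) # rows with no 0 at all (per Jaap's comment)
df$condition <- ncol(df) - max.col(-df, ties = "last")
# df now includes the condition column, so ncol(df) - 1 is the number of data columns
df$condition[indx] <- ncol(df) - 1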
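
The max.col trick can be traced on a tiny matrix. This is a minimal sketch with a hand-made matrix `m` (hypothetical values); it spells out the `ties.method` argument in full and shows why rows without any 0 need a separate correction.

```r
# Hypothetical 3 x 3 example.
m <- rbind(c(0, 1, 1),
           c(1, 1, 0),
           c(1, 1, 1))

# After negation every 0 is the row maximum, so max.col() with
# ties.method = "last" returns the position of the right-most 0.
last_zero <- max.col(-m, ties.method = "last")  # 1 3 3

# Everything to the right of that position is a 1.
trailing <- ncol(m) - last_zero                 # 2 0 0

# A row with no 0 is all ties at -1, so last_zero equals ncol(m) and
# the count wrongly comes out as 0. Patch those rows explicitly.
all_ones <- rowSums(m) == ncol(m)
trailing[all_ones] <- ncol(m)                   # 2 0 3
```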

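Comments:

Use set.seed(…) to make your example reproducible.

Thanks @jogo, I'll do that next time, although it isn't strictly necessary here.

By the way, when you have a binary data set like this, using a matrix instead of a data.frame will be much faster.

Thanks @DavidArenburg, I'll keep that in mind the next time I set up an example data set!

Your second solution is the fastest so far.

@Moody_Mudskipper I seriously doubt that. Did you do proper benchmarking? Text processing can't be faster than the vectorized numeric operations David provided.

@Roland, I think David's solution is the slowest, along with akrun's first one.

@Moody_Mudskipper Add the benchmarks to your answer. I suspect you benchmarked with the OP's small example, where the speed differences are not really relevant.

See my answer. You are right for a bigger df.

This is also faster with a bigger df:
apply(df[length(df):1],1,function(x)sum(cummin(x)==1))

I think you need to add
df$condition[rowSums(df==1)==7]

Nice catch @Jaap, I had to add some extra steps there.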
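
Picking up the suggestions from the comments, here is a sketch that seeds the random data, stores it as a matrix, and checks that the cummin one-liner and the corrected max.col approach agree. The seed value and the names `m`, `by_cummin`, and `by_maxcol` are arbitrary choices for this example.

```r
set.seed(42)  # arbitrary seed, only to make this sketch reproducible

# Same shape as the question's data, but stored as a matrix.
m <- matrix(sample(0:1, 30 * 7, replace = TRUE), nrow = 30,
            dimnames = list(NULL, seq(1950, 2010, 10)))

# cummin one-liner from the comments, applied to the reversed columns:
# cummin stays at 1 until the first 0 is reached.
by_cummin <- apply(m[, ncol(m):1], 1, function(x) sum(cummin(x) == 1))

# max.col approach, including the correction for rows with no 0.
by_maxcol <- ncol(m) - max.col(-m, ties.method = "last")
by_maxcol[rowSums(m) == ncol(m)] <- ncol(m)

# Both approaches should give the same count for every row.
stopifnot(all(by_cummin == by_maxcol))
```

Checking agreement like this before timing anything helps avoid benchmarking a fast but incorrect variant.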
library(microbenchmark)
library(stringr)
microbenchmark(
  Moody_Mudskipper = apply(df, 1, function(x) {x <- rev(x); m <- match(0, x)[1]; if (is.na(m)) sum(x) else sum(x[1:m])}),
  akrun = apply(df, 1, function(x) {
    x1 <- rle(x)
    x2 <- tail(x1$lengths, 1)[tail(x1$values, 1) == 1]
    if (length(x2) == 0) 0 else x2
  }),
  akrun2 = str_count(do.call(paste0, df), "[1]+$"),
  roland = apply(df, 1, function(x) {y <- rev(x); sum(y * cumprod(y != 0L))}),
  David_Arenburg = ncol(df) - max.col(-df, ties = "last"),
  times = 10)

# Unit: microseconds
#                     expr      min       lq      mean   median       uq      max neval
#         Moody_Mudskipper 1437.948 1480.417 1677.1929 1536.159 1597.209 3009.320    10
#                    akrun 6985.174 7121.078 7718.2696 7691.053 7856.862 9289.146    10
#                   akrun2 1101.731 1188.793 1290.8971 1226.486 1343.099 1790.091    10
#                   akrun3  693.315  791.703  830.3507  820.371  884.782 1030.240    10
#                   roland 1197.995 1270.901 1708.5143 1332.305 1727.802 4568.660    10
#           David_Arenburg 2845.459 3060.638 3406.3747 3167.519 3495.950 5408.494    10
# David_Arenburg_corrected 3243.964 3341.644 3757.6330 3384.645 4195.635 4943.099    10
# Larger data set: 1000 rows
df <- data.frame(replicate(7, sample(0:1, 1000, replace = TRUE)))

# Unit: milliseconds
#                     expr        min         lq       mean     median         uq        max neval
#         Moody_Mudskipper  31.324456  32.155089  34.168533  32.827345  33.848560  44.952570    10
#                    akrun 225.592061 229.055097 238.307506 234.761584 241.266853 271.000470    10
#                   akrun2  28.779824  29.261499  33.316700  30.118144  38.026145  46.711869    10
#                   akrun3  14.184466  14.334879  15.528201  14.633227  17.237317  18.763742    10
#                   roland  27.946005  28.341680  29.328530  28.497224  29.760516  33.692485    10
#           David_Arenburg   3.149823   3.282187   3.630118   3.455427   3.727762   5.240031    10
# David_Arenburg_corrected   3.464098   3.534527   4.103335   3.833937   4.187141   6.165159    10
df$condition <- apply(df, 1, function(x) {
  y <- rev(x)
  sum(cumprod(y))
})
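
To see why the cumulative product works, it may help to trace one row by hand. The vector `x` below is a hypothetical example; the cummin line shows the equivalent formulation mentioned in the comments.

```r
# One hypothetical row, oldest column first.
x <- c(0, 1, 1, 0, 1, 1, 1)

y <- rev(x)          # 1 1 1 0 1 1 0  (now read left to right)
cumprod(y)           # 1 1 1 0 0 0 0  (stays 1 until the first 0, then 0)
sum(cumprod(y))      # 3

# cummin behaves the same way on 0/1 data.
sum(cummin(y) == 1)  # 3
```

Both forms rely on the data containing only 0 and 1, as in the question.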