R: A more elegant way to return a sequence of numbers based on booleans?

Here is an example of the booleans I'm using in a data.frame:

atest

Here's one way, using handy (but not widely known/used) base functions:
> sequence(tabulate(cumsum(!atest)))
 [1]  1  2  3  4  5  6  7  8  9 10  1  2  3  4  5  6  7  8  9 10  1

To break it down:
> # return/repeat integer for each FALSE
> cumsum(!atest)
 [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3
> # count the number of occurrences of each integer
> tabulate(cumsum(!atest))
[1] 10 10  1
> # create concatenated seq_len for each integer
> sequence(tabulate(cumsum(!atest)))
 [1]  1  2  3  4  5  6  7  8  9 10  1  2  3  4  5  6  7  8  9 10  1
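A self-contained version of the breakdown above, as a minimal sketch. The `atest` vector here is an assumption: it is reconstructed from the printed `cumsum(!atest)` output, since the question's own definition is cut off. The helper name `bool_seq_tab` is hypothetical. The last lines show why this approach assumes the vector starts with FALSE: `tabulate()` ignores zeros, so any leading TRUE values (run id 0) are silently dropped and the result comes out shorter than the input.

```r
# Reconstructed from the printed cumsum(!atest) output (an assumption):
# FALSE, nine TRUEs, FALSE, nine TRUEs, FALSE
atest <- c(FALSE, rep(TRUE, 9), FALSE, rep(TRUE, 9), FALSE)

# Hypothetical helper wrapping the sequence/tabulate/cumsum approach
bool_seq_tab <- function(x) {
  run_id  <- cumsum(!x)        # each FALSE starts a new run id
  run_len <- tabulate(run_id)  # length of runs 1, 2, ... (zeros ignored)
  sequence(run_len)            # 1..len for every run, concatenated
}

res <- bool_seq_tab(atest)
print(res)
#  [1]  1  2  3  4  5  6  7  8  9 10  1  2  3  4  5  6  7  8  9 10  1

# Leading TRUEs get run id 0, which tabulate() ignores,
# so the result no longer lines up with the input:
short <- bool_seq_tab(atest[-1])
length(atest[-1])  # 20
length(short)      # 11
```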

Here is another approach, using other familiar functions:

seq_along(atest) - cummax(seq_along(atest) * !atest) + 1L
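The `cummax()` one-liner can be broken down the same way. This is a minimal sketch on a short hypothetical input `x` rather than the question's data. Multiplying the positions by `!x` keeps the index of every FALSE and zeroes everything else; `cummax()` carries the most recent FALSE position forward; subtracting it from the current position and adding 1 gives the count since the last FALSE.

```r
x <- c(FALSE, TRUE, TRUE, FALSE, TRUE)  # hypothetical example input

i          <- seq_along(x)       # 1 2 3 4 5
last_false <- i * !x             # 1 0 0 4 0: index of each FALSE, 0 elsewhere
cm         <- cummax(last_false) # 1 1 1 4 4: most recent FALSE so far
out        <- i - cm + 1L        # 1 2 3 1 2: distance from last FALSE, plus 1
print(out)

# With a leading TRUE there is no earlier FALSE, so cummax() is still 0
# at those positions and the first run starts at 2 rather than 1:
y    <- c(TRUE, TRUE, FALSE)
j    <- seq_along(y)
lead <- j - cummax(j * !y) + 1L  # 2 3 1
print(lead)
```

Unlike the `tabulate()` version, this one always returns a vector of the same length as its input; a leading TRUE only shifts the first run's starting value.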

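Since it is all vectorized, it is noticeably faster than @Joshua's solution (if speed is any concern):
f0

Problems like this tend to work well with Rcpp. Borrowing @flodel's code as a framework for benchmarking: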
boolseq.cpp
-----------

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
IntegerVector boolSeq(LogicalVector x) {
  int n = x.length();
  IntegerVector output = no_init(n);  // allocate without zero-filling
  int counter = 1;
  for (int i=0; i < n; ++i) {
    if (!x[i]) {
      counter = 1;  // each FALSE restarts the sequence at 1
    }
    output[i] = counter;
    ++counter;
  }
  return output;
}

/*** R
x <- c(FALSE, sample( c(FALSE, TRUE), 1E5, TRUE ))

f0 <- function(x) sequence(tabulate(cumsum(!x)))
f1 <- function(x) {i <- seq_along(x); i - cummax(i * !x) + 1L}

library(microbenchmark)
microbenchmark(f0(x), f1(x), boolSeq(x), times=100)

stopifnot(identical(f0(x), f1(x)))
stopifnot(identical(f1(x), boolSeq(x)))
*/

The benchmark gives:

Unit: microseconds
       expr       min        lq     median         uq       max neval
      f0(x) 18174.348 22163.383 24109.5820 29668.1150 78144.411   100
      f1(x)  1498.871  1603.552  2251.3610  2392.1670  2682.078   100
 boolSeq(x)   388.288   426.034   518.2875   571.4235   699.710   100

Less elegant, but pretty close to what you would write with R code.

I already gave this a +1, but I'd do it again because the explanation is really helpful.

@Joshua Ulrich +1 for this great solution; but it fails if the first element isn't FALSE: sequence(tabulate(cumsum(!atest[-1])))

@sgibb: I didn't try the OP's code before answering, but if the first element isn't FALSE, I see it starts the first sequence at 2. That seems inconsistent with their text: "I want to return a sequence of numbers, starting at 1 at each FALSE and increasing by 1 until the next FALSE."

This is awesome. My data always starts with FALSE. I've never used tabulate or sequence, only seq. Thanks a lot!

Reallocating ("growing") an object in a for loop is a big no-no in R. It's the slowest thing you can do.

I know, I tried an sapply, but was just trying to get the basic logic down. Your solution is exactly what I was looking for.
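To illustrate the comment about growing objects, here is a plain-R loop that follows the same logic as the Rcpp `boolSeq()` but preallocates its output once instead of extending it on every iteration. It is a sketch, not code from any of the answers: the name `bool_seq_loop` is hypothetical, and `f1` is copied from the benchmark code above. Like `boolSeq()`, it starts counting at 1 even if the input begins with TRUE, so it only matches `f1()` when the first element is FALSE (as in the benchmark's `x`). Expect it to be much slower than the vectorized or compiled versions, since every iteration runs through the R interpreter.

```r
# Hypothetical plain-R equivalent of boolSeq(): preallocate, then fill
bool_seq_loop <- function(x) {
  out <- integer(length(x))    # allocated once, never grown
  counter <- 1L
  for (k in seq_along(x)) {
    if (!x[k]) counter <- 1L   # a FALSE restarts the count
    out[k] <- counter
    counter <- counter + 1L
  }
  out
}

# f1() from the benchmark code, used here as a reference result
f1 <- function(x) { i <- seq_along(x); i - cummax(i * !x) + 1L }

x <- c(FALSE, sample(c(FALSE, TRUE), 1000, TRUE))  # starts with FALSE
stopifnot(identical(bool_seq_loop(x), f1(x)))
```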