R: merge/collapse identical consecutive elements in a vector


I am trying to merge identical consecutive observations into a single collapsed string. A simple example is shown below:

a <- c("H", "H", "H", "N", "T", "N", "T", "H", "N", "T", "T")
[1] "H" "H" "H" "N" "T" "N" "T" "H" "N" "T" "T"

b <- c("HHH", "N", "T", "N", "T", "H", "N", "TT")
[1] "HHH" "N"   "T"   "N"   "T"   "H"   "N"   "TT"

c <- c("HHH", "HHH", "N", "T", "N", "T", "H", "N", "TT", "TT")
[1] "HHH" "HHH" "N"   "T"   "N"   "T"   "H"   "N"   "TT"  "TT" 
rle(a) gives something like:

Run Length Encoding
  lengths: int [1:8] 3 1 1 1 1 1 1 2
  values : chr [1:8] "H" "N" "T" "N" "T" "H" "N" "T"
where the 11 elements become 8 and the positions of the consecutive occurrences are not recorded.

with(rle(a), sapply(1:length(values), function(i)
    paste(rep(values[i], lengths[i]), collapse = "")))
#[1] "HHH" "N"   "T"   "N"   "T"   "H"   "N"   "TT" 
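Because base R's strrep() is vectorised over both of its arguments, the same idea can probably be written without the explicit sapply loop. This is a minimal sketch, assuming R 3.3.0 or later (where strrep was added) and the vector a from the question; the names r and collapsed are only illustrative.

```r
a <- c("H", "H", "H", "N", "T", "N", "T", "H", "N", "T", "T")

# run-length encode: one value and one length per run of identical elements
r <- rle(a)

# repeat each run's value by its run length, e.g. "H" repeated 3 times gives "HHH"
collapsed <- strrep(r$values, r$lengths)
collapsed
#[1] "HHH" "N"   "T"   "N"   "T"   "H"   "N"   "TT"
```

Like the sapply version, this compares whole vector elements, so it should also work when the elements are longer than one character.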


We can use rleid from data.table:

library(data.table)
unname(tapply(a, rleid(a), FUN = paste, collapse=""))
#[1] "HHH" "N"   "T"   "N"   "T"   "H"   "N"   "TT" 
Or use base R's rle with tapply:

with(rle(a), unname(tapply(a, rep(seq_along(values), lengths), FUN = paste, collapse="")))
#[1] "HHH" "N"   "T"   "N"   "T"   "H"   "N"   "TT" 

Alternatively, a base R option is to paste the strings together and split between runs of repeated characters using regex lookarounds:

strsplit(paste(a, collapse=""), "(?<=(.))(?!\\1)", perl = TRUE)[[1]]
#[1] "HHH" "N"   "T"   "N"   "T"   "H"   "N"   "TT" 
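One caveat that may be worth checking: the lookaround split operates on characters of the pasted string, not on vector elements, so it only agrees with the rle-based answers when every element is a single character. The sketch below uses a hypothetical vector x with two-character elements to show the difference; x, by_run and by_char are illustrative names, not part of the original answers.

```r
x <- c("AB", "AB", "C")   # hypothetical input with multi-character elements

# rle compares whole elements, so the two "AB" entries form one run
by_run <- with(rle(x), strrep(values, lengths))
by_run
#[1] "ABAB" "C"

# the regex sees the characters A, B, A, B, C; no character repeats consecutively
by_char <- strsplit(paste(x, collapse = ""), "(?<=(.))(?!\\1)", perl = TRUE)[[1]]
by_char
#[1] "A" "B" "A" "B" "C"
```

For single-character data such as the vector a in the question, both approaches give the same result.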

In addition to the solutions already given, I am also interested in a general algorithm that does not depend on any language-specific features.

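You said you tried, but I don't think the unlimited number of repetitions is a real problem. What I wrote basically iterates over the original array and clones it. If a value in the original array is the same as the previous one, then instead of adding it to the new array as a new item, it is concatenated onto the last value of the "cloned" array.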

Algorithm:

Create empty array(w)
Iterate by index(i) of the original vector(v)
   If this is the first entry
      w[1] = v[1]
   Else
      If v[i] is the same as v[i-1]
         Last entry in w is concatenated with v[i]
      Else
         Add v[i] to the end of w
In Python:

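def collapseVector(v):
    w = []
    for i in range(len(v)):
        if i == 0:
            # first entry: start the result with it
            w.append(v[i])
        else:
            if v[i] == v[i-1]:
                # same as the previous value: extend the last entry of w
                w[len(w)-1] = w[len(w)-1] + v[i]
            else:
                # different value: start a new entry
                w.append(v[i])
    return w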
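
For comparison with the R answers, the same loop could be written in R roughly as follows. This is only a sketch of the algorithm described above: the function name collapse_vector is illustrative and not from the original answer, and it assumes a character vector without NA values.

```r
# Sketch of the generic loop in R; assumes v is a character vector with no NAs
collapse_vector <- function(v) {
  w <- character(0)
  for (i in seq_along(v)) {
    if (i > 1 && v[i] == v[i - 1]) {
      # same as the previous element: append it to the last entry of w
      w[length(w)] <- paste0(w[length(w)], v[i])
    } else {
      # first element, or a new run: start a new entry
      w <- c(w, v[i])
    }
  }
  w
}

collapse_vector(c("H", "H", "H", "N", "T", "N", "T", "H", "N", "T", "T"))
#[1] "HHH" "N"   "T"   "N"   "T"   "H"   "N"   "TT"
```

Growing w with c() inside a loop is fine for short vectors, but the rle-based answers are likely to be faster on long inputs.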

You can use gregexpr together with regmatches:

a <- c("H", "H", "H", "N", "T", "N", "T", "H", "N", "T", "T")

# collapse string
b <- paste(a, collapse = "")

# extract instances of repeated characters
regmatches(b, gregexpr("(.)\\1*", b))[[1]]
# [1] "HHH" "N"   "T"   "N"   "T"   "H"   "N"   "TT"
And, for good measure, the ore package:

library(ore)
matches(ore.search("(.)\\1*", b, all = TRUE))
#[1] "HHH" "N"   "T"   "N"   "T"   "H"   "N"   "TT"

Wow, that was fast! Thanks a lot!
The same pattern also works with stringi:

library(stringi)
stri_extract_all_regex(b, "(.)\\1*")[[1]]
# [1] "HHH" "N"   "T"   "N"   "T"   "H"   "N"   "TT"