
String: How to split a string into substrings of a given length?


I have a string such as:

"aabbccccdd"

I want to break this string into a vector of substrings of length 2:

"aa" "bb" "cc" "cc" "dd"

Here is one way:

substring("aabbccccdd", seq(1, 9, 2), seq(2, 10, 2))
#[1] "aa" "bb" "cc" "cc" "dd"
Or, more generally:

text <- "aabbccccdd"
substring(text, seq(1, nchar(text)-1, 2), seq(2, nchar(text), 2))
#[1] "aa" "bb" "cc" "cc" "dd"
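
The hard-coded step of 2 can be generalized to any chunk length. The following is a minimal sketch rather than part of the original answer: split_fixed is a hypothetical helper name, and it assumes x is a single string. Because the end positions are capped at nchar(x), a shorter final chunk is kept instead of being dropped.

```r
# Hypothetical helper (not from the original answer): split one string
# into chunks of n characters, keeping a shorter final chunk if needed.
split_fixed <- function(x, n) {
  stopifnot(is.character(x), length(x) == 1, n >= 1)
  len <- nchar(x)
  if (len == 0) return(character(0))
  starts <- seq(1, len, by = n)        # first character of each chunk
  ends <- pmin(starts + n - 1, len)    # last character, capped at the string end
  substring(x, starts, ends)           # substring() is vectorized over start/end
}

split_fixed("aabbccccdd", 2)
# [1] "aa" "bb" "cc" "cc" "dd"
split_fixed("abc", 2)
# [1] "ab" "c"
```

Capping with pmin is not strictly required, since substring already stops at the end of the string, but it makes the intent explicit.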

You can use a matrix to group the characters:

s2 <- function(x) {
  m <- matrix(strsplit(x, '')[[1]], nrow=2)
  apply(m, 2, paste, collapse='')
}

s2('aabbccddeeff')
## [1] "aa" "bb" "cc" "dd" "ee" "ff"
Worse, @GSee's g1 and g2 silently return incorrect results for inputs with an odd string length:

g1('abc')
## [1] "ab"

g2('abc')
## [1] "ab" "cb"
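Here is a function in the spirit of s2 that takes the number of characters per group as a parameter and leaves the last entry short if necessary: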

s <- function(x, n) {
  sst <- strsplit(x, '')[[1]]
  m <- matrix('', nrow=n, ncol=(length(sst)+n-1)%/%n)
  m[seq_along(sst)] <- sst
  apply(m, 2, paste, collapse='')
}

s('hello world', 2)
## [1] "he" "ll" "o " "wo" "rl" "d" 
s('hello world', 3)
## [1] "hel" "lo " "wor" "ld" 
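
To see why this works, it may help to walk through the intermediate values for a short input. The sketch below only unrolls the body of the function above for x = "hello" and n = 2; the variable names chars, ncols and res are introduced here for illustration.

```r
x <- "hello"
n <- 2

chars <- strsplit(x, "")[[1]]            # "h" "e" "l" "l" "o"
ncols <- (length(chars) + n - 1) %/% n   # integer ceiling of 5 / 2, i.e. 3

# Start from a matrix of empty strings so unused cells stay "".
m <- matrix("", nrow = n, ncol = ncols)

# Linear indexing fills the matrix column by column, so each column
# receives n consecutive characters; the last cell stays "".
m[seq_along(chars)] <- chars

# Paste each column together; the padding "" contributes nothing.
res <- apply(m, 2, paste, collapse = "")
res
# [1] "he" "ll" "o"
```

Pre-filling with empty strings is what avoids the recycling that matrix() would otherwise apply when the number of characters is not a multiple of n.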
Ugly, but it works:

sequenceString <- "ATGAATAAAG"

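J <- 3  # maximum sequence length in file
# Chunks of exactly J characters
sequenceSmallVecStart <-
  substring(sequenceString, seq(1, nchar(sequenceString)-J+1, J),
    seq(J, nchar(sequenceString), J))
# Whatever is left after the last full chunk (may be an empty string)
sequenceSmallVecEnd <-
    substring(sequenceString, max(seq(J, nchar(sequenceString), J))+1)
# Drop the remainder when it is empty, i.e. when the length is a multiple of J
sequenceSmallVec <-
    c(sequenceSmallVecStart, sequenceSmallVecEnd[nzchar(sequenceSmallVecEnd)])
cat(sequenceSmallVec, sep = "\n")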

There are two easy possibilities:

s <- "aabbccccdd"
  • gregexpr and regmatches

    regmatches(s, gregexpr(".{2}", s))[[1]]
    # [1] "aa" "bb" "cc" "cc" "dd"

  • strsplit

    strsplit(s, "(?<=.{2})", perl = TRUE)[[1]]
    # [1] "aa" "bb" "cc" "cc" "dd"
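
Note that the pattern .{2} only matches complete pairs, so for a string of odd length the regmatches variant drops the final character. A possible way to keep the remainder and to parameterize the chunk length is sketched below. It is an illustration rather than part of the original answer: split_regex is a hypothetical helper name, and it assumes a single string that contains no newlines.

```r
# Hypothetical helper (not from the original answer): regex-based split
# into chunks of up to n characters.
split_regex <- function(x, n) {
  # ".{1,n}" is greedy: it takes n characters while they are available
  # and whatever is left (at least one character) for the last match.
  pattern <- sprintf(".{1,%d}", as.integer(n))
  regmatches(x, gregexpr(pattern, x))[[1]]
}

split_regex("aabbccccdd", 2)
# [1] "aa" "bb" "cc" "cc" "dd"
split_regex("abc", 2)
# [1] "ab" "c"
split_regex("hello world", 3)
# [1] "hel" "lo " "wor" "ld"
```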
    

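Interesting, I didn't know about substring. It is nicer, because substr does not accept vector arguments for start/end.

Brilliant! The second version is really fast!

I was wondering whether something like this could split "aabbbcccccdd" into aa bbb ccccc dd. At the moment I am using gregexpr.

@GSee You may want to repost the g2 part of this answer for this question: [...]

Is there any trick to extend the fast version to an arbitrary block length n?

If an odd number of characters is possible, it seems to me that handling that case [...] would be faster than introducing an apply loop: out [...]

These possibilities are equivalent for the proposed s, but what if s [...]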
    