How to split a string into substrings of a given length?
I have a string, for example:
"aabbccccdd"
I want to break this string into a vector of substrings of length 2:
"aa" "bb" "cc" "cc" "dd"
Here is one way:
substring("aabbccccdd", seq(1, 9, 2), seq(2, 10, 2))
#[1] "aa" "bb" "cc" "cc" "dd"
Or, more generally:
text <- "aabbccccdd"
substring(text, seq(1, nchar(text)-1, 2), seq(2, nchar(text), 2))
#[1] "aa" "bb" "cc" "cc" "dd"
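The general form above assumes the string length is a multiple of 2; for an odd length the start and end vectors differ in length and the last character is lost. A minimal sketch of a variant that takes the chunk length as a parameter and keeps a shorter final chunk follows. The name split_fixed is hypothetical and not part of the original answer.

```r
# Hypothetical helper (not from the original answer): split `text` into
# chunks of `n` characters; the last chunk may be shorter.
split_fixed <- function(text, n) {
  len <- nchar(text)
  if (len == 0) return(character(0))  # seq(1, 0, by = n) would raise an error
  starts <- seq(1, len, by = n)
  # substring() is vectorised over start/stop and truncates at the end of the string
  substring(text, starts, starts + n - 1)
}

split_fixed("aabbccccdd", 2)
# [1] "aa" "bb" "cc" "cc" "dd"
split_fixed("hello world", 3)
# [1] "hel" "lo " "wor" "ld"
```

Because only the start positions are generated with seq(), the stop positions can never get out of step with them, which is what goes wrong in the two-seq() form for odd lengths.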
You can use a matrix to group the characters:
s2 <- function(x) {
  m <- matrix(strsplit(x, '')[[1]], nrow=2)
  apply(m, 2, paste, collapse='')
}
s2('aabbccddeeff')
## [1] "aa" "bb" "cc" "dd" "ee" "ff"
Worse still, @GSee's g1 and g2 silently return incorrect results for input of odd string length:
g1('abc')
## [1] "ab"
g2('abc')
## [1] "ab" "cb"
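Here is a function in the spirit of s2 that takes the number of characters per group as a parameter and leaves the last entry short if necessary: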
s <- function(x, n) {
  sst <- strsplit(x, '')[[1]]
  m <- matrix('', nrow=n, ncol=(length(sst)+n-1)%/%n)
  m[seq_along(sst)] <- sst
  apply(m, 2, paste, collapse='')
}
s('hello world', 2)
## [1] "he" "ll" "o " "wo" "rl" "d"
s('hello world', 3)
## [1] "hel" "lo " "wor" "ld"
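The same grouping can be sketched without padding a matrix, by assigning each character a chunk index and pasting each group together. This is an alternative technique, not the answer author's method, and the name chunk_chars is hypothetical.

```r
# Hypothetical alternative to s(): group characters by chunk index
# instead of filling a padded matrix.
chunk_chars <- function(x, n) {
  chars <- strsplit(x, "")[[1]]
  # Characters 1..n get index 1, n+1..2n get index 2, and so on
  groups <- ceiling(seq_along(chars) / n)
  # split() returns the groups in increasing index order; paste each one
  unname(vapply(split(chars, groups), paste, character(1), collapse = ""))
}

chunk_chars("hello world", 2)
# [1] "he" "ll" "o " "wo" "rl" "d"
chunk_chars("hello world", 3)
# [1] "hel" "lo " "wor" "ld"
```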
Ugly but works:
sequenceString <- "ATGAATAAAG"
J <- 3  # maximum sequence length in file
sequenceSmallVecStart <-
  substring(sequenceString, seq(1, nchar(sequenceString) - J + 1, J),
            seq(J, nchar(sequenceString), J))
sequenceSmallVecEnd <-
  substring(sequenceString, max(seq(J, nchar(sequenceString), J)) + 1)
sequenceSmallVec <-
  c(sequenceSmallVecStart, sequenceSmallVecEnd)
cat(sequenceSmallVec,sep = "\n")
There are two easy possibilities:
s <- "aabbccccdd"
gregexpr and regmatches:
regmatches(s, gregexpr(".{2}", s))[[1]]
# [1] "aa" "bb" "cc" "cc" "dd"
strsplit with a lookbehind:
strsplit(s, "(?<=.{2})", perl = TRUE)[[1]]
# [1] "aa" "bb" "cc" "cc" "dd"
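For an arbitrary chunk length, or a string whose length is not a multiple of it, the regmatches approach can be sketched with a bounded quantifier. The helper name split_regex and the pattern .{1,n} are assumptions for illustration, not part of the original answer.

```r
# Hypothetical helper: regex-based split into chunks of up to `n` characters.
split_regex <- function(s, n) {
  # ".{1,n}" greedily matches n characters, or fewer at the end of the string
  pattern <- sprintf(".{1,%d}", n)
  regmatches(s, gregexpr(pattern, s))[[1]]
}

split_regex("aabbccccdd", 2)
# [1] "aa" "bb" "cc" "cc" "dd"
split_regex("aabbccccdde", 2)
# [1] "aa" "bb" "cc" "cc" "dd" "e"
```

With the fixed pattern ".{2}" the trailing "e" would simply not be matched, so the bounded form is the safer choice when the length is not known in advance.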
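Interesting, I didn't know about substring. It is nicer, since substr does not accept vector arguments for start/end.
Great! The second version is really fast!
I wonder whether there is something like this that could split "aabbbcdd" into aa bbb ccccc dd. I am currently using grepexpr.
@GSee, you may want to repost the g2 part of your answer on this question, which is:
Is there a trick for extending the fast version to an arbitrary chunk length n?
If the input may have an odd number of characters, it seems to me it would be faster to handle that case directly than to introduce an apply loop.
These possibilities are equivalent for the proposed s, but what if s …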