How do I parse a string in R (using a "new" markup)?

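I would like to use R to do string parsing that is (I think) similar to simple HTML parsing.

For example, suppose we have the following two variables: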

Seq <- "GCCTCGATAGCTCAGTTGGGAGAGCGTACGACTGAAGATCGTAAGGtCACCAGTTCGATCCTGGTTCGGGGCA"
Str <- ">>>>>>>..>>>>........<<<<.>>>>>.......<<<<<.....>>>>>.......<<<<<<<<<<<<."
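
I have no experience writing parsers, and I would like to know what strategy to use when writing something like this (along with any R commands you would recommend).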

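My idea is to first strip off "Stem 0", then walk through the inner string with a recursive function (call it "separate.stem") that each time splits the string into:

1. before stem
2. opening stem
3. inside stem
4. closing stem
5. after stem

where the "after stem" part is fed recursively into the same function ("separate.stem").

The problem is that I don't know how to code this without using loops.

Any suggestions would be welcome.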

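Update: someone sent me a number of questions; here they are.

Q: Does each sequence have the same number of ">>>>" symbols in its opening run as "<<<<" symbols in its closing run?

You can simplify the task with run-length encoding.

First, convert Str to a vector of single characters, then call rle: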

split_Str <- strsplit(Str, "")[[1]]
rle_Str <- rle(split_Str)

Run Length Encoding
  lengths: int [1:14] 7 2 4 8 4 1 5 7 5 5 ...
  values : chr [1:14] ">" "." ">" "." "<" "." ">" "." "<" "." ">" "." "<" "."
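
To show how the run lengths map back onto Seq, here is a minimal sketch that turns them into start and end positions with cumsum and cuts Seq with the vectorised substring. The variable names (runs, starts, ends, segments) are my own choices for illustration and are not from the original post.

```r
Seq <- "GCCTCGATAGCTCAGTTGGGAGAGCGTACGACTGAAGATCGTAAGGtCACCAGTTCGATCCTGGTTCGGGGCA"
Str <- ">>>>>>>..>>>>........<<<<.>>>>>.......<<<<<.....>>>>>.......<<<<<<<<<<<<."

# Run-length encode the structure string, one character per element.
runs <- rle(strsplit(Str, "")[[1]])

# Convert run lengths into 1-based start/end positions.
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

# substring() is vectorised over start/end, so no explicit loop is needed.
segments <- data.frame(
  symbol = runs$values,
  start  = starts,
  end    = ends,
  bases  = substring(Seq, starts, ends),
  stringsAsFactors = FALSE
)
print(segments)
```

Note that the last run of "<" has length 12: it merges the closing of Stem 3 (5 bases) with the closing of Stem 0 (7 bases), so the run lengths still have to be matched against the corresponding opening runs before the stems can be separated.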
Q: Can pseudoknots occur? For example: >>>..>>>>>>>>>>>...

The output I am after is a list like the following:
list(
    "Stem 0 opening" = "GCCTCGA",
    "before Stem 1" = "TA",
    "Stem 1" = list(opening = "GCTC",
                inside = "AGTTGGGA",
                closing = "GAGC"
            ),
    "between Stem 1 and 2" = "G",
    "Stem 2" = list(opening = "TACGA",
                inside = "CTGAAGA",
                closing = "TCGTA"
            ),
    "between Stem 2 and 3" = "AGGtC",
    "Stem 3" = list(opening = "ACCAG",
                inside = "TTCGATC",
                closing = "CTGGT"
            ),
    "After Stem 3" = "",
    "Stem 0 closing" = "TCGGGGC"
)
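
A list of this shape could be assembled without explicit loops roughly as follows. This is only a sketch under stated assumptions, not a general dot-bracket parser: it assumes the leading run of ">" is Stem 0 and is closed by the same number of "<" at the end of the last "<" run, that the inner stems are simple hairpins (a run of ">", optional dots, a run of "<") that are not nested in each other, that at least one inner stem exists, and that there are no pseudoknots. The function name parse_stems is hypothetical.

```r
Seq <- "GCCTCGATAGCTCAGTTGGGAGAGCGTACGACTGAAGATCGTAAGGtCACCAGTTCGATCCTGGTTCGGGGCA"
Str <- ">>>>>>>..>>>>........<<<<.>>>>>.......<<<<<.....>>>>>.......<<<<<<<<<<<<."

# parse_stems() is a hypothetical helper name, not part of any package.
parse_stems <- function(Seq, Str) {
  stopifnot(nchar(Seq) == nchar(Str))

  # Stem 0 (assumption): opened by the leading run of ">", closed by the
  # same number of "<" at the very end of the last "<" run.
  n0         <- attr(regexpr("^>+", Str), "match.length")
  last_close <- max(gregexpr("<", Str, fixed = TRUE)[[1]])
  inner_to   <- last_close - n0
  inner_Seq  <- substr(Seq, n0 + 1, inner_to)
  inner_Str  <- substr(Str, n0 + 1, inner_to)

  # Inner stems (assumption): non-nested hairpins of the form >>>...<<<.
  m         <- gregexpr(">+\\.*<+", inner_Str)[[1]]
  stem_from <- as.integer(m)
  stem_to   <- stem_from + attr(m, "match.length") - 1L
  k         <- length(stem_from)

  # Split each hairpin into opening / inside / closing.
  stems <- Map(function(from, to) {
    s <- substr(inner_Str, from, to)
    q <- substr(inner_Seq, from, to)
    n_open  <- attr(regexpr("^>+", s), "match.length")
    n_close <- attr(regexpr("<+$", s), "match.length")
    list(opening = substr(q, 1, n_open),
         inside  = substr(q, n_open + 1, nchar(q) - n_close),
         closing = substr(q, nchar(q) - n_close + 1, nchar(q)))
  }, stem_from, stem_to)

  # Unpaired stretches before, between and after the stems;
  # substring() returns "" when a gap is empty.
  gaps <- substring(inner_Seq,
                    c(1, stem_to + 1),
                    c(stem_from - 1, nchar(inner_Str)))
  gap_names <- c("before Stem 1",
                 if (k > 1) paste("between Stem", seq_len(k - 1),
                                  "and", seq_len(k - 1) + 1),
                 paste("After Stem", k))

  # Interleave gaps (odd slots) and stems (even slots).
  odd   <- seq(1, 2 * k + 1, by = 2)
  even  <- seq(2, 2 * k, by = 2)
  inner <- vector("list", 2 * k + 1)
  inner[odd]  <- as.list(gaps)
  inner[even] <- stems
  nm <- character(2 * k + 1)
  nm[odd]  <- gap_names
  nm[even] <- paste("Stem", seq_len(k))
  names(inner) <- nm

  # Anything after the Stem 0 closing (here the final "A") is dropped,
  # because the desired list has no slot for it.
  c(list("Stem 0 opening" = substr(Seq, 1, n0)),
    inner,
    list("Stem 0 closing" = substr(Seq, inner_to + 1, last_close)))
}

result <- parse_stems(Seq, Str)
str(result)
```

Peeling off Stem 0 first, as the question proposes, is what makes the regular expression sufficient here: once the outer pair is removed, each remaining stem is a flat hairpin. Nested inner stems or pseudoknots would need a stack-based or recursive matcher instead.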