How to parse a string with R (by "new" tags)?
I want to use R to do string parsing that (I think) resembles simple HTML parsing. For example, suppose we have the following two variables:
Seq <- "GCCTCGATAGCTCAGTTGGGAGAGCGTACGACTGAAGATCGTAAGGtCACCAGTTCGATCCTGGTTCGGGGCA"
Str <- ">>>>>>>..>>>>........<<<<.>>>>>.......<<<<<.....>>>>>.......<<<<<<<<<<<<."
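I have no experience writing parsers, and I would like to know what strategy to use when writing something like this (and which R commands are recommended).
My idea is to first strip off "Stem 0", then use a recursive function (call it "separate.stem") to walk through the inner string, each time splitting the string into:
1. Before the stem
2. Opening of the stem
3. Inside of the stem
4. Closing of the stem
5. After the stem
where the "after the stem" part would be fed recursively into the same function ("separate.stem").
The problem is that I am not sure how to code this without using a loop.
Any suggestions are welcome.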
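Update: someone sent me a number of questions; here they are.
Q: Is the number of ">>>>" at the start of each sequence the same as the number of "<<<<"?
You can simplify the task using run-length encoding. First, convert Str to a vector of single characters, then call rle: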
split_Str <- strsplit(Str, "")[[1]]
rle_Str <- rle(split_Str)
Run Length Encoding
lengths: int [1:14] 7 2 4 8 4 1 5 7 5 5 ...
values : chr [1:14] ">" "." ">" "." "<" "." ">" "." "<" "." ">" "." "<" "."
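To get from the run lengths to actual pieces of Seq, one option is to turn the lengths into start and end positions with cumsum() and then cut Seq with substring(). This is only a minimal sketch, assuming Seq and Str are aligned position by position; the segments data frame and its column names are made up for illustration.

```r
Seq <- "GCCTCGATAGCTCAGTTGGGAGAGCGTACGACTGAAGATCGTAAGGtCACCAGTTCGATCCTGGTTCGGGGCA"
Str <- ">>>>>>>..>>>>........<<<<.>>>>>.......<<<<<.....>>>>>.......<<<<<<<<<<<<."

# Run-length encode the structure string, one character per element
rle_Str <- rle(strsplit(Str, "")[[1]])

# Each run ends at the cumulative sum of the lengths so far;
# it starts at its end minus its length, plus one
run_end   <- cumsum(rle_Str$lengths)
run_start <- run_end - rle_Str$lengths + 1

# Cut Seq at the same positions (substring() is vectorised over first/last)
segments <- data.frame(
  type  = rle_Str$values,   # ">", "." or "<"
  start = run_start,
  end   = run_end,
  bases = substring(Seq, run_start, run_end),
  stringsAsFactors = FALSE
)
print(segments)
```

Note that the last run of "<" (12 characters) comes back as a single segment even though it closes two stems, so the run lengths alone do not say which ">" run each "<" belongs to.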
Q: Are there pseudoknots, such as >>>..>>>>>>>>>>>...?
The desired result is a list structure like the following:
list(
"Stem 0 opening" = "GCCTCGA",
"before Stem 1" = "TA",
"Stem 1" = list(opening = "GCTC",
inside = "AGTTGGGA",
closing = "GAGC"
),
"between Stem 1 and 2" = "G",
"Stem 2" = list(opening = "TACGA",
inside = "CTGAAGA",
closing = "TCGTA"
),
"between Stem 2 and 3" = "AGGtC",
"Stem 3" = list(opening = "ACCAG",
inside = "TTCGATC",
closing = "CTGGT"
),
"After Stem 3" = "",
"Stem 0 closing" = "TCGGGGC"
)
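One possible way to recover the stems in the list above is to pair every ">" with its matching "<" using a stack, then group consecutive pairs into stems. The sketch below is not the approach from the question (it uses a plain loop rather than recursion), it assumes the brackets are properly nested (no pseudoknots), and the helper names match_pairs and find_stems are hypothetical. It returns a flat list of stems and does not build the "before"/"between"/"after" entries.

```r
Seq <- "GCCTCGATAGCTCAGTTGGGAGAGCGTACGACTGAAGATCGTAAGGtCACCAGTTCGATCCTGGTTCGGGGCA"
Str <- ">>>>>>>..>>>>........<<<<.>>>>>.......<<<<<.....>>>>>.......<<<<<<<<<<<<."

# For every position, find the position of its matching bracket (NA for ".")
match_pairs <- function(Str) {
  chars   <- strsplit(Str, "")[[1]]
  partner <- rep(NA_integer_, length(chars))
  stack   <- integer(0)
  for (i in seq_along(chars)) {
    if (chars[i] == ">") {
      stack <- c(stack, i)                  # push the opening position
    } else if (chars[i] == "<") {
      if (length(stack) == 0) stop("unbalanced '<' at position ", i)
      j <- stack[length(stack)]             # pop the most recent opening
      stack <- stack[-length(stack)]
      partner[i] <- j
      partner[j] <- i
    }
  }
  if (length(stack) > 0) stop("unbalanced '>' at position ", stack[1])
  partner
}

# Group consecutive opening positions whose partners are also consecutive
find_stems <- function(Seq, Str) {
  partner <- match_pairs(Str)
  open <- which(!is.na(partner) & partner > seq_along(partner))
  if (length(open) == 0) return(list())
  # A new stem starts wherever positions or partners stop being adjacent
  brk <- c(TRUE, diff(open) != 1 | diff(partner[open]) != -1)
  stems <- lapply(unname(split(open, cumsum(brk))), function(idx) {
    o1 <- min(idx); o2 <- max(idx)          # opening span
    c1 <- partner[o2]; c2 <- partner[o1]    # closing span
    list(opening = substr(Seq, o1, o2),
         inside  = substr(Seq, o2 + 1, c1 - 1),
         closing = substr(Seq, c1, c2))
  })
  names(stems) <- paste("Stem", seq_along(stems) - 1)
  stems
}

stems <- find_stems(Seq, Str)
str(stems[["Stem 1"]])
```

Here the "inside" of Stem 0 is the whole inner string, including Stems 1 to 3, so the pieces between stems could be cut from it in a further step.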