R: Combining strings whose headers partially match


I have a file like this:

>mmu-let-7g-5p MIMAT0000121 Mus musculus let-7g-5p
UGAGGUAGUAGUUUGUACAGUU
>mmu-let-7g-3p MIMAT0004519 Mus musculus let-7g-3p
ACUGUACAGGCCACUGCCUUGC
>mmu-let-7i-5p MIMAT0000122 Mus musculus let-7i-5p
UGAGGUAGUAGUUUGUGCUGUU
>mmu-let-7i-3p MIMAT0004520 Mus musculus let-7i-3p
CUGCGCAAGCUACUGCCUUGCU
....
....
I want to combine the strings whose headers share the same leading part: mmu-let-7g, mmu-let-7i, and so on.

Expected output:

>mmu-let-7g
UGAGGUAGUAGUUUGUACAGUU ACUGUACAGGCCACUGCCUUGC
>mmu-let-7i
UGAGGUAGUAGUUUGUGCUGUU CUGCGCAAGCUACUGCCUUGCU

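You can read the file with readLines and remove the suffix that begins with "-" from 'lines', giving 'lines1'. This only removes the suffix of the header lines. Create a TRUE/FALSE index ('indx') that marks the header lines. Separate the header lines from the sequence (base) lines, then use an aggregating function (tapply), grouped by header, to paste the sequence lines together. Rearranging 'v1' into 'v2' gives the expected result.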
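
The readLines steps described above can be sketched end to end as follows. This is a minimal sketch, not the answer's original code: the sample data is inlined instead of read from a file, the two regular expressions are assumptions about how the group key is cut out of the header, and the names first_token, key, record and seq_key are hypothetical helpers introduced for illustration.

```r
# Sample input; in practice this would come from readLines() on your file.
lines <- c(
  ">mmu-let-7g-5p MIMAT0000121 Mus musculus let-7g-5p",
  "UGAGGUAGUAGUUUGUACAGUU",
  ">mmu-let-7g-3p MIMAT0004519 Mus musculus let-7g-3p",
  "ACUGUACAGGCCACUGCCUUGC",
  ">mmu-let-7i-5p MIMAT0000122 Mus musculus let-7i-5p",
  "UGAGGUAGUAGUUUGUGCUGUU",
  ">mmu-let-7i-3p MIMAT0004520 Mus musculus let-7i-3p",
  "CUGCGCAAGCUACUGCCUUGCU"
)

# TRUE for header lines, FALSE for sequence lines
indx <- grepl("^>", lines)

# Group key (assumed rule): keep the first whitespace-delimited token of
# each header, then drop its last "-xx" part (">mmu-let-7g-5p" -> ">mmu-let-7g")
first_token <- sub("[[:space:]].*$", "", lines[indx])
key <- sub("-[^-]+$", "", first_token)

# Each sequence line belongs to the most recent header above it
record <- cumsum(indx)
seq_key <- key[record[!indx]]

# Paste together the sequences that share a key, separated by a space
v1 <- tapply(lines[!indx], seq_key, paste, collapse = " ")

# Interleave keys and combined sequences: header, sequence, header, sequence
v2 <- c(rbind(names(v1), unname(v1)))
v2
#[1] ">mmu-let-7g"
#[2] "UGAGGUAGUAGUUUGUACAGUU ACUGUACAGGCCACUGCCUUGC"
#[3] ">mmu-let-7i"
#[4] "UGAGGUAGUAGUUUGUGCUGUU CUGCGCAAGCUACUGCCUUGCU"
```

Note that tapply orders the groups by sorted key, so the output order can differ from the order in the file. writeLines(v2, "out.txt") would write the result back out, with out.txt being a placeholder name.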


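The first step is to read the multi-line format. The scan function allows this if you also supply a list as the what argument (a named list in this case). The result can then be converted to a data frame.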
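
The console session further down pastes the data at the scan prompt. A non-interactive sketch of the same idea is given here, under two assumptions that hold for the sample but may not hold for a real file: every header has exactly five whitespace-separated fields, and every sequence sits on a single line. The text argument of scan is used in place of interactive input, and the group key is cut from V1 rather than V5 so that the result matches the ">mmu-let-7g" form asked for in the question. The names key, combined and out are hypothetical.

```r
# Sample input as a character vector; each element is one line of the file.
lines <- c(
  ">mmu-let-7g-5p MIMAT0000121 Mus musculus let-7g-5p",
  "UGAGGUAGUAGUUUGUACAGUU",
  ">mmu-let-7g-3p MIMAT0004519 Mus musculus let-7g-3p",
  "ACUGUACAGGCCACUGCCUUGC",
  ">mmu-let-7i-5p MIMAT0000122 Mus musculus let-7i-5p",
  "UGAGGUAGUAGUUUGUGCUGUU",
  ">mmu-let-7i-3p MIMAT0004520 Mus musculus let-7i-3p",
  "CUGCGCAAGCUACUGCCUUGCU"
)

# Six fields per record: five from the header line, one from the sequence
# line. multi.line = TRUE lets a record continue onto the next line.
dat <- as.data.frame(
  scan(text = lines,
       what = list(V1 = "", V2 = "", V3 = "", V4 = "", V5 = "", V6 = ""),
       multi.line = TRUE, quiet = TRUE),
  stringsAsFactors = FALSE
)

# Group key: header ID without its last "-xx" part
key <- sub("-[^-]+$", "", dat$V1)

# Paste the sequences of each group, then interleave headers and sequences
combined <- tapply(dat$V6, key, paste, collapse = " ")
out <- c(rbind(names(combined), unname(combined)))
writeLines(out)
```

For a file on disk, scan(file = ...) with the same what and multi.line arguments would take the place of text = lines.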

> dat <- as.data.frame( scan(what =list( V1="", V2="", V3="", V4="", V5="", V6=""), multi.line=TRUE)  )
1: >mmu-let-7g-5p MIMAT0000121 Mus musculus let-7g-5p
1: UGAGGUAGUAGUUUGUACAGUU
2: >mmu-let-7g-3p MIMAT0004519 Mus musculus let-7g-3p
2: ACUGUACAGGCCACUGCCUUGC
3: >mmu-let-7i-5p MIMAT0000122 Mus musculus let-7i-5p
3: UGAGGUAGUAGUUUGUGCUGUU
4: >mmu-let-7i-3p MIMAT0004520 Mus musculus let-7i-3p
4: CUGCGCAAGCUACUGCCUUGCU
5: 
Read 4 records
> tapply(dat$V6, sub("-..$","", dat$V5), paste, collapse=" ")
                                         let-7g 
"UGAGGUAGUAGUUUGUACAGUU ACUGUACAGGCCACUGCCUUGC" 
                                         let-7i 
"UGAGGUAGUAGUUUGUGCUGUU CUGCGCAAGCUACUGCCUUGCU"

Code for the readLines approach, where 'lines1' and 'indx' are built as described in that answer:

lines2 <- unlist(lapply(split(lines1, cumsum(grepl('>', lines1))),
                 function(x) c(x[1], paste(x[-1], collapse=''))),
                 use.names=FALSE)
v1 <- tapply(lines2[!indx], lines2[indx], FUN=paste, collapse=' ')
v2 <- c(rbind(names(v1), unname(v1)))
v2
#[1] ">mmu-let-7g"                                  
#[2] "UGAGGUAGUAGUUUGUACAGUU ACUGUACAGGCCACUGCCUUGC"
#[3] ">mmu-let-7i"                                  
#[4] "UGAGGUAGUAGUUUGUGCUGUU CUGCGCAAGCUACUGCCUUGCU"