R: remove line breaks from a string starting at the second "\n" (do not remove the first one)
I have some sequences that I want to reformat in R, but R shows them with their line breaks displayed as \n. If I do this, it removes all of the line breaks:
gsub(pattern = "\n",replacement = "", x = seqs)
This does not work:
sub("^(.*? \n .*?) \n .*", "\\1", seqs)
This gives me an error:
gsub(pattern = "${'\n'[*]:0:2}",replacement = "", x = seqs)
Error in gsub(pattern = "${'\n'[*]:0:2}", replacement = "", x = seqs) :
invalid regular expression '${'
'[*]:0:2}', reason 'Invalid contents of {}'
My sequences vary, but they all look like this:
">Whatever here before \n the sequence start \n the rest \n..."
The final result should be:
">Whatever here before \n the sequence start the rest..."
Interestingly, the following code partially works on a test sentence, but not on the sequences above:
seqss = ">Whatever here before \n the sequence start \n the rest \n..."
sub("^(.*? \n .*?) \n .*", "\\1", seqss)
[1] ">Whatever here before \n the sequence start"
Try this:
seqs <- ">PRTRE213-13 Volkameria aculeatum matK \n------------------------------------------------------------------CCAAC\nCGAGAGCCAGCTCC------TCTTTTTCAAAA---------CGAAAT---------------------CAA\nAAGACTATTCTTATTCTTATAT------------AATTCTCATGTATGTGAATATGAATCCGTTTTCGTCT\nTTTCTACGTAACCAATCTTTT---CATTTACGATCAACATCTTTTGAAGTTCTTCTTGAACGAATCTATTT\nTCTATGTA---------AAAGTAGAACGTCTT------GTGAACGTCTTTGTTAAGATTAAC---------\n-AATTTTCGGGCGAACCCGTGGTTGGTCAAG------GAACCTTTCATGCATTATATTAGGTATCAAAGAA\nAGATCCATTCTGGCTTCA------AAGGGAACATCTTTTTTCATGAAAAAATGGCAATTTTATCTTGTCAC\nCTTTTTGGCAATGGCATTTTTCGCTGTGGTTTCATCCAAGAAGGATTTATCTAAAC---CAATTATCCAAT\nTTATTCCCTTGAA------TTTTTGGGCTATCTTTCA------AGCGTGCGAATGAACCCCTCTGTGGTAC\nCGGAGTCAAATTCTAGAAAATGCATTTCTAATCAATAATGCTATT------AAGAAGTTTGATACCCTTAT\nTTCCAATTATTCCAATGATTGCGTCATTGGCTAAAGCGAAATTTTGTAACGTATTTGGGCATCCTGTTAGT\nTAAGCCGATTTGGGCTGATTTATCAGATTCTAATATTATTGACCGATTTGGTCGTATA---TGCAGAAATC\nCTTTCTC-------------"
gsub(pattern = "(^.*?\\n)|\\n",replacement = "\\1", x = seqs, perl = TRUE)
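To see what this call does, it can be run on the short test sentence from the question. The sketch below also checks one likely reason the question's `sub()` attempt left the real sequences untouched: that pattern contains literal spaces on both sides of each `\n`, and the real record has no space after its first line break. The variable names `res`, `n_left`, `rec` and `unchanged` are illustrative, and `rec` is a made-up stand-in, not data from the thread.

```r
# Short test sentence from the question.
seqss <- ">Whatever here before \n the sequence start \n the rest \n..."

# Alternative 1, (^.*?\\n), can only match at the start of the string and is
# put back unchanged via \\1. Every later "\n" matches alternative 2, where
# group 1 is unset, so \\1 is empty and the newline is dropped.
res <- gsub("(^.*?\\n)|\\n", "\\1", seqss, perl = TRUE)

# Count the newlines left in the result (only the first should survive).
n_left <- lengths(regmatches(res, gregexpr("\n", res, fixed = TRUE)))

# Hypothetical stand-in for a real record: no space follows the first "\n".
rec <- ">header \nAAAA\nCCCC"

# The question's pattern needs " \n " with a space on both sides, so nothing
# matches here and sub() returns the input unchanged.
unchanged <- sub("^(.*? \n .*?) \n .*", "\\1", rec)
```

Note that only the line breaks are removed; any spaces that surrounded them in the test sentence stay in place.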
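The pattern (^.*?\\n)|\\n captures everything up to and including the first line break in a group, so that it is kept and put back by the replacement \\1. Every later line break matches the second alternative and is replaced with nothing.
A stringr approach: simply split the string at the first line break and paste the two parts back together.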
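The split-and-recombine idea might look like the sketch below. This is not necessarily the code from the original answer; it assumes the stringr package is installed and that the string contains at least one line break, and `x`, `parts` and `joined` are illustrative names on made-up data.

```r
library(stringr)

# Short stand-in for a real record (hypothetical data, same layout).
x <- ">header line \nAAAA\nCCCC\nGGGG"

# Split at the first newline only: n = 2 returns at most two pieces,
# the header and everything after it.
parts <- str_split(x, "\n", n = 2)[[1]]

# Remove the remaining newlines from the body, then re-attach the header
# with a single newline in between.
joined <- str_c(parts[1], "\n", str_remove_all(parts[2], "\n"))
joined
```

If a string has no line break at all, `parts[2]` is `NA`, so that case would need separate handling.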
Applied to seqs, the result is the header line (>PRTRE213-13 Volkameria aculeatum matK), then a single line break, then the whole aligned sequence on one line.
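Created on 2018-06-29 by the reprex package (v0.2.0).
We can use gsubfn. In this case it is no better than the regular expression, but it is easier to extend if you want to keep the first n occurrences with n > 1.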
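No additional packages are needed.
Note that I think this approach relies on the first group, the part before the first line break, having some distinguishing feature that prevents it from being replaced as well. Here that is the leading >. Actually, if you change it to (^.*?\\n)|\\n, I think you can avoid this limitation.
Thanks. The ^ was already in the original R example I gave, but I left it out when I copied the explanatory regex line from the first version of the demo. Corrected.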
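# gsubfn approach: the proto object keeps a running match counter in `count`
library(gsubfn)
# keep the first "\n" (count == 1) and replace every later one with ""
p <- proto(fun = function(this, x) if (count > 1) '' else x)
out <- gsubfn('\n', p, seqs)
# same result as the regex answer
out == gsub(pattern = "(^.*?\\n)|\\n", replacement = "\\1", x = seqs, perl = TRUE)
#[1] TRUE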
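For comparison, the "keep the first n occurrences" idea can also be sketched in base R with `gregexpr()` and `regmatches<-`. The helper `keep_first_newlines()` below is hypothetical (it does not appear in the thread) and is shown on a short made-up record rather than the real sequence.

```r
# Hypothetical helper (not from the thread): keep the first `n` newlines
# in each string and delete every later one, using base R only.
keep_first_newlines <- function(x, n = 1) {
  # Locate every literal "\n" in each element of x.
  m <- gregexpr("\n", x, fixed = TRUE)
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    # hits holds one "\n" per match; blank out all matches after the n-th.
    hits[seq_along(hits) > n] <- ""
    hits
  })
  x
}

# Short stand-in for a real record (hypothetical data).
x <- ">header \nAAAA\nCCCC\nGGGG"

one <- keep_first_newlines(x, n = 1)  # only the header's line break is kept
two <- keep_first_newlines(x, n = 2)  # the first two line breaks are kept
```

The counting happens on the vector of matches rather than inside the regular expression, which is what makes changing n straightforward.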