
R: Remove line breaks from a string at the second "\n" and every later "\n" (do not remove the first)


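I have some sequences that I want to reformat in R.

But R shows me each sequence with its line breaks displayed as \n:

If I do this, it removes all of the line breaks: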

gsub(pattern = "\n",replacement = "", x = seqs)
This does not work:

sub("^(.*? \n .*?) \n .*", "\\1", seqs)
This gives me an error:

gsub(pattern = "${'\n'[*]:0:2}",replacement = "", x = seqs)
Error in gsub(pattern = "${'\n'[*]:0:2}", replacement = "", x = seqs) : 
  invalid regular expression '${'
'[*]:0:2}', reason 'Invalid contents of {}'
My sequences vary:

">Whatever here before \n the sequence start \n the rest \n..."
The final result would be:

">Whatever here before \n the sequence start the rest..."
Interestingly, the code below partly works on a test sentence, but not on the sequence above:

seqss = ">Whatever here before \n the sequence start \n the rest \n..."
sub("^(.*? \n .*?) \n .*", "\\1", seqss)
[1] ">Whatever here before \n the sequence start"
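One likely reason, offered as a reading of the pattern rather than something stated in the question: the pattern contains a literal space on each side of every \n, so it only matches where a newline is surrounded by spaces. The test sentence has those spaces; the real sequence does not. A minimal sketch, where real_like is a hypothetical shortened stand-in for the real data:

```r
# The pattern from the question, with literal spaces around each newline
pat <- "^(.*? \n .*?) \n .*"

# The test sentence has " \n " (space, newline, space) between its parts
test_str <- ">Whatever here before \n the sequence start \n the rest \n..."

# Hypothetical short stand-in for the real data: no space after the newlines
real_like <- ">id matK \n---CCAAC\nCGAGA\nAAGAC"

matches_test <- grepl(pat, test_str)    # does the pattern match the test sentence?
matches_real <- grepl(pat, real_like)   # does it match the sequence-like string?

# When the pattern does not match, sub() returns its input unchanged
unchanged <- identical(sub(pat, "\\1", real_like), real_like)

print(c(matches_test, matches_real, unchanged))
```

Dropping the spaces from the pattern removes this mismatch, but the pattern would still discard everything after the second newline rather than joining the lines.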
Try this:

seqs <- ">PRTRE213-13 Volkameria aculeatum matK \n------------------------------------------------------------------CCAAC\nCGAGAGCCAGCTCC------TCTTTTTCAAAA---------CGAAAT---------------------CAA\nAAGACTATTCTTATTCTTATAT------------AATTCTCATGTATGTGAATATGAATCCGTTTTCGTCT\nTTTCTACGTAACCAATCTTTT---CATTTACGATCAACATCTTTTGAAGTTCTTCTTGAACGAATCTATTT\nTCTATGTA---------AAAGTAGAACGTCTT------GTGAACGTCTTTGTTAAGATTAAC---------\n-AATTTTCGGGCGAACCCGTGGTTGGTCAAG------GAACCTTTCATGCATTATATTAGGTATCAAAGAA\nAGATCCATTCTGGCTTCA------AAGGGAACATCTTTTTTCATGAAAAAATGGCAATTTTATCTTGTCAC\nCTTTTTGGCAATGGCATTTTTCGCTGTGGTTTCATCCAAGAAGGATTTATCTAAAC---CAATTATCCAAT\nTTATTCCCTTGAA------TTTTTGGGCTATCTTTCA------AGCGTGCGAATGAACCCCTCTGTGGTAC\nCGGAGTCAAATTCTAGAAAATGCATTTCTAATCAATAATGCTATT------AAGAAGTTTGATACCCTTAT\nTTCCAATTATTCCAATGATTGCGTCATTGGCTAAAGCGAAATTTTGTAACGTATTTGGGCATCCTGTTAGT\nTAAGCCGATTTGGGCTGATTTATCAGATTCTAATATTATTGACCGATTTGGTCGTATA---TGCAGAAATC\nCTTTCTC-------------"
gsub(pattern = "(^.*?\\n)|\\n",replacement = "\\1", x = seqs, perl = TRUE)
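Here (^.*?\\n) is a capture group holding everything up to and including the first newline, so that it is kept and put back by the \\1 replacement. Every later newline matches the second alternative, \\n, where group 1 is empty, so it is replaced with nothing.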
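To see the two alternatives at work, here is a minimal sketch on a short string. The variable x and its contents are made up for illustration; the pattern and replacement are the ones from the answer.

```r
# Hypothetical short input: a header line followed by three sequence lines
x <- ">header \nAAA\nCCC\nGGG"

# First alternative: from the start of the string through the first newline,
# captured as group 1 and written back by "\\1".
# Second alternative: any other newline; group 1 is unset there, so "\\1"
# contributes nothing and the newline is removed.
res <- gsub(pattern = "(^.*?\\n)|\\n", replacement = "\\1", x = x, perl = TRUE)

print(res)
# [1] ">header \nAAACCCGGG"
```

Because ^ only matches at the very start of the string here, the first alternative can succeed once at most, which is what protects the first newline.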

A stringr approach: simply split the string and join the two parts back together.

(The code for this answer is not legible in this copy; it was applied to the same seqs string as above.)

Created on 2018-06-29 by the reprex package (v0.2.0).
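Since the answer's own stringr code is not preserved here, the following is only a sketch of the split-and-rejoin idea, not the author's code. It assumes the stringr package is installed; x and parts are illustrative names.

```r
library(stringr)

# Hypothetical short input: a header line followed by three sequence lines
x <- ">header \nAAA\nCCC\nGGG"

# Split into exactly two pieces at the first newline:
# column 1 is the header, column 2 is everything after it
parts <- str_split_fixed(x, "\n", n = 2)

# Remove the remaining newlines from the second piece, then rejoin
# with a single newline. Note: an input with no newline at all would
# gain a trailing "\n" here, so this assumes at least one is present.
res <- paste0(parts[, 1], "\n", str_replace_all(parts[, 2], "\n", ""))

print(res)
# [1] ">header \nAAACCCGGG"
```

Both functions are vectorised, so the same lines should also work when x is a character vector of several sequences.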

We can use gsubfn. In this case it is no better than the regex, but it is easier to extend if you want to keep the first n occurrences with n > 1.


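No additional packages are needed. Note that I think this approach relies on the first group, before the first newline, having some distinguishing feature that prevents it from being replaced as well. Here that is the leading >. Actually, if you change it to ^.*?\\n|\\n I think you can avoid this limitation.

Thanks. The ^ was already in the original R example I gave, but I left it out when copying the explanatory regex line from the first version of the demo. Corrected.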
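# gsubfn version: keep the first "\n" and delete every later one
library(gsubfn)
# count is maintained by gsubfn and holds the index of the current match
p <- proto(fun = function(this, x) if (count > 1) '' else x)
out <- gsubfn('\n', p, seqs)
# check that the result equals the regex-only solution
out == gsub(pattern = "(^.*?\\n)|\\n", replacement = "\\1", x = seqs, perl = TRUE)
#[1] TRUE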
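The remark about keeping the first n occurrences can be sketched by changing the threshold in the proto function. This is an illustration for n = 2, assuming the gsubfn package is installed; s and keep_first_two are made-up names.

```r
library(gsubfn)

# Hypothetical short input with three newlines
s <- "a\nb\nc\nd"

# Same structure as the proto object above, but the first two matches
# are returned unchanged and only later ones are replaced with ""
keep_first_two <- proto(fun = function(this, x) if (count > 2) "" else x)

res <- gsubfn("\n", keep_first_two, s)
print(res)
```

With a plain regex, the same change would mean rewriting the captured prefix to span two newlines, which is why the counter-based version is easier to adjust.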