Regex: parsing and rearranging text data in R to add a tab

regex r replace

I have a text file, mydata.txt, that I want to rearrange.

mydata.txt:

241623..243414 product="Putative sulfate permease"
complement(344599..354507) product="Alcohol dehydrogenase (EC 1.1.1.1)"
tRNA 168479..169551 product="tRNA-Val-GAC"

My goal is to take the part of each line that starts at product=... and separate it from the first part of the line with a tab (\t), like this:

241623..243414  product="Putative sulfate permease"
complement(344599..354507)  product="Alcohol dehydrogenase (EC 1.1.1.1)"
tRNA 168479..169551 product="tRNA-Val-GAC"

My attempt so far:

x <- sub("(^\\.)(\\product=\\S+)$","\\1", mydata)
y <- sub("(^\\.)(\\product=\\S+)$","\\2", mydata)

In each case, all I get as output is some numeric values. Can anyone help? Thanks.

If you want to do this for every occurrence of product=, you can use:

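library(stringr)
# Sample input held as a single multi-line string
x <- '241623..243414  product="Putative sulfate permease"
      complement(344599..354507)  product="Alcohol dehydrogenase (EC 1.1.1.1)"
      tRNA 168479..169551 product="tRNA-Val-GAC"'
# Replace the run of spaces before each product= with a single tab
str_replace_all(x, " +product=", "\tproduct=")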

Make some test data that corresponds to your example:

test <- c(
  "241623..243414 product=\"Putative sulfate permease\"",
  "complement(344599..354507) product=\"Alcohol dehydrogenase (EC 1.1.1.1)\"",
  "tRNA 168479..169551 product=\"tRNA-Val-GAC\""
)
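
The step that produces `result` is not shown in this answer. A minimal sketch that is consistent with the output below, assuming base R's `sub` and assuming that the whitespace directly before `product=` is what should become a tab (the helper name `add_tab` is hypothetical):

```r
# Hypothetical helper, not the answerer's original code.
# Replaces the whitespace immediately before "product=" with one tab.
# sub() is vectorised, so a whole character vector can be passed in.
add_tab <- function(lines) {
  sub("[[:space:]]+product=", "\tproduct=", lines)
}

# Small self-contained demo on a single line
demo <- add_tab("tRNA 168479..169551 product=\"tRNA-Val-GAC\"")
cat(demo, "\n")
```

With the `test` vector above, `result <- add_tab(test)` would give one tab-separated string per input line. Lines that contain no `product=` are returned unchanged.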

Result:

> cat(result[1])
241623..243414  product="Putative sulfate permease"> 

> cat(result[2])
complement(344599..354507)      product="Alcohol dehydrogenase (EC 1.1.1.1)"> 

> cat(result[3])
tRNA 168479..169551     product="tRNA-Val-GAC">

Hmm. What is your regex supposed to match?

Hi guys, thanks for the code. I forgot to mention that I have thousands of lines like this, which I want to read from a file. I used read.delim for that, but neither piece of code works in that case. Is there a way around this?
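
For the file case raised in the last comment, one likely cause is that `read.delim` returns a data frame rather than a character vector, so the replacement is applied to the wrong kind of object; that may also explain the numeric output mentioned in the question. A minimal sketch, assuming the file is read line by line with `readLines` instead (the helper `retab_file` and the output file name are placeholders, not part of the original thread):

```r
# Hypothetical helper: read a file line by line, put a tab before
# product=, and write the result to a new file.
retab_file <- function(infile, outfile) {
  lines <- readLines(infile)   # character vector, one element per line
  out <- sub("[[:space:]]+product=", "\tproduct=", lines)
  writeLines(out, outfile)
  invisible(out)               # return the converted lines without printing them
}

# Example call (file names are assumptions):
# retab_file("mydata.txt", "mydata_tab.txt")
```

Because `sub` is vectorised over the lines, this should scale to thousands of lines without an explicit loop.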