R: Replace and remove part of a string in row names
I want to remove part of the row names in my data frame. Everything that does not match one of the strings defined in the grepl call below should be removed, so that each row name is replaced by the matching string listed after it. Does anyone know how to do this?
df[grepl(".*lncRNA.*|.*snRNA.*|.*snoRNA.*|.*precursor_RNA.*", rownames(df))] <- c("lncRNA","snRNA","snoRNA","precursor_RNA")
head(rownames(df))
[3208] "URS000075AF9C-snoRNA_GTATGTGTGGACAGCACTGAGACTGAGTCT"
[3209] "URS000075B029-snRNA_AACTCTGAGTCTTAAGCTAATTTTTTGAGGCCTTGTTCCGACA"
[3210] "URS000075B029-snRNA_ATTTCCGTGGAGAGGAACAACTCTGAGTCTTAAGCTAATTT"
[3211] "URS000075B0E3-lncRNA_GTAAGGGGCAGTAAG"
[3212] "URS000075B261-precursor_RNA_CTTTCTATGCTCCTGTTCTGC"
[3213] "URS000075B2ED-lncRNA_CACTCAGGACCCACC"
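We can use gsub to match, from the start of the string (^), one or more characters that are not a hyphen ([^-]+) followed by a hyphen, or (|) an underscore followed by one or more characters that are not an underscore ([^_]+) up to the end of the string ($), and replace every match with an empty string ("").
Applied to the row names: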
gsub("^[^-]+-|_[^_]+$", "", rownames(df))
Data: the sample vector v1 is listed further down.
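To make the two alternatives in that pattern easier to follow, here is a small sketch that applies them one at a time and then together. The vector name ids and the intermediate names step1 and step2 are only illustrative; the two strings are taken from the sample row names in the question.

```r
# Two of the sample row names from the question (ids is an illustrative name)
ids <- c("URS000075AF9C-snoRNA_GTATGTGTGGACAGCACTGAGACTGAGTCT",
         "URS000075B261-precursor_RNA_CTTTCTATGCTCCTGTTCTGC")

# First alternative: drop everything up to and including the first hyphen
step1 <- sub("^[^-]+-", "", ids)

# Second alternative: drop the last underscore and everything after it
step2 <- sub("_[^_]+$", "", step1)

# Both alternatives in a single gsub call
both <- gsub("^[^-]+-|_[^_]+$", "", ids)

print(step2)
# [1] "snoRNA"        "precursor_RNA"
print(identical(step2, both))
# [1] TRUE
```

Because [^_]+ cannot cross an underscore, only the final _SEQUENCE part is removed, which is why the underscore inside precursor_RNA survives.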
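Welcome to StackOverflow! You gave us some example input and output, but please consider providing a reproducible example so that it is easier for us to help you. In your case, I think you can use sub (or gsub, which gives the same result here because the pattern is anchored at both ends), capture the middle part, and refer to it as \1 in the replacement.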
x <- c("URS000075AF9C-snoRNA_GTATGTGTGGACAGCACTGAGACTGAGTCT",
"URS000075B029-snRNA_AACTCTGAGTCTTAAGCTAATTTTTTGAGGCCTTGTTCCGACA",
"URS000075B029-snRNA_ATTTCCGTGGAGAGGAACAACTCTGAGTCTTAAGCTAATTT",
"URS000075B0E3-lncRNA_GTAAGGGGCAGTAAG",
"URS000075B261-precursor_RNA_CTTTCTATGCTCCTGTTCTGC",
"URS000075B2ED-lncRNA_CACTCAGGACCCACC")
# replace the string with the captured group (ie regex in brackets)
gsub("^.*(lncRNA|snRNA|snoRNA|precursor_RNA).*$", "\\1", x)
# [1] "snoRNA" "snRNA" "snRNA" "lncRNA"
# [5] "precursor_RNA" "lncRNA"
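However, row names must be unique, so you will probably need to store the result in a column of the data frame. Alternatively, you could use make.unique to make the values unique, but I think saving the result as a column makes more sense.
Why the negative feedback?
@csgillespie Thanks. I noticed that you and the other poster have already mentioned this, so I guess that is enough.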
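As a rough sketch of the two options mentioned in that answer, the code below stores the extracted type in a column and, separately, uses make.unique for the row names. The data frame toy_df, its count column, and the shortened row names are made up for illustration and are not part of the original data.

```r
# Hypothetical toy data frame; the count values and the shortened
# sequences in the row names are invented for this example
toy_df <- data.frame(
  count = c(10, 20, 30),
  row.names = c("URS000075B029-snRNA_AACTCTGAG",
                "URS000075B029-snRNA_ATTTCCGTG",
                "URS000075B0E3-lncRNA_GTAAGGGGC")
)

# Extract the RNA type with the capture-group pattern from the answer
rna_type <- sub("^.*(lncRNA|snRNA|snoRNA|precursor_RNA).*$", "\\1",
                rownames(toy_df))

# Option 1: keep the original row names and store the type as a column
toy_df$rna_type <- rna_type

# Option 2: use the type as row names, de-duplicated by make.unique
toy_unique <- toy_df
rownames(toy_unique) <- make.unique(rna_type)

print(rownames(toy_unique))
# [1] "snRNA"   "snRNA.1" "lncRNA"
```

Assigning rna_type directly to the row names would fail here because "snRNA" appears twice; the column keeps the values as they are, while make.unique appends a numeric suffix to the repeats.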
Sample data (v1) for the gsub approach on the row names:
v1 <- c("URS000075AF9C-snoRNA_GTATGTGTGGACAGCACTGAGACTGAGTCT",
"URS000075B029-snRNA_AACTCTGAGTCTTAAGCTAATTTTTTGAGGCCTTGTTCCGACA",
"URS000075B029-snRNA_ATTTCCGTGGAGAGGAACAACTCTGAGTCTTAAGCTAATTT",
"URS000075B0E3-lncRNA_GTAAGGGGCAGTAAG",
"URS000075B261-precursor_RNA_CTTTCTATGCTCCTGTTCTGC",
"URS000075B2ED-lncRNA_CACTCAGGACCCACC")