Find distinct elements based on only part of a string in R
I have a data.frame that looks like this:
name=c("PFLU_00001_gene", "PFLU_00001_mRNA", "PFLU_00001",
"PFLU_00002_gene", "PFLU_00002_mRNA", "PFLU_00002",
"PFLU_00003_gene", "PFLU_00003_mRNA", "PFLU_00003")
type=c("gene", "mRNA","CDS","gene", "mRNA","CDS","gene", "mRNA","NA")
df <- data.frame(name, type)
             name type
1 PFLU_00001_gene gene
2 PFLU_00001_mRNA mRNA
3      PFLU_00001  CDS
4 PFLU_00002_gene gene
5 PFLU_00002_mRNA mRNA
6      PFLU_00002  CDS
7 PFLU_00003_gene gene
8 PFLU_00003_mRNA mRNA
9      PFLU_00003   NA
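Thank you very much for your help and guidance.
Best wishes,
LDT

We can use str_remove to delete, at the end ($) of the string, an underscore followed by one or more characters that are not an underscore ([^_]+). The regex lookbehind ((?<=[0-9])) requires a digit immediately before that underscore, so the underscore inside a bare ID such as PFLU_00001 is left alone. distinct then keeps only the unique names.

library(dplyr)
library(stringr)
df %>%
  transmute(name = str_remove(name, "(?<=[0-9])_[^_]+$")) %>%
  distinct(name)
#        name
#1 PFLU_00001
#2 PFLU_00002
#3 PFLU_00003

Thanks @akrun! Always an amazing solution. I will accept the answer as soon as I can. Thank you also for the explanation; I am still learning about string substitution :|

A base R option:

unique(
  transform(
    df["name"],
    name = gsub("_\\D+$", "", name)
  )
)

gives

        name
1 PFLU_00001
4 PFLU_00002
7 PFLU_00003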
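
To see why both answers anchor their patterns the way they do, here is a minimal base R sketch comparing them with a naive pattern. It is only an illustration: the vector below reuses names from the question plus one hypothetical edge case (PFLU_00002_tRNA1, a suffix ending in a digit) that does not appear in the original data, and it uses sub with perl = TRUE as a stand-in for str_remove so that no packages are needed.

```r
# Names from the question, plus one hypothetical edge case
# (a suffix that ends in a digit), added only for illustration.
name <- c("PFLU_00001_gene", "PFLU_00001_mRNA", "PFLU_00001",
          "PFLU_00002_tRNA1")

# Naive pattern: drops everything after the last underscore,
# so the bare ID "PFLU_00001" is cut down to "PFLU".
naive <- sub("_[^_]+$", "", name)

# Lookbehind pattern from the stringr answer. Base R needs perl = TRUE
# for lookarounds. The underscore must directly follow a digit.
lookbehind <- sub("(?<=[0-9])_[^_]+$", "", name, perl = TRUE)

# Pattern from the base R answer: an underscore followed only by
# non-digit characters up to the end of the string.
non_digit <- sub("_\\D+$", "", name)

# Side-by-side comparison of the three results
print(data.frame(name, naive, lookbehind, non_digit))

# Unique IDs after stripping the suffix with the lookbehind pattern
print(unique(lookbehind))
```

On this sample the naive pattern damages the bare ID, the lookbehind pattern strips every suffix, and the non-digit pattern leaves the hypothetical tRNA1 suffix in place because it contains a digit. For the data in the question, where suffixes are only _gene and _mRNA, the two posted patterns give the same result.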