How to extract substrings in R using stringr::str_match
I have the following two strings:
x <- "chr1:625000-635000.BB_162.Adipose"
y <- "chr1:625000-635000.BB_162.combined.HMSC-ad"
What I want is to get the following from y:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "chr1:625000-635000.BB_162.combined.HMSC-ad" "chr1" "625000" "635000" "BB_162" "HMSC-ad"
Applying my current regex to y, I get this result instead:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "chr1:625000-635000.BB_162.combined" "chr1" "625000" "635000" "BB_162" "combined"
How can I generalize the regex so that it handles both x and y?
Update
S.Kalbar, your regex gives:
> stringr::str_match(y,"(\\w+):(\\d+)-(\\d+)\\.(\\w+)\\.(\\w+)(?:\\.([A-Za-z-]+))?")
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] "chr1:625000-635000.BB_162.combined.HMSC-ad" "chr1" "625000" "635000" "BB_162" "combined" "HMSC-ad"
> stringr::str_match(x,"(\\w+):(\\d+)-(\\d+)\\.(\\w+)\\.(\\w+)(?:\\.([A-Za-z-]+))?")
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] "chr1:625000-635000.BB_162.Adipose" "chr1" "625000" "635000" "BB_162" "Adipose" NA
What I want is this for y:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "chr1:625000-635000.BB_162.combined.HMSC-ad" "chr1" "625000" "635000" "BB_162" "HMSC-ad"
And this for x:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "chr1:625000-635000.BB_162.Adipose" "chr1" "625000" "635000" "BB_162" "Adipose"
Regex:
(\w+)(\d+)-(\d+)\(\w+(:\.\w+)(?:\([A-Za-z-]+)
You could give the engine a few tokens to split on:
(?:(?<=\\d)-(?=\\d))|(?:\\.combined\\.)|[.:]+
This yields
[,1] [,2] [,3] [,4] [,5]
[1,] "chr1" "625000" "635000" "BB_162" "Adipose"
[2,] "chr1" "625000" "635000" "BB_162" "HMSC-ad"
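For general regex questions, it may help to run your example through a regex tester. @S.Kalbar
@S.Kalbar's answer seems to be incorrect for x: it returns Adipos, without the final e. Besides that, please give R code.
@S.Kalbar The examples are the ones mentioned in my post. I am hoping to find a single regex that handles both x and y.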
(?:(?<=\\d)-(?=\\d)) # a dash between numbers
| # or
(?:\\.combined\\.) # .combined. literally
| # or
[.:]+ # one or more of . or :
library(stringr)
x <- c("chr1:625000-635000.BB_162.Adipose", "chr1:625000-635000.BB_162.combined.HMSC-ad")
str_split(x, '(?:(?<=\\d)-(?=\\d))|(?:\\.combined\\.)|[.:]+', simplify = TRUE)
[,1] [,2] [,3] [,4] [,5]
[1,] "chr1" "625000" "635000" "BB_162" "Adipose"
[2,] "chr1" "625000" "635000" "BB_162" "HMSC-ad"
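If you would rather stay with stringr::str_match, as the question asks, one possible sketch is to make the extra middle field optional and non-capturing, so the last capture group always holds the final field. This is not taken from any of the answers above. It assumes there is at most one extra field (such as combined) before the last one, and that the last field contains only letters and hyphens, as in the character class already used in the question.

```r
library(stringr)

x <- c("chr1:625000-635000.BB_162.Adipose",
       "chr1:625000-635000.BB_162.combined.HMSC-ad")

# Assumption: at most one extra middle field, and the last field is made of
# letters and hyphens only.
# (?:\\w+\\.)? optionally consumes a middle field such as "combined." without
# capturing it, so group 5 is always the final field.
pattern <- "^(\\w+):(\\d+)-(\\d+)\\.(\\w+)\\.(?:\\w+\\.)?([A-Za-z-]+)$"

m <- str_match(x, pattern)
m
```

Under those assumptions, both rows should come back with six columns: the full match, then chr1, 625000, 635000, BB_162, and finally Adipose for the first string and HMSC-ad for the second. If real inputs can carry more than one extra field, the optional group would need a different quantifier.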