How to extract substrings in R using stringr::str_match

I have the following two strings:

x <- "chr1:625000-635000.BB_162.Adipose"
y <- "chr1:625000-635000.BB_162.combined.HMSC-ad"
What I want is for y to give this:

     [,1]                                         [,2]   [,3]     [,4]     [,5]     [,6]
[1,] "chr1:625000-635000.BB_162.combined.HMSC-ad" "chr1" "625000" "635000" "BB_162" "HMSC-ad"
Applying my current regex to y, I get the following result:

     [,1]                                 [,2]   [,3]     [,4]     [,5]     [,6]
[1,] "chr1:625000-635000.BB_162.combined" "chr1" "625000" "635000" "BB_162" "combined"
How can I generalize the regex so that it handles both x and y?

Update

S.Kalbar, your regex gives:

> stringr::str_match(y,"(\\w+):(\\d+)-(\\d+)\\.(\\w+)\\.(\\w+)(?:\\.([A-Za-z-]+))?")
     [,1]                                         [,2]   [,3]     [,4]     [,5]     [,6]       [,7]     
[1,] "chr1:625000-635000.BB_162.combined.HMSC-ad" "chr1" "625000" "635000" "BB_162" "combined" "HMSC-ad"
> stringr::str_match(x,"(\\w+):(\\d+)-(\\d+)\\.(\\w+)\\.(\\w+)(?:\\.([A-Za-z-]+))?")
     [,1]                                [,2]   [,3]     [,4]     [,5]     [,6]      [,7]
[1,] "chr1:625000-635000.BB_162.Adipose" "chr1" "625000" "635000" "BB_162" "Adipose" NA 
What I want for y is this:

     [,1]                                         [,2]   [,3]     [,4]     [,5]     [,6]
[1,] "chr1:625000-635000.BB_162.combined.HMSC-ad" "chr1" "625000" "635000" "BB_162" "HMSC-ad"
And for x, this:

     [,1]                                [,2]   [,3]     [,4]     [,5]     [,6]
[1,] "chr1:625000-635000.BB_162.Adipose" "chr1" "625000" "635000" "BB_162" "Adipose"

Regex
(\w+):(\d+)-(\d+)\.(\w+)\.(\w+)(?:\.([A-Za-z-]+))?
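
To get the same five fields for both inputs with str_match, one option is to make the middle segment optional and non-capturing, so it never takes up a column of its own. This is a minimal sketch, not the answerer's pattern: it assumes the only extra segment that can appear is the literal "combined." seen in y, and that the last field contains only letters and hyphens. The names pattern and m are illustrative.

```r
library(stringr)

x <- "chr1:625000-635000.BB_162.Adipose"
y <- "chr1:625000-635000.BB_162.combined.HMSC-ad"

# (?:combined\\.)? is optional and non-capturing, so it adds no column.
# Assumption: "combined" is the only segment that may precede the last field.
pattern <- "^(\\w+):(\\d+)-(\\d+)\\.(\\w+)\\.(?:combined\\.)?([A-Za-z-]+)$"

# One row per input string: full match in column 1, five fields in columns 2 to 6.
m <- str_match(c(x, y), pattern)
m
```

If other middle segments can occur, replacing the literal with (?:\\w+\\.)? would be a looser alternative under the same assumptions.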


You could give the engine some tokens to split on:

(?:(?<=\\d)-(?=\\d))|(?:\\.combined\\.)|[.:]+
which yields

     [,1]   [,2]     [,3]     [,4]     [,5]     
[1,] "chr1" "625000" "635000" "BB_162" "Adipose"
[2,] "chr1" "625000" "635000" "BB_162" "HMSC-ad"

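For general regex questions, it may help to try your example in an online regex tester.
@S.Kalbar Your answer seems incorrect for x: it returns Adipos without the final e.
@S.Kalbar Besides that, please give R code, as mentioned in my post.
I am hoping to find a single regex that handles both x and y.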
(?:(?<=\\d)-(?=\\d))  # a dash between numbers
|                     # or
(?:\\.combined\\.)    # .combined. literally
|                     # or
[.:]+                 # one or more of . or :
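
To see which delimiters the alternation actually matches, the pattern can be passed to str_extract_all. This is only an illustrative check and not part of the original answer; the names pattern and delims are arbitrary.

```r
library(stringr)

y <- "chr1:625000-635000.BB_162.combined.HMSC-ad"
pattern <- "(?:(?<=\\d)-(?=\\d))|(?:\\.combined\\.)|[.:]+"

# List every delimiter the pattern finds in y, in order of appearance.
delims <- str_extract_all(y, pattern)[[1]]
delims
```

The hyphen in HMSC-ad is not matched because it does not sit between two digits, so that field stays intact when splitting.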
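library(stringr)

# Both input strings in one character vector
x <- c("chr1:625000-635000.BB_162.Adipose", "chr1:625000-635000.BB_162.combined.HMSC-ad")
# Split on the delimiters; simplify = TRUE returns a character matrix
str_split(x, '(?:(?<=\\d)-(?=\\d))|(?:\\.combined\\.)|[.:]+', simplify = TRUE)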
     [,1]   [,2]     [,3]     [,4]     [,5]     
[1,] "chr1" "625000" "635000" "BB_162" "Adipose"
[2,] "chr1" "625000" "635000" "BB_162" "HMSC-ad"
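
The output requested in the question also has the full string in the first column, which str_split does not return. A small sketch of one way to add it, assuming it is acceptable to bind the input vector in front of the split fields; the names parts and res are illustrative.

```r
library(stringr)

x <- c("chr1:625000-635000.BB_162.Adipose",
       "chr1:625000-635000.BB_162.combined.HMSC-ad")
pattern <- "(?:(?<=\\d)-(?=\\d))|(?:\\.combined\\.)|[.:]+"

# Character matrix with one row per string and one column per field.
parts <- str_split(x, pattern, simplify = TRUE)

# Prepend the original strings as column 1; unname() drops the "x" column label.
res <- unname(cbind(x, parts))
res
```

The result has the same shape as a str_match result, with the difference that column 1 is the whole input rather than the matched portion.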