Splitting strings using regular expressions in R

r regex string

I have a very long list of strings that look like the ones below, and I want to split each of them into several parts.

strings<-c("https://www.website.com/stats/stat.227.y2020.eon.t879.html",
"https://www.website.com/stats/stat.229.y2019.eoff.t476.html")

How can I do this using regex?

Use str_match:

stringr::str_match(strings, '.*\\.(y\\d+)\\.(\\w+)\\.(t\\d+)')

If you put the strings in a data frame, you can use the same regular expression in tidyr::extract:

tidyr::extract(data.frame(strings), strings, c("Year","Seas", "Tour"), 
              '\\.(y\\d+)\\.(\\w+)\\.(t\\d+)', remove = FALSE)

#                                                      strings  Year Seas Tour
#1  https://www.website.com/stats/stat.227.y2020.eon.t879.html y2020  eon t879
#2 https://www.website.com/stats/stat.229.y2019.eoff.t476.html y2019 eoff t476

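Here we split the data into 3 parts (capture groups):

First part: 'y' followed by a number.

Second part: the next word after the first part.

Third part: 't' followed by a number.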
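
If you would rather avoid extra packages, the same three capture groups can probably be pulled out with base R. This is only a sketch under that assumption, not part of the answer above: it reuses the pattern from the tidyr::extract call, and the names pattern, m and parts are illustrative.

```r
strings <- c("https://www.website.com/stats/stat.227.y2020.eon.t879.html",
             "https://www.website.com/stats/stat.229.y2019.eoff.t476.html")

# Same three capture groups as above: 'y' + digits, a word, 't' + digits
pattern <- "\\.(y\\d+)\\.(\\w+)\\.(t\\d+)"

# regexec() locates the full match and each capture group;
# regmatches() then extracts the matched text for every string
m <- regmatches(strings, regexec(pattern, strings, perl = TRUE))

# Drop the first element (the full match) and bind the groups into a matrix
parts <- do.call(rbind, lapply(m, function(x) x[-1]))
colnames(parts) <- c("Year", "Seas", "Tour")
parts
```

Each row of parts corresponds to one input string; wrap it in as.data.frame() if a data frame is more convenient.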

You can use {unglue}:

library(unglue)
unglue::unglue_data(
  strings, "{links}.{Year=[^.]+}.{Seas=[^.]+}.{Tour=[^.]+}.html")
#>                                    links  Year Seas Tour
#> 1 https://www.website.com/stats/stat.227 y2020  eon t879
#> 2 https://www.website.com/stats/stat.229 y2019 eoff t476

Here, "[^.]+" means "one or more non-dot characters", which is what we want for Year, Seas and Tour.
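
To see what "[^.]+" matches on its own, here is a small base R sketch. The variable x is a hypothetical example, just the file-name portion of the first URL, and pieces is an illustrative name.

```r
# Hypothetical input: the file-name part of the first URL
x <- "stat.227.y2020.eon.t879.html"

# gregexpr() finds every run of one or more non-dot characters;
# regmatches() returns the matched substrings
pieces <- regmatches(x, gregexpr("[^.]+", x))[[1]]
pieces
# "stat" "227" "y2020" "eon" "t879" "html"
```

Each dot-separated field comes back as its own element, which is why the pattern works for the Year, Seas and Tour slots in the unglue template.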
