R: Extract the last word(s) after a comma in a string, rather than the first, when there are multiple words

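I have data with the following text, and I want the part after the last comma (e.g. "New Zealand" from "xyz, sss, New Zealand"):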

 location <- c("xyz, sss, New Zealand", "USA", "Pris,France")
 id <- c(1, 2, 3)
 df <- data.frame(location, id)

You can try sub:

 df$country <- sub('.*,\\s*', '', df$location)
 df$country
 #[1] "New Zealand" "USA"         "France"

A stringi solution:

require(stringi)
location<- c("xyz, sss, New Zealand", "USA", "Pris,France")
stri_trim(stri_match_first_regex(location, "(^|,)([^,]*?)$")[,3])
## [1] "New Zealand" "USA"         "France"
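
If you would rather not depend on stringi, the same capture-group idea can be sketched in base R. This is a minimal sketch, assuming R >= 3.3.0 (where regexec() gained its perl argument); the names m and country are only for illustration. regexec() plus regmatches() returns the whole match followed by each capture group, so element 3 plays the role of the [,3] column in the stringi call.

```r
location <- c("xyz, sss, New Zealand", "USA", "Pris,France")

# For each string, regexec() gives the positions of the full match and of each
# capture group; regmatches() turns those positions into substrings.
m <- regmatches(location, regexec("(^|,)([^,]*?)$", location, perl = TRUE))

# Element 1 is the whole match, element 2 is group 1, element 3 is group 2,
# mirroring the [,3] column of stri_match_first_regex().
country <- trimws(vapply(m, `[`, character(1), 3))
country
#[1] "New Zealand" "USA"         "France"
```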

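Explanation [sub]: in df$location, replace any character (.), any number of times (*), up to a comma followed by any amount of whitespace (\\s*), with nothing (''). Because .* is greedy, the match runs to the last comma in the string, so only the text after the last comma survives; strings with no comma, such as "USA", are returned unchanged.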
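
Whether you get the last or the first comma comes down to greediness. A quick comparison on the same location vector; the names greedy and lazy are illustrative only:

```r
location <- c("xyz, sss, New Zealand", "USA", "Pris,France")

# Greedy: ".*" consumes as much as it can, so the deleted prefix runs up to
# the LAST comma plus the whitespace after it.
greedy <- sub(".*,\\s*", "", location)
# -> c("New Zealand", "USA", "France")

# Lazy: ".*?" consumes as little as it can, so the deleted prefix stops at the
# FIRST comma. perl = TRUE selects PCRE, which supports the lazy quantifier.
lazy <- sub(".*?,\\s*", "", location, perl = TRUE)
# -> c("sss, New Zealand", "USA", "France")
```

Only the greedy version yields the country; the lazy one stops at the first comma and keeps "sss, New Zealand".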
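
Explanation [str_extract]: from df$location, return one or more (+) characters that are not a comma ([^,]), starting at a word boundary (\\b) and running to the end of the string ($). The word boundary makes the match start at the first word character, which drops the space after the comma. (So, basically: give all the complete words after the last comma.)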

 library(stringr)
 str_extract(df$location, '\\b[^,]+$')
 #[1] "New Zealand" "USA"         "France"
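
To see what \\b contributes, the pattern can be run with and without it. This sketch uses base R's regexpr()/regmatches() with perl = TRUE as a stand-in for str_extract(), so it runs without stringr; the helper name extract_last is hypothetical. Note that regmatches() silently drops strings with no match, whereas str_extract() returns NA for them.

```r
location <- c("xyz, sss, New Zealand", "USA", "Pris,France")

# Hypothetical helper: base-R stand-in for stringr::str_extract().
# regexpr() locates the first match in each string; regmatches() extracts it.
# Unlike str_extract(), strings with no match are dropped rather than NA.
extract_last <- function(x, pattern) {
  regmatches(x, regexpr(pattern, x, perl = TRUE))
}

# With \\b the match must start at a word boundary, so the space after the
# last comma is skipped.
with_b <- extract_last(location, "\\b[^,]+$")
# -> c("New Zealand", "USA", "France")

# Without \\b the match starts right after the last comma, space included.
without_b <- extract_last(location, "[^,]+$")
# -> c(" New Zealand", "USA", "France")
```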