R: extract the last item after the final comma in a string (even if it has several words), rather than the first word


I have data containing text like the following:

 location <- c("xyz, sss, New Zealand", "USA", "Pris,France")
 id <- c(1, 2, 3)
 df <- data.frame(location, id)

You can try sub:

 df$country <- sub('.*,\\s*', '', df$location)
 df$country
 #[1] "New Zealand" "USA"         "France"   

A stringi solution:

require(stringi)
location<- c("xyz, sss, New Zealand", "USA", "Pris,France")
stri_trim(stri_match_first_regex(location, "(^|,)([^,]*?)$")[,3])
## [1] "New Zealand" "USA"         "France"  
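Explanation [sub]: in df$location, replace any character (.), repeated any number of times (*), up to and including a comma, followed by any amount or type of whitespace (\\s*), with nothing (''). Because .* is greedy, the match runs to the last comma, so only the text after it remains. A string without a comma, such as "USA", is returned unchanged.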
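To see why the greedy .* matters in the sub pattern, here is a minimal sketch that compares it with a lazy variant. The lazy pattern and the variable names greedy and lazy are illustrative additions, not part of the original answer.

```r
location <- c("xyz, sss, New Zealand", "USA", "Pris,France")

# Greedy `.*` consumes everything up to the LAST comma,
# so only the final field is left after the replacement.
greedy <- sub(".*,\\s*", "", location)

# Lazy `.*?` stops at the FIRST comma instead, so everything after
# the first field is kept. perl = TRUE selects PCRE semantics here.
lazy <- sub(".*?,\\s*", "", location, perl = TRUE)

print(greedy)
print(lazy)
```

With the sample data, greedy holds only the last field of each string, while lazy still contains "sss, New Zealand" for the first element.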
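Explanation [str_extract]: from df$location, return one or more (+) characters that are not a comma ([^,]), starting at a word boundary (\\b) and running to the end of the string ($). (So, basically, return all the whole words after the last comma.)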
The stringr call that the str_extract explanation refers to:

 library(stringr)
 str_extract(df$location, '\\b[^,]+$')
 #[1] "New Zealand" "USA"         "France"
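As a cross-check that needs no extra packages, the same result can be sketched in base R by splitting on commas and keeping the last piece. The helper name last_field is hypothetical, and mapping an empty string to NA is an assumption about the desired behaviour, not something the original answers specify.

```r
location <- c("xyz, sss, New Zealand", "USA", "Pris,France")

# Hypothetical helper (not from the original answers): return the last
# comma-separated field of each string, with surrounding whitespace removed.
last_field <- function(x) {
  parts <- strsplit(as.character(x), ",", fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) == 0) return(NA_character_)  # assumption: "" maps to NA
    trimws(p[length(p)])
  }, character(1))
}

country <- last_field(location)
print(country)
```

One difference to be aware of: strsplit drops a trailing empty field, so for an input such as "a," this helper returns "a", whereas the sub pattern returns an empty string.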