R: Extract the last word after a comma in a string if there are multiple words, otherwise the first word
I have data with the following text:
location<- c("xyz, sss, New Zealand", "USA", "Pris,France")
id<- c(1,2,3)
df<-data.frame(location,id)
You can try sub:
df$country <- sub('.*,\\s*', '', df$location)
df$country
#[1] "New Zealand" "USA" "France"
A stringi solution:
require(stringi)
location<- c("xyz, sss, New Zealand", "USA", "Pris,France")
stri_trim(stri_match_first_regex(location, "(^|,)([^,]*?)$")[,3])
## [1] "New Zealand" "USA" "France"
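To see why the third column is selected, a base R analogue may help. This is only a sketch: it reuses the same pattern with regexec() and regmatches() instead of stringi, and it assumes R 3.3.0 or later for the perl argument of regexec(). The names m and country are illustrative and not part of the original answer.

```r
location <- c("xyz, sss, New Zealand", "USA", "Pris,France")

# Each list element holds three strings:
# the whole match, group 1 ("" or ","), and group 2 (the last field)
m <- regmatches(location, regexec("(^|,)([^,]*?)$", location, perl = TRUE))

# Group 2 is the third element, mirroring [, 3] in the stringi call;
# trimws() plays the role of stri_trim()
country <- vapply(m, function(x) trimws(x[3]), character(1))
country
```

Group 1 matches either the start of the string or a comma, which is what lets a value without any comma, such as "USA", still produce a match.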
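Explanation [sub]: from df$location, replace any character (.), occurring any number of times (*), up to and including a comma (,) followed by any amount of whitespace (\\s*), with nothing (''). Because .* is greedy, the match runs through the last comma; a string with no comma does not match and is returned unchanged.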
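The effect of the greedy quantifier can be checked directly. The following is a minimal sketch in base R; the variable names last_item, no_comma and lazy_result are illustrative, and perl = TRUE is used in the last call only so that the lazy quantifier is handled by PCRE.

```r
location <- c("xyz, sss, New Zealand", "USA", "Pris,France")

# Greedy '.*' consumes everything up to the LAST comma
last_item <- sub(".*,\\s*", "", location)

# No comma means no match, so the input comes back unchanged
no_comma <- sub(".*,\\s*", "", "USA")

# For contrast: a lazy '.*?' stops at the FIRST comma instead
lazy_result <- sub(".*?,\\s*", "", "xyz, sss, New Zealand", perl = TRUE)

last_item
no_comma
lazy_result
```

The lazy variant leaves "sss, New Zealand", which is why the greedy form is the one that fits this question.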
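Explanation [str_extract]: from df$location, return one or more (+) characters that are not commas ([^,]), starting at a word boundary (\\b) and running to the end of the string ($). (In effect, this returns all the whole words after the last comma.)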
library(stringr)
str_extract(df$location, '\\b[^,]+$')
#[1] "New Zealand" "USA" "France"
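For comparison, the same result can be obtained without a regular expression by splitting on commas and keeping the last piece. This is a sketch under assumptions, not something from the answers above: last_field() is a hypothetical helper name, and trimws() requires R 3.2.0 or later.

```r
location <- c("xyz, sss, New Zealand", "USA", "Pris,France")

# last_field() is a hypothetical helper, not part of any package
last_field <- function(x) {
  pieces <- strsplit(x, ",", fixed = TRUE)
  vapply(pieces, function(p) {
    if (length(p) == 0) return("")  # strsplit("") yields character(0)
    trimws(p[length(p)])            # last piece, surrounding whitespace removed
  }, character(1))
}

last_field(location)
```

One difference to keep in mind: strsplit() drops a trailing empty field, so an input ending in a comma yields the field before that comma, whereas the sub() approach yields an empty string.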