R: match and replace words in a character vector
I have a vector of text lines like this:
text<-c("Seat 1: 7e7389e3 ($2 in chips)","Seat 3: 6786517b ($1.67 in chips)","Seat 4: 878b0b52 ($2.16 in chips)","Seat 5: a822375 ($2.37 in chips)","Seat 6: 7a6252e6 ($2.51 in chips)")
I also have a data frame that maps each code to a name:
df<-data.frame(codigo=c("7e7389e3","6786517b","878b0b52","a822375","7a6252e6"),
name=c("lucas","alan","ivan","lucio","donald"))
I would like to replace each code in text with the corresponding name, to get:
[1] "Seat 1: lucas ($2 in chips)"
[2] "Seat 3: alan ($1.67 in chips)"
[3] "Seat 4: ivan ($2.16 in chips)"
[4] "Seat 5: lucio ($2.37 in chips)"
[5] "Seat 6: donald ($2.51 in chips)"
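Is there a function that can do this?
Try this approach using tidyverse functions. The strings follow a pattern built around ": " and " (", so you can replace both with a common split character, separate the string into columns, join the result with df, and finally paste the pieces back together to get the expected result. The code is as follows: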
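library(tidyverse)
res <- tibble(v1 = text) %>%
  # turn ": " and " (" into a common split character "*"
  mutate(v1 = gsub(': ', '*', v1, fixed = TRUE),
         v1 = gsub(' (', '*', v1, fixed = TRUE)) %>%
  # split each line into prefix, code and the rest of the string
  separate(v1, c('Var1', 'codigo', 'Var3'), sep = '\\*') %>%
  # look up the name that belongs to each code
  left_join(df, by = 'codigo') %>%
  # rebuild the line with the name in place of the code
  mutate(Out = paste0(Var1, ': ', name, ' (', Var3)) %>%
  select(Out)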
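The same split, join and paste idea can be sketched in base R without the tidyverse. This is only a sketch: it assumes every line has the layout "<prefix>: <code> (<rest>", captures the three pieces with a regular expression via sub(), and uses match() in place of left_join(). The variable names (pattern, prefix, code, rest) are illustrative, not from the answer.

```r
# Data from the question
text <- c("Seat 1: 7e7389e3 ($2 in chips)", "Seat 3: 6786517b ($1.67 in chips)",
          "Seat 4: 878b0b52 ($2.16 in chips)", "Seat 5: a822375 ($2.37 in chips)",
          "Seat 6: 7a6252e6 ($2.51 in chips)")
df <- data.frame(codigo = c("7e7389e3", "6786517b", "878b0b52", "a822375", "7a6252e6"),
                 name = c("lucas", "alan", "ivan", "lucio", "donald"),
                 stringsAsFactors = FALSE)

# Assumed layout of every line: "<prefix>: <code> (<rest>"
pattern <- "^([^:]*): ([^ ]+) \\((.*)$"
prefix <- sub(pattern, "\\1", text, perl = TRUE)
code   <- sub(pattern, "\\2", text, perl = TRUE)
rest   <- sub(pattern, "\\3", text, perl = TRUE)

# Look up each code's name; match() plays the role of left_join()
name <- df$name[match(code, df$codigo)]

# Paste the pieces back together with the name in place of the code
res <- paste0(prefix, ": ", name, " (", rest)
print(res)
```

As with left_join(), a code that is missing from df yields NA for name, and paste0() turns that into the string "NA".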
We can do this easily with str_replace_all, which accepts a named vector of pattern = replacement pairs:
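library(stringr)
library(tibble)
# deframe(df) turns the two columns into a named vector c("7e7389e3" = "lucas", ...);
# str_replace_all() uses the names as patterns and the values as replacements
str_replace_all(text, deframe(df))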
#[1] "Seat 1: lucas ($2 in chips)"
#[2] "Seat 3: alan ($1.67 in chips)"
#[3] "Seat 4: ivan ($2.16 in chips)"
#[4] "Seat 5: lucio ($2.37 in chips)"
#[5] "Seat 6: donald ($2.51 in chips)"
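To see what deframe(df) hands to str_replace_all(), here is a rough base-R analogue, offered as an illustration rather than stringr's actual implementation. It builds the same named vector with setNames() (the name lookup is hypothetical) and then applies the pairs one after another with Reduce() and gsub(). One difference: str_replace_all() treats the names as regular expressions, while this sketch passes fixed = TRUE so the codes are matched as literal text.

```r
# Data from the question
text <- c("Seat 1: 7e7389e3 ($2 in chips)", "Seat 3: 6786517b ($1.67 in chips)",
          "Seat 4: 878b0b52 ($2.16 in chips)", "Seat 5: a822375 ($2.37 in chips)",
          "Seat 6: 7a6252e6 ($2.51 in chips)")
df <- data.frame(codigo = c("7e7389e3", "6786517b", "878b0b52", "a822375", "7a6252e6"),
                 name = c("lucas", "alan", "ivan", "lucio", "donald"),
                 stringsAsFactors = FALSE)

# The same shape deframe(df) produces: names = codes (patterns), values = names (replacements)
lookup <- setNames(df$name, df$codigo)
print(lookup)

# Apply each code -> name pair in turn to the whole vector
res <- Reduce(function(s, code) gsub(code, lookup[[code]], s, fixed = TRUE),
              names(lookup), init = text)
print(res)
```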
Using sapply + gsub + Vectorize:
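# For each line x, Vectorize(gsub) tries every codigo/name pair on x;
# [u != x] keeps the single result in which a code was actually replaced
unname(sapply(text,function(x) (u <- Vectorize(gsub)(df$codigo,df$name,x,fixed = TRUE))[u!=x]))
#[1] "Seat 1: lucas ($2 in chips)"    "Seat 3: alan ($1.67 in chips)"
#[3] "Seat 4: ivan ($2.16 in chips)"  "Seat 5: lucio ($2.37 in chips)"
#[5] "Seat 6: donald ($2.51 in chips)"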
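Vectorize(gsub) essentially runs gsub() through mapply() over the code/name pairs. The sketch below spells that step out for a single line in a helper called replace_one (a hypothetical name) and adds a guard that is not part of the answer: if no code matches, the line is returned unchanged. Without that guard, [u != x] would be character(0) for such a line, and sapply() would then return a list instead of a character vector.

```r
# Data from the question
text <- c("Seat 1: 7e7389e3 ($2 in chips)", "Seat 3: 6786517b ($1.67 in chips)",
          "Seat 4: 878b0b52 ($2.16 in chips)", "Seat 5: a822375 ($2.37 in chips)",
          "Seat 6: 7a6252e6 ($2.51 in chips)")
df <- data.frame(codigo = c("7e7389e3", "6786517b", "878b0b52", "a822375", "7a6252e6"),
                 name = c("lucas", "alan", "ivan", "lucio", "donald"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: replace the code in one line x
replace_one <- function(x) {
  # one gsub() call per code/name pair, all applied to the same x
  u <- mapply(gsub, df$codigo, df$name,
              MoreArgs = list(x = x, fixed = TRUE), USE.NAMES = FALSE)
  changed <- u[u != x]
  # guard (an addition): keep x when no code matched
  if (length(changed) == 0) x else changed[1]
}

res <- unname(sapply(text, replace_one))
print(res)

# A line with an unknown code now comes back unchanged
replace_one("Seat 9: ffffffff ($1 in chips)")
```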
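A case like this is a perfect time for a for loop. It's boring, but it works, and judging by previous questions it is fairly competitive in terms of efficiency.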
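out <- text
# Replace one code at a time across all lines; fixed = TRUE matches each code as literal text
for (i in seq_len(nrow(df))) {
  out <- gsub(df$codigo[i], df$name[i], out, fixed = TRUE)
}
out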
#[1] "Seat 1: lucas ($2 in chips)" "Seat 3: alan ($1.67 in chips)"
#[3] "Seat 4: ivan ($2.16 in chips)" "Seat 5: lucio ($2.37 in chips)"
#[5] "Seat 6: donald ($2.51 in chips)"
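Since the title asks about matching words, one possible refinement of the loop is to anchor each code at word boundaries (\b) so that it is only replaced as a whole word. This sketch assumes the codes contain only letters and digits, so they need no regex escaping, and adds a hypothetical line in text2 whose token a8223751 contains the code a822375. With \b on both sides that token is left alone, whereas a plain substring match would turn it into lucio1.

```r
# Data from the question
text <- c("Seat 1: 7e7389e3 ($2 in chips)", "Seat 3: 6786517b ($1.67 in chips)",
          "Seat 4: 878b0b52 ($2.16 in chips)", "Seat 5: a822375 ($2.37 in chips)",
          "Seat 6: 7a6252e6 ($2.51 in chips)")
df <- data.frame(codigo = c("7e7389e3", "6786517b", "878b0b52", "a822375", "7a6252e6"),
                 name = c("lucas", "alan", "ivan", "lucio", "donald"),
                 stringsAsFactors = FALSE)

# Hypothetical extra line: "a822375" appears only as part of a longer token
text2 <- c(text, "Seat 2: a8223751 ($3 in chips)")

out <- text2
for (i in seq_len(nrow(df))) {
  # \\b on both sides: replace the code only when it stands as a whole word
  # (assumes codes are alphanumeric, so no regex metacharacters to escape)
  out <- gsub(paste0("\\b", df$codigo[i], "\\b"), df$name[i], out, perl = TRUE)
}
print(out)
```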