R: match and replace words in a character vector


I have a vector containing lines of text, like this:

text <- c("Seat 1: 7e7389e3 ($2 in chips)", "Seat 3: 6786517b ($1.67 in chips)", "Seat 4: 878b0b52 ($2.16 in chips)", "Seat 5: a822375 ($2.37 in chips)", "Seat 6: 7a6252e6 ($2.51 in chips)")

and a data frame that maps each code to a name:

df <- data.frame(codigo = c("7e7389e3", "6786517b", "878b0b52", "a822375", "7a6252e6"),
                 name = c("lucas", "alan", "ivan", "lucio", "donald"))

I want to replace each code in text with its matching name, so that the result is:

[1] "Seat 1: lucas ($2 in chips)"
[2] "Seat 3: alan ($1.67 in chips)"
[3] "Seat 4: ivan ($2.16 in chips)"
[4] "Seat 5: lucio ($2.37 in chips)"
[5] "Seat 6: donald ($2.51 in chips)"

Is there a function that can do this?

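Try this approach using tidyverse functions. Since every line follows the same pattern, with ": " before the code and " (" after it, you can turn both separators into a common split character, separate the string into columns, join with df on codigo, and finally paste the pieces back together to get the expected result. The code: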

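library(tidyverse)
res <- text %>%
  as.data.frame() %>%
  setNames('v1') %>%
  # turn the two separators ": " and " (" into a common split character "*"
  mutate(v1 = gsub(': ', '*', v1, fixed = TRUE),
         v1 = gsub(' (', '*', v1, fixed = TRUE)) %>%
  # split into the seat label, the code, and the rest of the line
  separate(v1, c('Var1', 'codigo', 'Var3'), sep = '\\*') %>%
  # look up the name that belongs to each code
  left_join(df, by = 'codigo') %>%
  # rebuild the line with the name in place of the code
  mutate(Out = paste0(Var1, ': ', name, ' (', Var3)) %>%
  # return the result as a character vector
  pull(Out)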
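The same split / look up / paste idea can also be sketched in base R without reshaping into a data frame. This is a minimal sketch, not part of the original answer, assuming every line contains exactly one ": " before the code and one " (" after it; the names seat, codes, rest and found are illustrative, and stringsAsFactors = FALSE is set only so the sketch behaves the same on R versions before 4.0.

```r
text <- c("Seat 1: 7e7389e3 ($2 in chips)", "Seat 3: 6786517b ($1.67 in chips)",
          "Seat 4: 878b0b52 ($2.16 in chips)", "Seat 5: a822375 ($2.37 in chips)",
          "Seat 6: 7a6252e6 ($2.51 in chips)")
df <- data.frame(codigo = c("7e7389e3", "6786517b", "878b0b52", "a822375", "7a6252e6"),
                 name = c("lucas", "alan", "ivan", "lucio", "donald"),
                 stringsAsFactors = FALSE)

# "Var1": everything before the first ": " (e.g. "Seat 1")
seat <- sub(": .*$", "", text)
# "codigo": the text between ": " and " ("
codes <- sub("^.*: (.*) \\(.*$", "\\1", text)
# "Var3": everything from the "(" onwards (e.g. "($2 in chips)")
rest <- sub("^[^(]*", "", text)

# look up each code in df; a code missing from df gives NA
found <- df$name[match(codes, df$codigo)]

# paste the pieces back together
out <- paste0(seat, ": ", found, " ", rest)
print(out)
```

Unlike left_join(), paste0() turns an unmatched code into the literal string "NA", so check anyNA(found) first if df may be incomplete.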

We can do this easily with str_replace_all, which accepts a named vector of pattern = replacement pairs:

library(stringr)
library(tibble)
# deframe(df) turns df into a named vector: names are the codes, values are the names
str_replace_all(text, deframe(df))
#[1] "Seat 1: lucas ($2 in chips)"
#[2] "Seat 3: alan ($1.67 in chips)"
#[3] "Seat 4: ivan ($2.16 in chips)"
#[4] "Seat 5: lucio ($2.37 in chips)"
#[5] "Seat 6: donald ($2.51 in chips)"
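str_replace_all() treats each name in that vector as a regular expression. The hex codes here contain no regex metacharacters, so that is harmless, but a code that happens to be a substring of a longer one could be replaced inside it. A minimal base-R sketch of the same named-vector idea, assuming the codes consist only of letters and digits: setNames() stands in for tibble::deframe(), each code is wrapped in \\b word boundaries, and the pairs are applied one after another with Reduce(). The names lookup and patterns are illustrative.

```r
text <- c("Seat 1: 7e7389e3 ($2 in chips)", "Seat 3: 6786517b ($1.67 in chips)",
          "Seat 4: 878b0b52 ($2.16 in chips)", "Seat 5: a822375 ($2.37 in chips)",
          "Seat 6: 7a6252e6 ($2.51 in chips)")
df <- data.frame(codigo = c("7e7389e3", "6786517b", "878b0b52", "a822375", "7a6252e6"),
                 name = c("lucas", "alan", "ivan", "lucio", "donald"),
                 stringsAsFactors = FALSE)

# Base-R stand-in for tibble::deframe(df): names are the codes, values are the names
lookup <- setNames(df$name, df$codigo)

# Anchor each code on word boundaries so it is only replaced as a whole token
# (assumption: codes contain only word characters, like the hex codes here)
patterns <- paste0("\\b", names(lookup), "\\b")

# Apply the pattern/replacement pairs in turn, starting from the original text
out <- Reduce(function(s, i) gsub(patterns[i], lookup[[i]], s, perl = TRUE),
              seq_along(patterns), text)
print(out)
```

For example, with a hypothetical extra code "a822", the pattern \\ba822\\b would leave "a822375" untouched, whereas a plain "a822" pattern would rewrite part of it.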

Using sapply + gsub + Vectorize:

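# for each line x, try every code/name pair and keep the one result that changed
unname(sapply(text, function(x) (u <- Vectorize(gsub)(df$codigo, df$name, x, fixed = TRUE))[u != x]))
#[1] "Seat 1: lucas ($2 in chips)"     "Seat 3: alan ($1.67 in chips)"
#[3] "Seat 4: ivan ($2.16 in chips)"   "Seat 5: lucio ($2.37 in chips)"
#[5] "Seat 6: donald ($2.51 in chips)"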
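Vectorize(gsub) is a wrapper around mapply(), so for each line it produces one candidate string per code/name pair, and [u != x] keeps the candidate that changed. If a line contains none of the codes, that subset is empty and sapply() returns a list instead of a character vector. The sketch below spells this out with mapply() directly and guards the no-match case, assuming at most one code per line (replace_one is a hypothetical helper name); vapply() enforces exactly one string per input line.

```r
text <- c("Seat 1: 7e7389e3 ($2 in chips)", "Seat 3: 6786517b ($1.67 in chips)",
          "Seat 4: 878b0b52 ($2.16 in chips)", "Seat 5: a822375 ($2.37 in chips)",
          "Seat 6: 7a6252e6 ($2.51 in chips)")
df <- data.frame(codigo = c("7e7389e3", "6786517b", "878b0b52", "a822375", "7a6252e6"),
                 name = c("lucas", "alan", "ivan", "lucio", "donald"),
                 stringsAsFactors = FALSE)

replace_one <- function(x) {
  # one gsub() call per (code, name) pair, all applied to the same line x
  candidates <- mapply(gsub, df$codigo, df$name,
                       MoreArgs = list(x = x, fixed = TRUE), USE.NAMES = FALSE)
  # keep the candidate whose code actually occurred in x
  changed <- candidates[candidates != x]
  # no code found: return the line unchanged instead of character(0)
  if (length(changed) > 0) changed[1] else x
}

out <- vapply(text, replace_one, character(1), USE.NAMES = FALSE)
print(out)
```

If a line could contain two different codes, this keeps only the first replacement; the for-loop approach handles that case.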

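Cases like this are a perfect time to use a for loop. It's boring, but it works, and judging by earlier questions on this topic it is quite competitive in terms of efficiency: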

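out <- text
# replace each code with its name, one data-frame row at a time;
# fixed = TRUE matches the code literally rather than as a regex
for (i in seq_len(nrow(df))) {
    out <- gsub(df$codigo[i], df$name[i], out, fixed = TRUE)
}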
out
#[1] "Seat 1: lucas ($2 in chips)"     "Seat 3: alan ($1.67 in chips)"  
#[3] "Seat 4: ivan ($2.16 in chips)"   "Seat 5: lucio ($2.37 in chips)" 
#[5] "Seat 6: donald ($2.51 in chips)"
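The efficiency claim is easy to check on your own data. A rough sketch using base system.time(), assuming a 10,000-line input made by repeating the example (the size is arbitrary): the for loop makes nrow(df) gsub() calls, each vectorised over the whole input, while the sapply + Vectorize answer makes nrow(df) calls for every single line. Timings depend on the machine and data, so none are shown here.

```r
text <- c("Seat 1: 7e7389e3 ($2 in chips)", "Seat 3: 6786517b ($1.67 in chips)",
          "Seat 4: 878b0b52 ($2.16 in chips)", "Seat 5: a822375 ($2.37 in chips)",
          "Seat 6: 7a6252e6 ($2.51 in chips)")
df <- data.frame(codigo = c("7e7389e3", "6786517b", "878b0b52", "a822375", "7a6252e6"),
                 name = c("lucas", "alan", "ivan", "lucio", "donald"),
                 stringsAsFactors = FALSE)

big <- rep(text, 2000)  # 10,000 lines, size chosen arbitrarily

# for loop: nrow(df) gsub() calls, each vectorised over the whole input
loop_replace <- function(x) {
  for (i in seq_len(nrow(df))) {
    x <- gsub(df$codigo[i], df$name[i], x, fixed = TRUE)
  }
  x
}

# sapply + Vectorize: nrow(df) gsub() calls for every single line
sapply_replace <- function(x) {
  unname(sapply(x, function(s) (u <- Vectorize(gsub)(df$codigo, df$name, s, fixed = TRUE))[u != s]))
}

t_loop <- system.time(r_loop <- loop_replace(big))[["elapsed"]]
t_sapply <- system.time(r_sapply <- sapply_replace(big))[["elapsed"]]

# both approaches must give the same result before the timings mean anything
stopifnot(identical(r_loop, r_sapply))
print(c(loop = t_loop, sapply = t_sapply))
```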