R: Matching and replacing words in a character vector


I have a vector containing lines of text, together with a data frame that maps each code to a name, as follows:

text<-c("Seat 1: 7e7389e3 ($2 in chips)","Seat 3: 6786517b ($1.67 in chips)","Seat 4: 878b0b52 ($2.16 in chips)","Seat 5: a822375 ($2.37 in chips)","Seat 6: 7a6252e6 ($2.51 in chips)")

df<-data.frame(codigo=c("7e7389e3","6786517b","878b0b52","a822375","7a6252e6"),
name=c("lucas","alan","ivan","lucio","donald"))

The expected output, with each code replaced by the corresponding name, is:

[1] "Seat 1: lucas ($2 in chips)"
[2] "Seat 3: alan ($1.67 in chips)"
[3] "Seat 4: ivan ($2.16 in chips)"
[4] "Seat 5: lucio ($2.37 in chips)"
[5] "Seat 6: donald ($2.51 in chips)"

Is there a function that can do this?

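Try this approach using tidyverse functions. Since every line follows a pattern in which the code sits between ': ' and ' (', you can turn both delimiters into a common split character, separate the text into columns, join with df, and finally paste the strings back together to get the expected result. The code is as follows: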

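library(tidyverse)
res <- text %>% as.data.frame %>% setNames(., 'v1') %>%
  # turn both delimiters, ': ' and ' (', into a common '*' separator
  mutate(v1 = gsub(': ', '*', v1),
         v1 = gsub(' (', '*', v1, fixed = TRUE)) %>%
  # split into three columns; the middle one holds the code
  separate(v1, c('Var1', 'codigo', 'Var3'), sep = '\\*') %>%
  # look up the name that belongs to each code
  left_join(df, by = 'codigo') %>%
  # reassemble each line with the name in place of the code
  mutate(Out = paste0(Var1, ': ', name, ' (', Var3)) %>%
  select(Out)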
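
For readers who prefer to avoid the tidyverse dependency, the same split, look up, and paste idea can be sketched in base R. This is only an illustration, not part of the answer above: the regular expression and the variable names parts, m, and nm are assumptions, and the sketch assumes every line matches the "Seat N: code (rest)" layout.

```r
text <- c("Seat 1: 7e7389e3 ($2 in chips)", "Seat 3: 6786517b ($1.67 in chips)",
          "Seat 4: 878b0b52 ($2.16 in chips)", "Seat 5: a822375 ($2.37 in chips)",
          "Seat 6: 7a6252e6 ($2.51 in chips)")
df <- data.frame(codigo = c("7e7389e3", "6786517b", "878b0b52", "a822375", "7a6252e6"),
                 name = c("lucas", "alan", "ivan", "lucio", "donald"),
                 stringsAsFactors = FALSE)

# Split each line into: seat label, code, and the remainder
parts <- regmatches(text, regexec("^(Seat [0-9]+): ([^ ]+) (.*)$", text))

# One row per line; columns are: full match, seat label, code, remainder
m <- do.call(rbind, parts)

# Look up the name for each code (plays the role of the join)
nm <- df$name[match(m[, 3], df$codigo)]

# Paste the pieces back together with the name in place of the code
out <- paste0(m[, 2], ": ", nm, " ", m[, 4])
out
```

A code that is missing from df would show up as NA in the result, so a real script would probably want to check for that first.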

We can do this easily with str_replace_all, which can take a named vector:

library(stringr)
library(tibble)
str_replace_all(text, deframe(df))
#[1] "Seat 1: lucas ($2 in chips)"
#[2] "Seat 3: alan ($1.67 in chips)"
#[3] "Seat 4: ivan ($2.16 in chips)"
#[4] "Seat 5: lucio ($2.37 in chips)"
#[5] "Seat 6: donald ($2.51 in chips)"
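
The named vector is the key ingredient here: its names are the codes and its values are the replacement names. As a rough base R sketch of the same lookup idea (not what stringr does internally), setNames can build that vector and regmatches<- can swap the codes in place. The name lookup and the lookbehind pattern are assumptions made for this illustration, and every code found in the text must exist in lookup, since missing replacement values are rejected.

```r
text <- c("Seat 1: 7e7389e3 ($2 in chips)", "Seat 3: 6786517b ($1.67 in chips)",
          "Seat 4: 878b0b52 ($2.16 in chips)", "Seat 5: a822375 ($2.37 in chips)",
          "Seat 6: 7a6252e6 ($2.51 in chips)")
df <- data.frame(codigo = c("7e7389e3", "6786517b", "878b0b52", "a822375", "7a6252e6"),
                 name = c("lucas", "alan", "ivan", "lucio", "donald"),
                 stringsAsFactors = FALSE)

# Named vector: names are the codes, values are the player names
lookup <- setNames(df$name, df$codigo)

# Locate the first run of non-space characters that follows ": " in each line
m <- regexpr("(?<=: )[^ ]+", text, perl = TRUE)

# Replace each located code with its looked-up name, leaving `text` untouched
out <- text
regmatches(out, m) <- unname(lookup[regmatches(text, m)])
out
```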

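Using sapply, gsub, and Vectorize:

unname(sapply(text,function(x) (u <- Vectorize(gsub)(df$codigo,df$name,x,fixed = TRUE))[u!=x]))

[1] "Seat 1: lucas ($2 in chips)"     "Seat 3: alan ($1.67 in chips)"
[3] "Seat 4: ivan ($2.16 in chips)"   "Seat 5: lucio ($2.37 in chips)"
[5] "Seat 6: donald ($2.51 in chips)"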
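
The one-liner is dense, so it may help to look at what happens for a single line. The sketch below unpacks the inner step under the assumption that exactly one code occurs in the line; x and u mirror the names used in the one-liner.

```r
df <- data.frame(codigo = c("7e7389e3", "6786517b", "878b0b52", "a822375", "7a6252e6"),
                 name = c("lucas", "alan", "ivan", "lucio", "donald"),
                 stringsAsFactors = FALSE)
x <- "Seat 1: 7e7389e3 ($2 in chips)"

# Try every code/name pair against the same line: one result per pair
u <- Vectorize(gsub)(df$codigo, df$name, x, fixed = TRUE)
length(u)   # one candidate string per row of df

# Only the pair whose code occurs in the line changes it; keep that one
hit <- unname(u[u != x])
hit
```

If a line contained none of the codes, u[u != x] would be empty, and sapply would then return a list rather than a character vector.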

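Cases like this are a perfect time to use a for loop. It is boring, but it works, and judging by earlier questions it is fairly competitive in terms of efficiency: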


out <- text
# replace one code at a time; fixed = TRUE treats each code as literal text
for (i in seq_len(nrow(df))) {
    out <- gsub(df$codigo[i], df$name[i], out, fixed = TRUE)
}
out
#[1] "Seat 1: lucas ($2 in chips)"     "Seat 3: alan ($1.67 in chips)"  
#[3] "Seat 4: ivan ($2.16 in chips)"   "Seat 5: lucio ($2.37 in chips)" 
#[5] "Seat 6: donald ($2.51 in chips)"
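
If the loop is needed in more than one place, it could be wrapped in a small function. The following is a minimal sketch, not something from the answers above: replace_codes and its arguments from, to, and whole_word are hypothetical names. The whole_word option assumes the codes are plain alphanumeric strings, so that wrapping them in \\b word boundaries is safe; it guards against one code being a prefix of a longer token.

```r
# Hypothetical helper: replace each element of `from` with the matching `to`
replace_codes <- function(x, from, to, whole_word = FALSE) {
  stopifnot(length(from) == length(to))
  for (i in seq_along(from)) {
    if (whole_word) {
      # match the code only when it stands alone as a word
      x <- gsub(paste0("\\b", from[i], "\\b"), to[i], x, perl = TRUE)
    } else {
      # literal substring replacement, as in the loop above
      x <- gsub(from[i], to[i], x, fixed = TRUE)
    }
  }
  x
}

text <- c("Seat 1: 7e7389e3 ($2 in chips)", "Seat 3: 6786517b ($1.67 in chips)",
          "Seat 4: 878b0b52 ($2.16 in chips)", "Seat 5: a822375 ($2.37 in chips)",
          "Seat 6: 7a6252e6 ($2.51 in chips)")
df <- data.frame(codigo = c("7e7389e3", "6786517b", "878b0b52", "a822375", "7a6252e6"),
                 name = c("lucas", "alan", "ivan", "lucio", "donald"),
                 stringsAsFactors = FALSE)

out <- replace_codes(text, df$codigo, df$name)
out

# With whole_word = TRUE a longer token that merely starts with a code is left alone
partial <- replace_codes("a822375 a8223759", "a822375", "lucio", whole_word = TRUE)
partial
```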