R: Looping through a data frame and replacing text


I have a data frame consisting of one variable whose values each contain several words, for example:

variable
"hello my name is this"
"greetings friend"

A second data frame has two columns: one holds words and the other holds the replacements for those words, for example:

word          replacement
"hello"       "hi"
"greetings"   "hi"

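I am trying to find a simple way to replace the words in "variable" with their replacements, looping over every observation and over every word within each observation. The expected result is: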

variable

"hi my name is this"

"hi friend"

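I have looked at some approaches using cSplit, but they are not feasible for my application: any given observation of "variable" contains too many words, so this would create too many columns. I am not sure how I would do this with strsplit, but I suspect that is the right choice.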

Edit: as I understand it, my question may be a duplicate of this previously unanswered question:

stringr's str_replace_all is handy in this case:

df <- data.frame(variable = c('hello my name is this', 'greetings friend'),
                 stringsAsFactors = FALSE)

replacement <- data.frame(word = c('hello', 'greetings'),
                          replacement = c('hi', 'hi'),
                          stringsAsFactors = FALSE)

# Note: unnamed pattern/replacement vectors are matched to the strings element-wise
stringr::str_replace_all(df$variable, replacement$word, replacement$replacement)
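One caveat with passing two separate vectors, sketched here on the assumption that stringr is installed: an unnamed pattern vector is paired element-wise with the input, so pattern i is applied only to string i. The example above works because each word happens to sit in the matching row. The names `texts`, `words`, `repl` and `lookup` below are made up for illustration.

```r
library(stringr)

# Same strings as above, but in a different order than the lookup words
texts <- c("greetings friend", "hello my name is this")
words <- c("hello", "greetings")
repl  <- c("hi", "hi")

# Unnamed vectors: pattern i is applied only to string i,
# so neither string is changed here
pairwise <- str_replace_all(texts, words, repl)

# Named vector (names = patterns, values = replacements):
# every pattern is applied to every string
lookup <- setNames(repl, words)
all_patterns <- str_replace_all(texts, lookup)
```

If the lookup table and the text column are not aligned row by row, the named-vector form is likely the safer choice.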

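This is similar to @amrrs's solution, but I use a named vector instead of supplying two separate vectors. It also addresses the issue the OP raised in the comments: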

library(dplyr)
library(stringr)

df2$word %>%
  paste0("\\b", ., "\\b") %>%
  setNames(df2$replacement, .) %>%
  str_replace_all(df1$variable, .)

# [1] "hi my name is this"        "hi friend"                 "hi, hellomy is not a word"
# [4] "hi! my friend"

This is the named vector, with the regexes as names and the replacement strings as elements:

df2$word %>%
  paste0("\\b", ., "\\b") %>%
  setNames(df2$replacement, .) 
# \\bhello\\b \\bgreetings\\b 
#        "hi"            "hi"

Data:

df1 = data.frame(variable = c('hello my name is this',
                              'greetings friend',
                              'hello, hellomy is not a word',
                              'greetings! my friend'))

df2 = data.frame(word = c('hello','greetings'), 
                 replacement = c('hi','hi'), 
                 stringsAsFactors = FALSE)

Note:

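To address the problem of root words also being converted, I wrapped each regex in word boundaries (\\b). This ensures that a word occurring inside another word, as in "helloguys", is not converted.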
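For comparison, the same word-boundary idea can be sketched in base R without stringr. This is an illustrative alternative rather than the answer's own code: `replace_words` is a hypothetical helper, and it assumes the words contain no regex metacharacters.

```r
# Hypothetical helper: apply each word -> replacement pair in turn
replace_words <- function(x, words, replacements) {
  for (i in seq_along(words)) {
    # \\b restricts the match to whole words
    pattern <- paste0("\\b", words[i], "\\b")
    x <- gsub(pattern, replacements[i], x, perl = TRUE)
  }
  x
}

out <- replace_words(
  c("hello, hellomy is not a word", "greetings! my friend", "helloguys"),
  c("hello", "greetings"),
  c("hi", "hi")
)
# "hellomy" and "helloguys" are left untouched because "hello" is not a whole word there
```

Because the pairs are applied one after another, a replacement produced by an earlier pair could be matched again by a later one, so the order of the lookup table may matter.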

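The question you mention as being similar to yours does have answers. In fact, it has two... They are not accepted, but that does not mean they do not offer good solutions... Have you tried reading that code?

Yes, I tried them, and neither worked for me. I am testing the solution from amrrs below, and I will accept it if it works for me.

I like this approach, but wouldn't it run into the problem of replacing words that merely start with "hello" with "hi"? For example, might it change "helloguys" to "higuys"? That is not relevant to the example above, but it is relevant to my actual application.

If that should not happen, you could add a trailing space after "hello". I hope that solves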