R: Remove specific phrases from a string


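I am trying to do some basic text analysis in R.

I have a column that contains complex data types. I would like to maintain a separate table that I can use to remove certain phrases from that first data column.

I tried gsubfn, but without success.

For example: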

dirtydata <- c("JOHN COURT","@PETER","BOB 22","RUPERT BODY CORPORATE")
removefields <-c("COURT","BODY CORPORATE")
Try this:

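dirtydata <- c("JOHN COURT","@PETER","BOB 22","RUPERT BODY CORPORATE")
# "|" is regex OR; spaces around it would become part of each alternative,
# so "COURT " (with a trailing space) would fail to match "JOHN COURT"
removefields <- c("COURT|BODY CORPORATE")
x <- gsub(removefields, "", dirtydata)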

This generalizes to whatever you put in removefields, and also strips the whitespace around the strings being removed:

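dirtydata <- c("JOHN COURT","@PETER","BOB 22","RUPERT BODY CORPORATE")
removefields <- c("COURT","BODY CORPORATE")
# "\\s*" (zero or more) rather than "\\s+": a phrase at the end of a string
# has no trailing whitespace, so "\\s+" on both sides would never match it
removefields <- paste0("\\s*", removefields, "\\s*", collapse = "|")
x <- gsub(removefields, "", dirtydata)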
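A pattern built only from the phrases matches them anywhere, including inside longer words. The following is a minimal sketch of a variant that anchors each phrase at word boundaries with `\\b`; the extra entry "COURTNEY SMITH" is a hypothetical value added purely to show the difference, and `pattern` and `cleaned` are illustrative names.

```r
dirtydata <- c("JOHN COURT", "@PETER", "BOB 22", "RUPERT BODY CORPORATE",
               "COURTNEY SMITH")  # last entry is hypothetical
removefields <- c("COURT", "BODY CORPORATE")

# "\\s*" absorbs whitespace in front of the phrase; "\\b" on each side
# requires a word boundary, so "COURT" inside "COURTNEY" is left alone.
pattern <- paste0("\\s*\\b(", paste(removefields, collapse = "|"), ")\\b")

cleaned <- gsub(pattern, "", dirtydata, perl = TRUE)
print(cleaned)
```

Only leading whitespace is absorbed here, so a phrase at the very start of a string would still leave one space behind.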

We can use the tm package:

dirtydata <- c("JOHN COURT","@PETER","BOB 22","RUPERT BODY CORPORATE")
removefields <-c("COURT","BODY CORPORATE")

library(tm)
removeWords(dirtydata, removefields)

> removeWords(dirtydata, removefields)
[1] "JOHN "   "@PETER"  "BOB 22"  "RUPERT "
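The output above still carries the space that preceded each removed word. As a rough sketch, base R can tidy that up afterwards. The vector below is copied from the output shown above so that the snippet runs without tm; the last entry is a hypothetical one, added to show the double space that a mid-string removal would leave.

```r
# Output of removeWords() as shown above, plus one hypothetical entry.
removed <- c("JOHN ", "@PETER", "BOB 22", "RUPERT ", "RUPERT  LTD")

# Collapse any run of whitespace to a single space, then trim both ends.
tidied <- trimws(gsub("\\s+", " ", removed))
print(tidied)
```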

Please find the edited code below, which uses only base R functions:

dirtydata <- c("JOHN COURT","@PETER","BOB 22","RUPERT BODY CORPORATE")
removefields <-c("COURT","BODY CORPORATE")
pastedFields = paste0(removefields,collapse = "|")
gsub(pastedFields,"",dirtydata)
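The question asks for the phrases to live in a separate table. One possible way to wire that up with the same base functions is sketched below; the data frames `records` and `lookup` and their column names are assumptions made for illustration, not something given in the question.

```r
# Hypothetical tables: `records` holds the dirty column,
# `lookup` holds the phrases to strip.
records <- data.frame(
  name = c("JOHN COURT", "@PETER", "BOB 22", "RUPERT BODY CORPORATE"),
  stringsAsFactors = FALSE
)
lookup <- data.frame(
  phrase = c("COURT", "BODY CORPORATE"),
  stringsAsFactors = FALSE
)

# Put longer phrases first so a short phrase cannot pre-empt a longer one
# that contains it, then join everything into one alternation pattern.
phrases <- lookup$phrase[order(nchar(lookup$phrase), decreasing = TRUE)]
pattern <- paste(phrases, collapse = "|")

# gsub() is vectorised, so the whole column is cleaned in a single call.
records$clean_name <- trimws(gsub(pattern, "", records$name))
print(records)
```

Adding or removing a row in `lookup` then changes what gets stripped without touching the cleaning code.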

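Please include the names of the other packages you have loaded. In any case, you can try gsub(paste(removefields, collapse = "|"), "", dirtydata)

Possible duplicate; otherwise, could you elaborate on it? I assume you are getting the output as a list rather than a vector? If so, please post the line of code where you apply this to your data column.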
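On the list-versus-vector point, here is a small sketch of where the difference can come from, assuming the code is applied with lapply() (the question does not say how it was applied). Called directly on a character vector, gsub() returns a character vector; wrapped in lapply(), the same call returns a list.

```r
dirtydata <- c("JOHN COURT", "@PETER", "BOB 22", "RUPERT BODY CORPORATE")
removefields <- c("COURT", "BODY CORPORATE")
pattern <- paste(removefields, collapse = "|")

# Direct call: gsub() is vectorised and returns a character vector.
as_vector <- gsub(pattern, "", dirtydata)

# Element-wise call through lapply(): same strings, but wrapped in a list.
as_list <- lapply(dirtydata, function(s) gsub(pattern, "", s))

print(is.character(as_vector))  # TRUE
print(is.list(as_list))         # TRUE

# unlist() flattens the list back into a character vector.
flattened <- unlist(as_list)
```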