Remove punctuation in R, except apostrophes and intra-word dashes


I know how to remove punctuation while keeping apostrophes:

gsub( "[^[:alnum:]']", " ", db$text )
and, separately, how to keep intra-word dashes with the tm package:

removePunctuation(db$text, preserve_intra_word_dashes = TRUE)
But I can't find a way to do both at once. For example, if my original sentence is:

"Interested in energy/the environment/etc.? Congrats to our new e-board! Ben, Nathan, Jenny, and Adam, y'all are sure to lead the club in a great direction next year! #obama #swag"
I would like it to become:

"Interested in energy the environment etc Congrats to our new e-board Ben Nathan Jenny and Adam y'all are sure to lead the club in a great direction next year obama swag"
Of course there will be extra spaces, but I can remove those later.

I would appreciate your help.

Use
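One way to do both in a single pass in base R is a negated character class that keeps alphanumerics, whitespace, apostrophes and hyphens. This is a minimal sketch built on the sentence from the question, not necessarily the exact code from David Arenburg's answer; the variable name `cleaned` and the whitespace clean-up step are assumptions added for illustration.

```r
text <- "Interested in energy/the environment/etc.? Congrats to our new e-board! Ben, Nathan, Jenny, and Adam, y'all are sure to lead the club in a great direction next year! #obama #swag"

# Replace every character that is NOT alphanumeric, whitespace,
# an apostrophe or a hyphen with a space.
# The hyphen is placed last in the class so it is taken literally.
cleaned <- gsub("[^[:alnum:][:space:]'-]", " ", text)

# Collapse the runs of spaces left behind and trim both ends.
cleaned <- trimws(gsub("\\s+", " ", cleaned))

print(cleaned)
```

Note that this keeps every hyphen, including a stand-alone one between words, so it does not by itself tell intra-word dashes from inter-word dashes.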


I like David Arenburg's answer. If you need another approach, you could try:

library(qdap)

text <- "Interested in energy/the environment/etc.? Congrats to our new e-board! Ben, Nathan, Jenny, and Adam, y'all are sure to lead the club in a great direction next year! #obama #swag"

gsub("/", " ", strip(text, char.keep = c("-", "/"), apostrophe.remove = FALSE, lower.case = FALSE))
#[1] "Interested in energy the environment etc Congrats to our new e-board Ben Nathan Jenny and Adam y'all are sure to lead the club in a great direction next year obama swag"

clean comes from qdap. It is used to remove escaped characters and extra spaces.

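Thanks for the short and neat solution. It works very well on a small subset of my tweets, but when I run it on more than 200,000 tweets I get an error: could not find function "&. I don't know what is causing this error.

I think it comes from another part of your code.

No, gsub is base R and does not need any additional package.

You are quite right, thanks. It was a silly typo, of course.

Unfortunately, this approach does not distinguish intra-word dashes from inter-word dashes. On the string string1, I solved this in two steps: first removing the dashes between words: gsub("-", "", string1, perl=TRUE), and then using the solution above.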
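The two-step idea from the last comment can be sketched as follows. As quoted, the pattern "-" would match every dash, so this sketch assumes the intent was to remove only dashes that are not flanked by letters or digits on both sides. The input string, the lookaround pattern and the variable names are hypothetical, not taken from the comment.

```r
# Hypothetical input with both intra-word and stand-alone dashes.
string1 <- "well-known - or not - e-board!"

# Step 1: replace dashes that are not between two alphanumeric characters.
# perl = TRUE is required for the lookbehind/lookahead assertions.
step1 <- gsub("(?<![[:alnum:]])-|-(?![[:alnum:]])", " ", string1, perl = TRUE)

# Step 2: drop the remaining punctuation, keeping apostrophes and
# the intra-word dashes that survived step 1.
step2 <- gsub("[^[:alnum:][:space:]'-]", " ", step1)

# Tidy up the leftover whitespace.
result <- trimws(gsub("\\s+", " ", step2))

print(result)
```

Doing the dash handling first keeps the second pattern simple, since any dash still present at step 2 is known to sit inside a word.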
library(gsubfn)
clean(gsubfn("[[:punct:]]", function(x) ifelse(x == "'", "'", ifelse(x == "-", "-", " ")), text))
#[1] "Interested in energy the environment etc Congrats to our new e-board Ben Nathan Jenny and Adam y'all are sure to lead the club in a great direction next year obama swag"