Remove only numbers, but keep words like "3D", in R?


I have recently been writing text-mining code in R, but I ran into trouble during data preprocessing. I have a string like this:

"I want to buy 3D printer, but it costs 3000 dollars."

I want to keep the word "3D" but remove "3000", so the result should look like this:

"I want to buy 3D printer, but it costs dollars."

I am using a corpus.

We can use sub (or gsub, which replaces every match rather than only the first):

gsub('3\\d+\\s', '', str1)
If this needs to be more general:

gsub('\\b\\d+\\s', '', str1)
#[1] "I want to buy 3D printer, but it costs dollars."
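The pattern above only matches a number that is followed by whitespace, so a number directly before punctuation or at the end of the string would be left in place. A minimal sketch of a variant that uses a word boundary on both sides instead; the helper name remove_standalone_numbers is hypothetical and not from the original answers:

```r
# Hypothetical helper (not part of the original answers): drop digit runs
# that stand alone as words, and keep mixed tokens such as "3D".
remove_standalone_numbers <- function(x) {
  # \\b\\d+\\b matches digits with a word boundary on both sides;
  # in "3D" there is no boundary between "3" and "D", so it is not matched
  x <- gsub("\\b\\d+\\b", "", x, perl = TRUE)
  # collapse the doubled whitespace left behind by a removed number
  x <- gsub("\\s{2,}", " ", x, perl = TRUE)
  trimws(x)
}

str1 <- "I want to buy 3D printer, but it costs 3000 dollars."
remove_standalone_numbers(str1)
#[1] "I want to buy 3D printer, but it costs dollars."
```

Note that a number sitting directly before punctuation, as in "It costs 3000.", leaves a stray space before the full stop, and decimal numbers such as "3.5" would need a different pattern.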

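You could also use a text analysis package such as quanteda, which removes only tokens that are purely numeric and leaves words that merely contain digits. So in your case: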

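require(quanteda)
# tokenize() and removeNumbers belong to the quanteda release this answer was written for;
# newer releases appear to expose the same option as tokens(x, remove_numbers = TRUE)
tokenize("I want to buy 3D printer, but it costs 3000 dollars.", removeNumbers = TRUE)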
## tokenizedText object from 1 document.
## Component 1 :
## [1] "I"       "want"    "to"      "buy"     "3D"      "printer" ","       "but"     "it"      "costs"   "dollars" "."      
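If you would rather get the result back as a single character object instead of tokens (although tokenization may be what you are after), then: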
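One way this could look, as a base R sketch rather than quanteda's own API (the variable names are illustrative): split on whitespace, drop the tokens made up only of digits, and paste the rest back together.

```r
x <- "I want to buy 3D printer, but it costs 3000 dollars."

# split into whitespace-separated tokens
words <- strsplit(x, "\\s+")[[1]]

# keep every token that is not made up entirely of digits
kept <- words[!grepl("^\\d+$", words)]

# join the remaining tokens back into a single character string
result <- paste(kept, collapse = " ")
result
#[1] "I want to buy 3D printer, but it costs dollars."
```

Because the split is on whitespace only, a number with punctuation attached (for example "3000,") counts as a mixed token and is kept.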


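gsub("\\d+", "", x)

Look for a space after the digits!

Hi! If you could help me solve this problem, I would be very grateful :)

If you write another post that makes clear what kind of answer you are hoping for, I will do my best to help you. Your question did not state that, which is why it received downvotes. Thanks.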
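To illustrate why the space matters, a short comparison (a sketch, with illustrative variable names): the pattern "\\d+" on its own strips digits everywhere, including the "3" of "3D", while requiring whitespace after the digits leaves "3D" alone.

```r
x <- "I want to buy 3D printer, but it costs 3000 dollars."

# every run of digits is removed, so "3D" loses its "3"
all_digits <- gsub("\\d+", "", x)

# only runs of digits followed by whitespace are removed, so "3D" survives
digits_then_space <- gsub("\\d+\\s", "", x)

all_digits
digits_then_space
```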