Remove only numbers, but keep words like "3D" in R?
Tags: r, tm
I have recently been writing text-mining code in R, but I ran into trouble with data preprocessing. I have a string like this:
"I want to buy 3D printer, but it costs 3000 dollars."
I want to keep the word "3D" but remove "3000", so the result should look like this:
"I want to buy 3D printer, but it costs dollars."
I am using ... corpus.
We can use gsub:
str1 <- "I want to buy 3D printer, but it costs 3000 dollars."
gsub('3\\d+\\s', '', str1)
If this needs to be more general:
gsub('\\b\\d+\\s', '', str1)
#[1] "I want to buy 3D printer, but it costs dollars."
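The pattern above relies on whitespace following the number, so a number directly followed by punctuation or at the end of the string would be left in place. A minimal sketch of a variant that uses a word boundary on both sides instead is shown below. The helper name `strip_standalone_numbers` is hypothetical and not from the thread, and `perl = TRUE` is an assumption made to get PCRE-style `\b` behaviour.

```r
# Hypothetical helper (not from the thread): drop tokens made only of digits.
strip_standalone_numbers <- function(x) {
  # \b\d+\b matches a run of digits with a word boundary on both sides,
  # so "3000" matches but the "3" in "3D" does not (no boundary between 3 and D).
  no_numbers <- gsub("\\b\\d+\\b", "", x, perl = TRUE)
  # Removing a token leaves its surrounding spaces behind; collapse them.
  squeezed <- gsub("\\s+", " ", no_numbers, perl = TRUE)
  # Trim any leading or trailing whitespace left over.
  trimws(squeezed)
}

str1 <- "I want to buy 3D printer, but it costs 3000 dollars."
strip_standalone_numbers(str1)
#[1] "I want to buy 3D printer, but it costs dollars."
```

This sketch treats decimals such as "3.5" as two separate numbers and leaves the dot behind, so it would need adjusting if the text contains them.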
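You can also use a text analysis package such as quanteda, which removes only tokens that are entirely numeric, not words that contain digits. So in your case: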
require(quanteda)
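# Note: tokenize() comes from an older quanteda release; newer releases appear to use tokens(x, remove_numbers = TRUE) instead.
tokenize("I want to buy 3D printer, but it costs 3000 dollars.", removeNumbers = TRUE)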
## tokenizedText object from 1 document.
## Component 1 :
## [1] "I" "want" "to" "buy" "3D" "printer" "," "but" "it" "costs" "dollars" "."
If you want it returned as a single character object, without tokenizing (although tokenizing is probably your goal anyway), then:
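One way this could be sketched in base R, offered as an assumption rather than as the answer's own code: split the string on whitespace, drop the tokens that consist only of digits, and paste the rest back together. The helper name `drop_numeric_tokens` is hypothetical and is not part of quanteda.

```r
# Hypothetical helper (not from the thread, and not a quanteda function).
# Works on a single character string.
drop_numeric_tokens <- function(x) {
  # Split on runs of whitespace; strsplit returns a list, so take element 1.
  tokens <- strsplit(x, "\\s+", perl = TRUE)[[1]]
  # Keep every token that is not made up entirely of digits.
  kept <- tokens[!grepl("^\\d+$", tokens, perl = TRUE)]
  # Re-join the remaining tokens into one character string.
  paste(kept, collapse = " ")
}

drop_numeric_tokens("I want to buy 3D printer, but it costs 3000 dollars.")
#[1] "I want to buy 3D printer, but it costs dollars."
```

Because the split is on whitespace only, a number with punctuation attached (for example "3000,") counts as a non-numeric token and is kept.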
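Look for a space after the digits: gsub("\\d+ ", "", x)
Hi! If you could help me solve this problem, I would be very grateful :)
If you edit your other post to make clear what kind of answer you are hoping for, I will do my best to help you. Your question does not say, which is why it was downvoted. Thanks.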