How to get n-gram output with underscores in the R tm library
Below is my code for creating bigrams from text data. The output I get is fine, except that I need the field names to contain an underscore so that I can use them as variables in a model.
# Sample review texts
text <- c("Since I love to travel, this is what I rely on every time.",
          "I got the rewards card for the no international transaction fee",
          "I got the rewards card mainly for the flight perks",
          "Very good card, easy application process, and no international transaction fee",
          "The customer service is outstanding!",
          "My wife got the rewards card for the gift cards and international transaction fee.She loves it")
df <- data.frame(text)

library(tm)

# Build the corpus and normalize the text.
# Note: newer tm releases expect DataframeSource input to have doc_id and text columns.
corpus <- Corpus(DataframeSource(df))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Tokenizer that joins each pair of adjacent words with a space
BigramTokenizer <- function(x)
  unlist(lapply(ngrams(words(x), 2), paste, collapse = " "), use.names = FALSE)

# Document-term matrix of bigrams, dropping very sparse terms
dtm <- DocumentTermMatrix(corpus, control = list(tokenize = BigramTokenizer))
sparse <- removeSparseTerms(dtm, 0.80)
dtm2 <- as.matrix(sparse)
dtm2
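How can I make the field names look like got_rewards instead of got rewards?

I guess this is not really a tm-specific question. Anyway, you can set collapse = "_" in your code, or modify the column names after the fact, like so: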
colnames(dtm2) <- gsub(" ", "_", colnames(dtm2), fixed = TRUE)
dtm2
    Terms
Docs got_rewards international_transaction rewards_card transaction_fee
   1           0                         0            0               0
   2           1                         1            1               1
   3           1                         0            1               0
   4           0                         1            0               1
   5           0                         0            0               0
   6           1                         1            1               0
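For the first option, here is a minimal sketch of the tokenizer from the question with only the collapse argument changed. The sample document is made up for illustration, and the sketch assumes tm is installed (loading it attaches NLP, which supplies ngrams() and words()).

```r
library(tm)  # attaches NLP, which provides ngrams() and words()

# Same tokenizer as in the question, but each bigram is joined with "_"
BigramTokenizer <- function(x)
  unlist(lapply(ngrams(words(x), 2), paste, collapse = "_"), use.names = FALSE)

# Hypothetical one-line document, used only to show the token format
doc <- PlainTextDocument("got rewards card")
bigrams <- BigramTokenizer(doc)
print(bigrams)
```

Passing this tokenizer through control = list(tokenize = BigramTokenizer) should then give underscore-joined column names directly, so the gsub() step would not be needed.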
Change collapse = " " to collapse = "_"?

That did it. Thanks!