In the R tm library, how to get n-gram output with underscores


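Below is my code for creating bigrams from text data. The output I get is fine, except that I need the field names to contain an underscore so that I can use them as variables in a model.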

text <- c("Since I love to travel, this is what I rely on every time.",
          "I got the rewards card for the no international transaction fee",
          "I got the rewards card mainly for the flight perks",
          "Very good card, easy application process, and no international transaction fee",
          "The customer service is outstanding!",
          "My wife got the rewards card for the gift cards and international transaction fee.She loves it")
df <- data.frame(text)


library(tm)

# Build the corpus and normalise the text
corpus <- Corpus(DataframeSource(df))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Tokenizer that returns bigrams, with the two words joined by a space
BigramTokenizer <- function(x) {
  unlist(lapply(ngrams(words(x), 2), paste, collapse = " "), use.names = FALSE)
}

dtm <- DocumentTermMatrix(corpus, control = list(tokenize = BigramTokenizer))

# Drop terms whose sparsity exceeds 0.80, then convert to a plain matrix
sparse <- removeSparseTerms(dtm, 0.80)
dtm2 <- as.matrix(sparse)
dtm2

How can I make the field names look like got_rewards instead of got rewards?
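I don't think this is really a tm-specific question. In any case, you can either set collapse = "_" in your tokenizer, or modify the column names after the fact, like this: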

colnames(dtm2) <- gsub(" ", "_", colnames(dtm2), fixed = TRUE)
dtm2
    Terms
Docs got_rewards international_transaction rewards_card transaction_fee
   1           0                         0            0               0
   2           1                         1            1               1
   3           1                         0            1               0
   4           0                         1            0               1
   5           0                         0            0               0
   6           1                         1            1               0
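For completeness, the first option from the answer (changing the collapse argument) might look roughly like the sketch below. The tokenizer name UnderscoreBigramTokenizer and the shortened two-sentence input are made up for illustration. VCorpus(VectorSource(...)) is a substitution for the question's Corpus(DataframeSource(df)) call, on the assumption that a VCorpus is the safer choice when a custom tokenizer must be honoured. Results may differ across tm versions.

```r
library(tm)  # tm depends on NLP, which provides ngrams() and words()

# Shortened sample input (illustrative, taken from the question's data)
text <- c("I got the rewards card for the no international transaction fee",
          "I got the rewards card mainly for the flight perks")

# Assumption: VCorpus is used here instead of the question's Corpus() call
corpus <- VCorpus(VectorSource(text))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Same tokenizer as in the question, but each bigram is joined with "_"
UnderscoreBigramTokenizer <- function(x) {
  unlist(lapply(ngrams(words(x), 2), paste, collapse = "_"), use.names = FALSE)
}

dtm <- DocumentTermMatrix(corpus,
                          control = list(tokenize = UnderscoreBigramTokenizer))
m <- as.matrix(dtm)
colnames(m)  # term names now contain underscores instead of spaces
```

With this approach the terms carry underscores from the moment they are created, so no renaming step is needed afterwards. The gsub() route is arguably simpler if the matrix already exists.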

Change collapse = " " to collapse = "_"? That did it. Thanks!