Extracting n-grams with R
I am trying to use the ngramrr package to extract 3-grams from Nirvana lyrics:
require(ngramrr)
require(tm)
require(magrittr)
nirvana <- c("hello hello hello how low", "hello hello hello how low",
"hello hello hello how low", "hello hello hello",
"with the lights out", "it's less dangerous", "here we are now",
"entertain us", "i feel stupid", "and contagious", "here we are now",
"entertain us", "a mulatto", "an albino", "a mosquito", "my libido",
"yeah", "hey yay")
ngramrr(nirvana[1], ngmax = 3)
Corpus(VectorSource(nirvana))
I would like to know how to build a TermDocumentMatrix whose terms are the list of trigrams.
Thank you.
My comment above was almost complete; here is the full version:
nirvana %>% tokens(ngrams = 1:3) %>% # generate tokens
dfm %>% # generate dfm
convert(to = "tm") %>% # convert to tm's document-term-matrix
t # transpose it to term-document-matrix
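The pipeline above produces unigrams, bigrams, and trigrams together. A minimal sketch for trigrams only is shown here. It assumes a recent quanteda release, where n-grams are built with tokens_ngrams() rather than through an ngrams argument to tokens(). The shortened lyrics vector and the variable names are illustrative only.

```r
library(quanteda)
library(tm)

# Illustrative subset of the lyrics; every line has at least three words
lyrics <- c("hello hello hello how low",
            "with the lights out",
            "here we are now")

toks     <- tokens(lyrics)                # split each line into word tokens
trigrams <- tokens_ngrams(toks, n = 3)    # keep 3-grams only, joined with "_"
dtm      <- convert(dfm(trigrams), to = "tm")  # tm DocumentTermMatrix
tdm      <- t(dtm)                        # transpose to a TermDocumentMatrix

inspect(tdm)
```

Changing n = 3 to n = 1:3 should give the same mix of n-gram sizes as the original pipeline.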
I would use quanteda and convert to tm format: nirvana %>% tokens(ngrams = 1:3) %>% dfm %>% convert(to = "tm")
@amatsuo_net Thanks, could you give me an R example?
@Cath Thanks ;)