R: Difficulty using tm_combine in R


I am unable to use tm_combine in R. Here are the version details:

platform       x86_64-w64-mingw32          
arch           x86_64                      
os             mingw32                     
system         x86_64, mingw32             
status                                     
major          3                           
minor          3.3                         
year           2017                        
month          03                          
day            06                          
svn rev        72310                       
language       R                           
version.string R version 3.3.3 (2017-03-06)
nickname       Another Canoe

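I would like to understand this better. If there is a problem accessing this function, then my question is how to combine two document-term matrices, D1 and D2, that have different numbers of columns.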

> packageVersion("tm")
[1] ‘0.7.1’
> dim(s.tdm)
[1] 132 536
> dim(f.tdm)
[1] 132 674
>

Thanks

Here is the code I tried:

library(tm)
library(SnowballC)

s.dir <- "AuthorIdentify\\Author1"
f.dir <- "AuthorIdentify\\Author2"

s.docs <- Corpus(DirSource(s.dir, encoding="UTF-8"))
f.docs <- Corpus(DirSource(f.dir, encoding="UTF-8"))

cleanCorpus<-function(corpus){
  # apply stemming
  corpus <-tm_map(corpus, stemDocument)

  # remove punctuation
  corpus.tmp <- tm_map(corpus,removePunctuation)

  # remove white spaces
  corpus.tmp <- tm_map(corpus.tmp,stripWhitespace)

  # remove stop words
  corpus.tmp <-
    tm_map(corpus.tmp,removeWords,stopwords("en"))

  return(corpus.tmp)
}

s.cldocs <- cleanCorpus(s.docs) # preprocessing

# forms document-term matrix
s.tdm <- DocumentTermMatrix(s.cldocs)

# removes infrequent terms
s.tdm <- removeSparseTerms(s.tdm,0.97)

dim(s.tdm) # [ #docs, #numterms ]

f.cldocs <- cleanCorpus(f.docs) # preprocessing

# forms document-term matrix
f.tdm <- DocumentTermMatrix(f.cldocs)

# removes infrequent terms
f.tdm <- removeSparseTerms(f.tdm,0.97)

dim(f.tdm) # [ #docs, #numterms ]


# how do I combine f.tdm and s.tdm?
# tm_combine ???

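Comments:

You really need to provide a minimal reproducible example (see this post for help). Without a specific error message, or even some data, we cannot help you.

When I type ?tm_combine in the RStudio console, I do not get this function name in the drop-down list. My first question is really whether I am missing something, since I have the correct versions of RStudio and the tm package.

I have added my code in case it helps give better context.

Answer:

I just checked the latest version of tm, and there is no tm_combine() function. This answer may help: possible duplicate.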
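
As a rough illustration of the underlying question (combining two document-term matrices whose term columns differ), here is a minimal base-R sketch. It assumes the matrices have first been converted to ordinary dense matrices, for example with as.matrix(s.tdm) and as.matrix(f.tdm). The helper name combine_dtm and the toy matrices are hypothetical and are not part of tm. It aligns both matrices on the union of their terms, fills the missing terms with zero counts, and then stacks the documents row-wise.

```r
# Hypothetical helper (not part of tm): stack two document-by-term count
# matrices whose term (column) sets differ.
combine_dtm <- function(a, b) {
  # All terms that appear in either matrix
  terms <- union(colnames(a), colnames(b))

  # Expand a matrix to the full term set, filling absent terms with 0
  pad <- function(m) {
    out <- matrix(0, nrow = nrow(m), ncol = length(terms),
                  dimnames = list(rownames(m), terms))
    out[, colnames(m)] <- m
    out
  }

  # Documents from both sources become rows of one matrix
  rbind(pad(a), pad(b))
}

# Toy stand-ins for as.matrix(s.tdm) and as.matrix(f.tdm)
s <- matrix(c(1, 0, 2, 1), nrow = 2,
            dimnames = list(c("s1", "s2"), c("cat", "dog")))
f <- matrix(c(3, 1, 0, 4, 1, 1), nrow = 2,
            dimnames = list(c("f1", "f2"), c("dog", "fish", "bird")))

combined <- combine_dtm(s, f)
dim(combined)  # 4 documents x 4 terms
```

Dense conversion is only practical for modest matrix sizes such as the 132 x 536 and 132 x 674 matrices in the question. For larger data, a sparse representation would be preferable.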