R: Combining a list with CSV metadata




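I'm having some trouble working with text files and their associated metadata. I can read the files in, pre-process them, and then convert them into the format expected by the lda package I'm using. For example: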

#Reading the files
corpus <- file.path("Folder/Fiction/texts")
corpus <- list.files(corpus, full.names = TRUE)  # full paths, so readLines() can find the files
corpus <- lapply(corpus, readLines)

***pre-processing functions removed for space***

corp.list <- strsplit(corpus, "[[:space:]]+")

# compute the table of terms:
corpterm.table <- table(unlist(corp.list))
corpterm.table <- sort(corpterm.table, decreasing = TRUE)

***removing stopwords, again removed for space***

# now put the corpus into the format required by the lda package:
getCorp.terms <- function(x) {
  index <- match(x, vocabCorp)
  index <- index[!is.na(index)]
  rbind(as.integer(index - 1), as.integer(rep(1, length(index))))
}
corpus <- lapply(corp.list, getCorp.terms)

Then use merge or match, possibly in combination, to associate each document (or document token vector) with its correct metadata row.

Try changing it to:

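pth <- file.path("Folder/Fiction/texts")
fi <- list.files(pth)  # bare file names, reused below as the list names
corpus <- lapply(file.path(pth, fi), readLines)  # prepend the folder so the files are found
# (the pre-processing from the question goes here; strsplit() expects a character vector)
corp.list <- strsplit(corpus, "[[:space:]]+")
corp.list <- setNames(corp.list, fi)  # name each token vector after its file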
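
Once the list carries the file names, lining it up with the metadata becomes a lookup by name. The following is only a minimal sketch with toy data, assuming the metadata CSV has a column holding the same file names. The `filename`, `author`, and `year` columns are hypothetical and not taken from the question.

```r
# Toy stand-ins: a named list of token vectors and a metadata table.
# The column names (filename, author, year) are hypothetical.
corp.list <- list(
  "b.txt" = c("call", "me", "ishmael"),
  "a.txt" = c("it", "was", "the", "best", "of", "times")
)
meta <- data.frame(
  filename = c("a.txt", "b.txt", "c.txt"),
  author   = c("Dickens", "Melville", "Austen"),
  year     = c(1859, 1851, 1813),
  stringsAsFactors = FALSE
)

# For each document, match() gives the row of meta with the same file name
idx <- match(names(corp.list), meta$filename)

# Drop documents without a metadata row, then reorder meta to follow the list
keep <- !is.na(idx)
corp.list <- corp.list[keep]
meta.aligned <- meta[idx[keep], ]

# Row i of meta.aligned now describes corp.list[[i]]
stopifnot(identical(names(corp.list), meta.aligned$filename))
```

In practice `meta` would come from `read.csv()` on the real metadata file. Keeping the documents in list order, rather than sorting by file name, means the aligned metadata can be indexed the same way as the list passed on to the lda package.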

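What is the structure of your data after this step: corpus?

@Chris The structure is a large list, like this: List of 1128 $ : chr [1:61616] "word" "word" "word" ... $ : chr [1:108093] ... $ : chr [1:29334] ... and so on, with one vector per document.

This works for attaching the file names to each text in the list. My next task is to figure out how to add the texts themselves to the csv as a single column.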
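
For the follow-up about getting the texts into the csv as a single column, one possible approach is to collapse each token vector back into one string and join on the file name with `merge`. This is a sketch under the same assumption as before, namely that the metadata has a file-name column; the `filename` and `author` columns and the toy data are made up.

```r
# Toy named list of token vectors (stand-in for corp.list)
corp.list <- list(
  "a.txt" = c("it", "was", "the", "best", "of", "times"),
  "b.txt" = c("call", "me", "ishmael")
)
# Hypothetical metadata; in practice this would come from read.csv()
meta <- data.frame(
  filename = c("a.txt", "b.txt"),
  author   = c("Dickens", "Melville"),
  stringsAsFactors = FALSE
)

# Collapse each token vector into a single string, one row per document
texts <- data.frame(
  filename = names(corp.list),
  text     = vapply(corp.list, paste, character(1), collapse = " ",
                    USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)

# Join on the shared filename column; all.x = TRUE keeps metadata rows
# even when no text was found for them (their text becomes NA)
combined <- merge(meta, texts, by = "filename", all.x = TRUE)

# Written to a temporary file here; substitute the real output path
out <- tempfile(fileext = ".csv")
write.csv(combined, out, row.names = FALSE)
```

Note that `merge` sorts the result by the key column, so the row order of `combined` follows the file names rather than the original list order.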