Concatenating various .txt and .html files into one .txt file in R


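I am trying to write a script that automatically concatenates all the files in a particular folder into a single .txt file. I ran into trouble because I tried to merge everything into one big data frame before writing it to the .txt file, and that failed because the column names did not match. So I switched to smartbind, but now I get a "duplicate row names" error instead.

Here is my code: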

library(gtools)

dir<-"/Users/max/Desktop/NISAT_All/Regions"

subdir_list<-list.dirs(dir, recursive=F) 
subdir_list<-list.dirs(subdir_list,  recursive=F)
as.matrix(subdir_list)
subdirs_General <- subdir_list[ grepl("General", subdir_list) ]
as.matrix(subdirs_General)
subdir_list <- subdir_list[ !grepl("General", subdir_list) ]
subdir_list<-list.dirs(subdir_list,  recursive=F)
as.matrix(subdir_list)


for (subdir in subdir_list){

  setwd(subdir)

  subdir <-list.files(subdir, recursive=T)
  files <- subdir[ grepl("Armed Groups and Small Guns", subdir) ]
  files <- c(files, subdir[ grepl("Arms Embargoes", subdir) ])
  files <- c(files, subdir[ grepl("Black Market", subdir) ])
  files <- c(files, subdir[ grepl("Brokering", subdir) ])
  files <- c(files, subdir[ grepl("Landmines", subdir) ])
  files <- c(files, subdir[ grepl("MANPADS", subdir) ])
  files <- c(files, subdir[ grepl("Production", subdir) ])
  files <- c(files, subdir[ grepl("Stockpile Security and Destruction", subdir) ])
  files <- c(files, subdir[ grepl("UN Processes", subdir) ])
  files <- c(files, subdir[ grepl("United Nations", subdir) ])
  files <- c(files, subdir[ grepl("Weapons Collection and Amnesties", subdir) ])

  dataframe <- data.frame()

  for (file in files){

      df_temp <- read.delim(file)
      dataframe <- smartbind(dataframe, df_temp, sep="\n")

  }
  #then write your final file
  write.table(dataframe,"MergedFiles.txt",sep="\n", row.names = F, eol = "\r")
  rm(dataframe)

}

Suppose I have two text files that I want to combine:
test1.txt
I'm not a pheasant plucker, I'm a pheasant plucker's son

and I'm only plucking pheasants til the pheasant plucker comes.

I'm not a pheasant plucker, I'm a pheasant plucker's son
and I'm only plucking pheasants til the pheasant plucker comes.


test2.txt
I'm not a pheasant plucker, I'm a pheasant plucker's son

and I'm only plucking pheasants til the pheasant plucker comes.

I'm not a pheasant plucker, I'm a pheasant plucker's son
and I'm only plucking pheasants til the pheasant plucker comes.


I simply specify the names of the files to combine and create an empty variable that will hold the combined content:
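The code that followed this sentence did not survive in this copy. A minimal sketch of the approach it describes might look like the following. Only the name `files_to_combine` comes from the original; `combined`, `example_text` and the output name `combined.txt` are placeholders chosen for illustration. The sketch first recreates the two test files in a temporary directory so that it runs on its own.

```r
# Work in a temporary directory so the real working directory is left alone
old_wd <- setwd(tempdir())

# Recreate the two example files (contents shortened from the example above)
example_text <- c(
  "I'm not a pheasant plucker, I'm a pheasant plucker's son",
  "and I'm only plucking pheasants til the pheasant plucker comes."
)
writeLines(example_text, "test1.txt")
writeLines(example_text, "test2.txt")

# Names of the files to combine, and an empty character vector for the result
files_to_combine <- c("test1.txt", "test2.txt")
combined <- character(0)

# Read each file as plain lines of text and append them to the result
for (f in files_to_combine) {
  combined <- c(combined, readLines(f))
}

# Write everything out as one text file
writeLines(combined, "combined.txt")

# Restore the original working directory
setwd(old_wd)
```

Because readLines returns a character vector rather than a data frame, there are no column names or row names to reconcile, which sidesteps the errors described in the question.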
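Please do not post images of errors (or of code/data, though that is not a factor here): they cannot be copied or searched (SEO), they break screen readers, and they may not display well on some mobile devices. Please include the code or data directly (e.g., dput(head(x)) or data.frame(...)).

(1) Iteratively adding rows to a data.frame works logically, but its performance scales poorly: every concatenation makes a complete copy of the frame, which means the first file is copied n times (if n files are loaded). (2) When combining data, row names cannot be duplicated. If the row names are meaningful, I suggest saving them as a column of each frame and then clearing the row names (for example with rownames(x) <- NULL).

The files shown are text files. Is there a particular reason for using read.delim instead of readLines? Basically, read.delim is meant for reading tabular data, not text files. Use the appropriate function (readLines, as r2evans wrote).

It is better to read them all in (e.g., with lapply) and concatenate once (e.g., with c) than to build the result up iteratively. While this is not as big a problem with simple lists/vectors, it is more idiomatic, and it parallels the logic of not iteratively rbind-ing data frames (which performs very badly because the data is copied repeatedly). For example, I would use combined ...

Thanks @r2evans, you are right of course. My approach in the answer was to show a solution using idioms the OP is already familiar with. If speed is an issue, then thinking about why this approach is slow, and how to speed it up, is a whole new exercise.

@r2evans, i.e., I get the impression that reading and writing data in the appropriate format, and understanding the difference between data frames and character vectors, are what matter most for the OP. At this stage, might trying to understand the apply functions be asking too much? Or am I being too negative?

Allan, while your point is not without merit, there is also value in learning R's idiomatic forms. In R, that usually (certainly not always) means operating on vectors and lists as a whole. Also, it is a good thing to recognize known suboptimal approaches (iterative concatenation) early on when learning a new language. So while "keep it simple" is certainly a good goal, I also think it is reasonable to encourage idiomatic approaches.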
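Two pieces of advice from the comments can be sketched in code: reading all files first and concatenating once, and moving meaningful row names into a column before combining data frames. This is an illustration, not the commenters' own code. The file names, the helper `keep_names`, and the variables `tmp`, `files`, `pieces`, `combined`, `df1`, `df2` and `both` are all hypothetical.

```r
# Part 1: read every file first, then concatenate once

# Create a scratch folder holding one .txt and one .html file
tmp <- tempfile("combine_demo_")
dir.create(tmp)
writeLines(c("first file, line 1", "first file, line 2"), file.path(tmp, "a.txt"))
writeLines("second file, line 1", file.path(tmp, "b.html"))

# list.files returns the matches in alphabetical order
files <- list.files(tmp, pattern = "\\.(txt|html)$", full.names = TRUE)

pieces <- lapply(files, readLines)  # a list with one character vector per file
combined <- do.call(c, pieces)      # a single concatenation at the end
writeLines(combined, file.path(tmp, "MergedFiles.txt"))

# Part 2: keep meaningful row names as a column before combining data frames

df1 <- data.frame(value = 1:2, row.names = c("x", "y"))
df2 <- data.frame(value = 3:4, row.names = c("x", "y"))

keep_names <- function(d) {
  d$name <- rownames(d)  # preserve the row names as an ordinary column
  rownames(d) <- NULL    # fall back to the default 1..n row names
  d
}

# Again: prepare every piece first, then bind once
both <- do.call(rbind, lapply(list(df1, df2), keep_names))
```

In both parts the pieces are collected in a list and combined in one call, so the growing result is not copied on every iteration.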