R: adding new columns to a data frame created by merging CSV files

I hope someone can help me with this. Basically, I have several CSV files that I want to merge into one data frame. Each CSV file has several rows (the main part of the file), followed by some blank lines, followed by some information about that particular file. For example, CSV file 1:

a b c d
1 2 4 3
4 3 4 2


p 2
t 3

CSV file 2:

a b c d
0 2 1 8
3 4 1 2


p 4
t 6

I have been able to join the main parts of all the CSV files with a small function I wrote. In this particular example I only need the first three rows, so:

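# Read the main part of every CSV file in `mypath` and merge the results
multmerge <- function(mypath) {
  filenames <- list.files(path = mypath, full.names = TRUE)
  # nrows = 3 reads only the first three rows after the header
  datalist <- lapply(filenames, function(x) read.csv(file = x, header = TRUE, nrows = 3))
  # all = TRUE keeps the rows of every file (full outer merge)
  Reduce(function(x, y) merge(x, y, all = TRUE), datalist)
}

full_data <- multmerge(mypath)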

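However, I would like the data frame full_data to also contain the variables from the information section of each CSV file, so that in the end I have something like this: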

a b c d p t
1 2 4 3 2 3
4 3 4 2 2 3
0 2 1 8 4 6
3 4 1 2 4 6

Any hints?

Thanks

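Here is my data.table solution.

To keep it dynamic, it reads each file several times. With large files (or a large number of files) this can slow things down considerably. But as far as I know, this is the only way to skip the bottom of a file with data.table::fread().

An added benefit of this solution is that your files can have any number of rows: the code only strips the last four lines (the two empty lines and the two lines with the p/t values).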

My sample data consists of two files, ./csv1.csv and ./csv2.csv, which contain the sample data from the question.
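To make the example reproducible, the two sample files can be written from R. This is only a sketch: `sample_dir` is a hypothetical location, and the comma delimiter is an assumption, since the question does not show the raw delimiter of the files.

```r
# Hypothetical setup: write the two sample files from the question.
# Assumption: fields are comma-separated; two empty lines separate the
# main part from the p/t information lines.
sample_dir <- file.path(tempdir(), "csv_example")
dir.create(sample_dir, showWarnings = FALSE)

writeLines(c("a,b,c,d", "1,2,4,3", "4,3,4,2", "", "", "p,2", "t,3"),
           file.path(sample_dir, "csv1.csv"))
writeLines(c("a,b,c,d", "0,2,1,8", "3,4,1,2", "", "", "p,4", "t,6"),
           file.path(sample_dir, "csv2.csv"))

# show which files were created
list.files(sample_dir)
```

To use these files, point the `path` argument of list.files() at `sample_dir`, or write the files to the working directory instead.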

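What happens in the code below:
- create a list of the files to read
- read the files with data.table::fread() inside a custom function that:
  - determines the number of rows in the file
  - first reads everything except the last four lines of the file
  - then reads the last two lines of the file
  - combines these two results into the format we want
- bind the list into a single data.table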

#get a list of files you want to read
filelist <- list.files( path = "./", pattern = "^csv[12]\\.csv", full.names = TRUE )

#read the files to a list, using a custom function
l <- lapply( filelist, function(x) {
  #get the length of the file first, by reading each line as a single column
  # sep = "" is for faster reading of the file; the header line is not counted
  length_file <- nrow( data.table::fread( x, sep = "", header = TRUE ) )
  #use data.table::fread to read in the file EXCEPT the four last lines
  file_content <- data.table::fread( x, nrows = length_file - 4 , fill = TRUE )
  #use data.table::fread to read in the file ONLY the last two lines
  file_tail    <- data.table::fread( x, skip = length_file - 2 , fill = TRUE )
  #build final output
  output <- file_content[, `:=`( p = file_tail[ V1 == "p", V2 ],
                                 t = file_tail[ V1 == "t", V2 ] )]
})

# [[1]]
#    a b c d p t
# 1: 1 2 4 3 2 3
# 2: 4 3 4 2 2 3
# 
# [[2]]
#    a b c d p t
# 1: 0 2 1 8 4 6
# 2: 3 4 1 2 4 6

#use data.table::rbindlist() to bind the list to a single data.table
data.table::rbindlist( l )

#    a b c d p t
# 1: 1 2 4 3 2 3
# 2: 4 3 4 2 2 3
# 3: 0 2 1 8 4 6
# 4: 3 4 1 2 4 6
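
As a complement to the code above, here is a minimal base R sketch that reads each file only once with readLines() and splits it at the blank lines. It is not the method used in this answer. `read_main_and_info` and `read_all` are hypothetical helper names, and the sketch assumes comma-separated fields, at least one blank line per file, and an information section made of key,value lines after the last blank line.

```r
# Hypothetical helper: read one file once and split it at the blank lines.
# Assumptions: comma-separated fields; the main table ends at the first blank
# line; the info section (key,value pairs) starts after the last blank line.
read_main_and_info <- function(path) {
  lines <- readLines(path)
  blank <- which(trimws(lines) == "")

  # main part: header plus data rows before the first blank line
  main <- read.csv(text = lines[seq_len(blank[1] - 1)], header = TRUE)

  # info part: every line after the last blank line
  info <- read.csv(text = lines[(max(blank) + 1):length(lines)], header = FALSE,
                   col.names = c("key", "value"), stringsAsFactors = FALSE)

  # add each info value as a constant column, recycled over all rows
  for (i in seq_len(nrow(info))) {
    main[[info$key[i]]] <- info$value[i]
  }
  main
}

# Hypothetical wrapper: combine all CSV files of a folder into one data frame
read_all <- function(mypath) {
  files <- list.files(path = mypath, pattern = "\\.csv$", full.names = TRUE)
  do.call(rbind, lapply(files, read_main_and_info))
}
```

Calling read_all(mypath) on a folder that holds the two sample files should give the same four rows and six columns as the rbindlist() result. Files without any blank line are not handled by this sketch.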

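What is the logic that separates the main data from the file-specific information, e.g. a b c d versus p t?

Thank you very much! I should say that the information section actually has more than two variables. I followed your suggestion, but I got the following error: [.data.table(file_content, `:=`(participant = file_tail[V1 == : In `:=`(col1=val1, col2=val2, ...) form, all arguments must be named.

It is hard to say without an example... You could define x manually (using x = filelist[1]) and then look at what length_file, file_content and file_tail look like... are they what you expect? If so, look at how output is built.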
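
The error quoted in the comments is raised by data.table when an argument inside `:=`( ... ) has no name. When the information section has more than two variables, one possible way to avoid typing each name is to take the column names from V1 and the values from V2. The sketch below uses hand-built stand-ins for file_content and file_tail so that it runs without any files; the third variable "u" is hypothetical and not part of the original example.

```r
library(data.table)

# Stand-ins for what the answer's function reads from one file
# (hand-built here so the sketch runs without any files).
file_content <- data.table(a = c(1L, 4L), b = c(2L, 3L),
                           c = c(4L, 4L), d = c(3L, 2L))
# "u" is a hypothetical third info variable, not part of the original example
file_tail <- data.table(V1 = c("p", "t", "u"), V2 = c(2L, 3L, 7L))

# Take the new column names from V1 and the values from V2; each value is
# recycled over all rows, so no name has to be typed inside `:=`( ... )
file_content[, (file_tail$V1) := as.list(file_tail$V2)]

print(file_content)
```

Inside the answer's function, the same line could replace the explicit p = ..., t = ... assignment, assuming every line of the information section has exactly one name and one value.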