R 3.4.1: Reading data from multiple .csv files

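I am trying to build a function that imports/reads multiple data tables from .csv files and then computes statistics on the selected files. Each of the 332 .csv files contains a table with the same column names: date, pollutant, and id. Many values are missing.

Here is the function I have written so far to calculate the mean of a pollutant: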

pollutantmean <- function(directory, pollutant, id = 1:332) { 

  library(dplyr)
  setwd(directory)
  good<-c()

  for (i in (id)){
    task1<-read.csv(sprintf("%03d.csv",i))
  }

  p<-select(task1, pollutant)
  good<-c(good,complete.cases(p))
  mean(p[good,]) 
}

pollutantmean could look like this:
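pollutantmean <- function(directory, pollutant, id = 1:332) {
    # Temporarily switch to the data directory; restore the old one on exit.
    od <- setwd(directory)
    on.exit(setwd(od))

    # Read every selected file into a list of data frames.
    task_list <- lapply(sprintf("%03d.csv", id), read.csv)
    # Mean of the pollutant column in each file, ignoring missing values.
    p_list <- lapply(task_list, function(x) mean(x[[pollutant]], na.rm = TRUE))
    # Average of the per-file means (not a pooled mean over all rows).
    mean(unlist(p_list), na.rm = TRUE)
}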
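
The answer above works file by file. If a single pooled mean over all observations is wanted instead, a minimal sketch might look like the following. It assumes the files are named 001.csv to 332.csv inside `directory` and that `pollutant` names a numeric column; `pollutantmean_global` is a hypothetical name used only for illustration.

```r
# Sketch only: pooled ("global") mean across all selected files.
# Assumptions: files are named 001.csv ... 332.csv inside `directory`,
# and `pollutant` is the name of a numeric column present in every file.
# `pollutantmean_global` is a hypothetical name, not part of the thread.
pollutantmean_global <- function(directory, pollutant, id = 1:332) {
  # Build full paths instead of changing the working directory.
  files <- file.path(directory, sprintf("%03d.csv", id))

  # Pull the pollutant column out of every file as a vector.
  values <- lapply(files, function(f) read.csv(f)[[pollutant]])

  # Pool all observations, then average once, dropping missing values.
  mean(unlist(values), na.rm = TRUE)
}
```

Because the values are pooled before averaging, files with more valid observations carry more weight, which is what distinguishes this from an average of per-file means.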

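My answer offers a way to do what you want (if I understood you correctly) without a loop. I am making two assumptions: (1) you have 332 *.csv files with the same header (column names), so all files have the same structure; (2) you can combine the tables into one big data frame.
If both assumptions hold, I would use a list of files to import them as data frames (so this answer does not contain a loop!).

library(data.table)  # provides rbindlist()

# This creates a character vector with the names of your files. You have to provide the path to this folder.
file_list <- list.files(path = "[your path where your *.csv files are saved in]", full.names = TRUE)

# This will create a list of data frames.
mylist <- lapply(file_list, read.csv)

# This will 'row-bind' the data frames of the list into one big table (a data.table).
mydata <- rbindlist(mylist)

# Now you can perform your calculation on this big data frame, using your column information to filter or subset to get information of just a subset of this table (if necessary).

You are defining good as c() at the start of every loop iteration. To get what you want, you should define good outside the loop. (While we're at it, load your package and set your working directory outside the loop as well.)
Thanks! I hadn't thought of using the rbind function.
Thanks! It doesn't work quite the same, because it still calculates multiple means for multiple tables rather than a global mean, but using the lapply function is interesting.
OK, I'll edit my code so that it calculates a global mean.
Thanks! In the meantime, I have changed my code.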
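
To illustrate the last step of the combined-table approach, here is a hedged sketch of computing a global mean from one big data frame, optionally restricted to some ids. It uses base R `do.call(rbind, ...)` in place of `rbindlist` so that it runs without extra packages. The helper name `combined_mean` and the default column name `"id"` are assumptions taken from the question's description, not confirmed names from the data.

```r
# Sketch only: global mean computed from one combined data frame.
# Assumptions: every .csv file in `directory` has the same columns, and the
# identifier lives in the column named by `id_col` (default "id").
# `combined_mean` is a hypothetical helper name.
combined_mean <- function(directory, pollutant, id = NULL, id_col = "id") {
  # Collect the full paths of all .csv files in the folder.
  file_list <- list.files(path = directory, pattern = "\\.csv$", full.names = TRUE)

  # Read each file, then row-bind the list into a single data frame
  # (base R stand-in for data.table::rbindlist).
  mylist <- lapply(file_list, read.csv)
  mydata <- do.call(rbind, mylist)

  # Optionally keep only the rows that belong to the requested ids.
  if (!is.null(id)) {
    mydata <- mydata[mydata[[id_col]] %in% id, , drop = FALSE]
  }

  # One mean over all remaining rows, ignoring missing values.
  mean(mydata[[pollutant]], na.rm = TRUE)
}
```

Filtering on the id column after combining means the file names no longer matter, at the cost of reading every file in the folder even when only a few ids are needed.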