R: Populating a data frame in a loop

I have more than 300 CSV files in a directory. The CSV files have the following structure:

id              Date        Nitrate     Sulfate
id of csv file  Some date   Some Value  Some Value
id of csv file  Some date   Some Value  Some Value
id of csv file  Some date   Some Value  Some Value
I want to count the number of rows in each CSV file, excluding rows with NA values, and store the result in a data frame with two columns: (1) id and (2) nobs.

Here is my code:

complete <-function(directory,id){
  filenames <-sprintf("%03d.csv", id)
  filenames <-paste(directory,filenames,sep = '/')
  dataframe <-data.frame(id=numeric(0),nobs=numeric(0))
  for(i in filenames){
    data <- read.csv(i)
    dataframe[i,dataframe$id]<-data[data$id]
    dataframe[i,dataframe$nobs]<-nrow(data[!is.na(data$sulfate & data$nitrate),])
  }

  dataframe

}

The problem is in these two lines:

dataframe[i,dataframe$id]<-data[data$id]
dataframe[i,dataframe$nobs]<-nrow(data[!is.na(data$sulfate & data$nitrate),])
Create the data frame with the expected number of rows instead of 0.

So the simplest approach is:

dataframe <- rbind(dataframe, data.frame(id=data$id[1], nobs=nrow(data[!is.na(data$sulfate) & !is.na(data$nitrate),])))
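
As an alternative to growing the data frame with rbind, a pre-allocated version might look like the sketch below. It is only an illustration under stated assumptions: the function name complete_prealloc, the temporary directory, and the two small CSV files are hypothetical and exist only to make the example runnable. Rows are indexed by integer position and columns by name.

```r
# Hypothetical setup: write two small CSV files into a temporary directory
dir <- tempfile("csvdir")
dir.create(dir)
write.csv(data.frame(id = 1, Date = "2016-02-28",
                     sulfate = c(10, NA, 30), nitrate = c(1, 2, NA)),
          file.path(dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(id = 2, Date = "2016-02-28",
                     sulfate = c(NA, 20, 30, 40), nitrate = c(1, 2, 3, 4)),
          file.path(dir, "002.csv"), row.names = FALSE)

complete_prealloc <- function(directory, id) {
  filenames <- file.path(directory, sprintf("%03d.csv", id))
  # Pre-allocate one row per file instead of starting with 0 rows
  result <- data.frame(id = numeric(length(filenames)),
                       nobs = numeric(length(filenames)))
  for (i in seq_along(filenames)) {
    data <- read.csv(filenames[i])
    # Index by integer row position and by column name
    result[i, "id"] <- data$id[1]
    result[i, "nobs"] <- sum(!is.na(data$sulfate) & !is.na(data$nitrate))
  }
  result
}

res <- complete_prealloc(dir, 1:2)
print(res)
```

The loop index here is an integer, not a file name, so each assignment fills an existing row rather than creating a new one named after the file.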

I usually prefer to add the rows to a pre-allocated list and then bind them together. Here is a working example:

##### fake read.csv function returning random data.frame 
# (just to reproduce your case, remove this from your code...)
read.csv <- function(fileName){
  stupidHash <- sum(as.integer(charToRaw(fileName)))
  if(stupidHash %% 2 == 0){
    return(data.frame(id=stupidHash,date='2016-02-28',
                      nitrate=c(NA,2,3,NA,5),sulfate=c(10,20,NA,NA,40)))
  }else{
    return(data.frame(id=stupidHash,date='2016-02-28',
                      nitrate=c(4,2,3,NA,5,9),sulfate=c(10,20,NA,NA,40,50)))
  }
}
#####

complete <-function(directory,id){
  filenames <-sprintf("%03d.csv", id)
  filenames <-paste(directory,filenames,sep = '/')
  # here we pre-allocate a list of length=length(filenames)
  # where we will put the rows of our future data.frame
  rowsList <- vector(mode='list',length=length(filenames)) 
  for(i in seq_along(filenames)){
    filename <- filenames[i]
    data <- read.csv(filename)
    rowsList[[i]] <- data.frame(id=data$id[1],
                                nobs=sum(!is.na(data$sulfate) & !is.na(data$nitrate)))
  }
  # here we bind all the previously created rows together into one data.frame
  DF <- do.call(rbind.data.frame, rowsList)
  return(DF)
}
res <- complete(directory='dir',id=1:3)

> res
   id nobs
1 889    4
2 890    2
3 891    4
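
The same list-then-bind idea can also be written with lapply, which builds the list of rows without an explicit loop. This is a minimal sketch, not part of the answer above: the files list of in-memory data frames and the helper count_complete are hypothetical stand-ins for the result of reading each CSV file.

```r
# Hypothetical in-memory stand-ins for the contents of three CSV files
files <- list(
  data.frame(id = 1, nitrate = c(NA, 2, 3), sulfate = c(10, 20, NA)),
  data.frame(id = 2, nitrate = c(4, 5), sulfate = c(40, 50)),
  data.frame(id = 3, nitrate = c(NA, NA), sulfate = c(1, 2))
)

# Summarise one data frame as a one-row data frame
count_complete <- function(data) {
  data.frame(id = data$id[1],
             nobs = sum(!is.na(data$sulfate) & !is.na(data$nitrate)))
}

# lapply returns a list with one element per input,
# which do.call(rbind, ...) then stacks into a single data frame
rows <- lapply(files, count_complete)
res_lapply <- do.call(rbind, rows)
print(res_lapply)
```

With real files, the list would instead come from something like lapply(filenames, read.csv).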

Have you seen this post?

@bartoszukum I tried your approach, but ended up with this error: Error in $

@Farrukh Ahmed Sorry, I assumed that the values used to populate the data frame were valid. Please take another look. I assumed data$id has the same value in every row, so I take the first one. To use is.na, you have to check each column separately and then combine the results with a logical "and".
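
To illustrate the last point, here is a small sketch with made-up vectors (the values are hypothetical) showing why is.na(a & b) is not the same as checking each column separately. complete.cases from the stats package is shown as one possible alternative; it is not something the answers above use.

```r
# Made-up example values
sulfate <- c(NA, 10, NA, 5)
nitrate <- c(0, 2, 3, NA)

# `&` coerces numbers to logical first, and NA & FALSE is FALSE (not NA),
# so the first row is wrongly treated as complete
wrong <- !is.na(sulfate & nitrate)

# Check each column separately, then combine with a logical "and"
right <- !is.na(sulfate) & !is.na(nitrate)

# complete.cases() flags rows that have no NA in any column
cc <- complete.cases(data.frame(sulfate, nitrate))

print(sum(wrong))
print(sum(right))
print(sum(cc))
```

Only the second row has both values present, so the separate checks and complete.cases agree on a count of 1, while the combined form counts 2.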