Simplifying a for loop: reading multiple files and deleting specific data

I have 38 csv files in a folder, containing rainfall data for multiple years, named as follows:

Precp_1980.csv
Precp_1981.csv
Precp_1982.csv
Precp_1983.csv
...
Precp_2017.csv

A leap-year file looks like this:

Precp_1980 <- data.frame(matrix(runif(366*11299,min = 0, max = 4), ncol = 366, nrow = 11299)) 
names(Precp_1980) <- c(rep(paste0("d_",1:366)))
Precp_1980$ID1 <- seq(1:11299)
Precp_1980$ID2 <- seq(0.0,1.1, length.out = 11299)
Precp_1980$ID3 <- seq(1,10, length.out = 11299)
Precp_1980$ID4 <- seq(100,200, length.out = 11299)
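Precp_1980$year <- 1980

My goals:

1) From every leap-year file, remove day of year (doy) 60, so that all files have 365 days

2) Convert each file from wide format to long format

3) Combine all files into a single file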
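
The three goals above can be sketched on a toy table before touching the real files. This is a minimal illustration, assuming data.table is installed; the two small tables, their values, and the reduced set of day columns (59 to 62) are hypothetical stand-ins for the real 365/366-column files.

```r
library(data.table)

# Toy leap-year table: 2 sites, only days 59-62 (made-up values)
wide80 <- data.table(ID1 = 1:2, year = 1980,
                     d_59 = c(0.1, 0.2), d_60 = c(9, 9),
                     d_61 = c(0.3, 0.4), d_62 = c(0.5, 0.6))

# Toy non-leap-year table: days 59-61 only
wide81 <- data.table(ID1 = 1:2, year = 1981,
                     d_59 = c(1.1, 1.2), d_60 = c(1.3, 1.4),
                     d_61 = c(1.5, 1.6))

# Goal 1: drop day 60 from the leap year
wide80[, d_60 := NULL]

# Goal 2: wide -> long, then turn "d_59" into the integer 59
to_long <- function(dt) {
  long <- melt(dt, id.vars = c("ID1", "year"),
               variable.name = "day", value.name = "rain")
  long[, day := as.integer(sub("d_", "", day))]
  long
}
long80 <- to_long(wide80)
long81 <- to_long(wide81)

# Leap year only: shift days after the removed one back by 1
long80[day > 60L, day := day - 1L]

# Goal 3: stack everything into one table
combined <- rbindlist(list(long80, long81))
```

After this, both toy years carry the same day numbers (59, 60, 61), which is the property the real 38 files should end up with for days 1 to 365.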

What I did:

        library(data.table)
        library(dplyr)
        # melt() below is data.table's; reshape2 is not needed

        year.list <- list() # create a list to save the outputs
        yr.list <- c(1980:2017)
        leap.yr <- c(1980,1984,1988,1992,1996,2000,2004,2008,2012,2016) # vector of leap years

        for(y in seq_along(yr.list)){
            yr <- yr.list[y]

            if(yr %in% leap.yr){ # if a year is a leap year

              dat <- fread(paste0("Precp_",yr,".csv"))
              dat.up <- dat %>% dplyr::select(-d_60) # this removes day 60 from the leap year
              dat.up.m <- melt(dat.up, id.vars = c("ID1","ID2","ID3","ID4","year"), value.name = "rain", variable.name = "day") # converts the data into long format
              dat.up.m <- dat.up.m %>% mutate(day = gsub("d_", "", day)) %>% # converts "d_1" to numeric day of year
                                       mutate(day = as.numeric(day)) %>%
                                       mutate(day = ifelse(day >= 61, day - 1, day)) # shifts every day after 60 back by one so that the year has 365 days

              year.list[[y]]  <- dat.up.m
            } else { # if a year is not a leap year
              dat <- fread(paste0("Precp_",yr,".csv"))
              dat.up.m <- melt(dat, id.vars = c("ID1","ID2","ID3","ID4","year"), value.name = "rain", variable.name = "day") # converts the data into long format
              dat.up.m <- dat.up.m %>% mutate(day = gsub("d_", "", day)) %>% # converts "d_1" to numeric day of year
                                       mutate(day = as.numeric(day))

              year.list[[y]]  <- dat.up.m
            }
        }

        stack.rain <- rbindlist(year.list)
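
The leap years in the code above are typed out by hand. As a small sketch in base R, they could instead be derived from the Gregorian rule; `is_leap` is a hypothetical helper name, not something from the original post.

```r
# Gregorian rule: divisible by 4, except centuries not divisible by 400
is_leap <- function(yr) (yr %% 4 == 0 & yr %% 100 != 0) | yr %% 400 == 0

yr.list <- 1980:2017
leap.yr <- yr.list[is_leap(yr.list)]
```

For 1980 to 2017 this gives the same ten years as the hand-written vector, and it keeps working if the range of files is later extended across a century boundary such as 2100.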

The code below should work. I loop through the files, reading each one but only altering the leap years. There may be a cleaner way, but this one is simpler than yours.

library(data.table)

yr.list <- c(1980:2017)
leap.yr <- c(1980,1984,1988,1992,1996,2000,2004,2008,2012,2016) # vector of leap years

# function to read and clean data sets
data.prep = function(x){
  yy = fread(paste0("Precp_", x, ".csv"))

  if(x %in% leap.yr){
    yy[, d_60 := NULL] # drop day 60 from leap years

    # remaining day columns and their day numbers
    day.cols = grep("^d_", names(yy), value = TRUE)
    cols = as.numeric(sub("^d_", "", day.cols))
    cols[cols > 60] = cols[cols > 60] - 1 # shift days 61..366 to 60..365

    setnames(yy, day.cols, as.character(cols))
  } else {
    setnames(yy, grep("^d_", names(yy), value = TRUE), as.character(1:365))
  }
  return(yy)
}

xx = lapply(yr.list, data.prep)
# each file already has a "year" column, so no idcol is needed here
xx = rbindlist(xx, use.names = TRUE)

stack_rain <- melt(xx, id.vars = c("ID1","ID2","ID3","ID4","year"),
                 value.name = "rain", variable.name = "day") # converts the data into long format
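
The renaming step in this answer is the part most likely to go wrong, so here is a small sketch of it in isolation. It assumes data.table is installed; the table `yy` and its four day columns are made up for illustration.

```r
library(data.table)

# Toy leap-year table with only days 59-62 (made-up values)
yy <- data.table(ID1 = 1:2, d_59 = 1:2, d_60 = 3:4, d_61 = 5:6, d_62 = 7:8)

# Drop day 60, then renumber the remaining day columns
yy[, d_60 := NULL]
old <- grep("^d_", names(yy), value = TRUE)   # "d_59" "d_61" "d_62"
doy <- as.integer(sub("^d_", "", old))        # 59 61 62
doy[doy > 60L] <- doy[doy > 60L] - 1L         # 59 60 61
setnames(yy, old, as.character(doy))          # rename by reference
```

Restricting the pattern to columns that start with `d_` avoids the coercion warning that `as.numeric()` would raise on the ID column names, and `setnames()` renames in place without copying the table.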

There is one caveat: the command mutate(day = ifelse(day >= 61, 60, day)) sets every day greater than 60 to 60; it does not subtract 1 from those days.

Thanks for pointing that out. I have corrected it.

Don't do any complex data manipulation inside the loop. In the loop: load and melt all the data (since only ~1/365 of the data is going to be removed, you won't lose much memory). Then, outside the loop: use data.table to filter (remove day 60) and modify the data (the "day" column).

# Arguments
yearAll <- 1980:2017
yearLp  <- seq(1980, 2016, 4)
# Libraries
library(data.table)
library(foreach)
# Load data
# It's possible to parallelize loop using %dopar%
result <- foreach(i = yearAll, .combine = rbind) %do% {
    melt(fread(paste0("Precp_", i, ".csv")), 
         c("ID1", "ID2", "ID3", "ID4", "year"))
}
# Modify data
result <- result[!(year %in% yearLp & variable == "d_60")]
result[, day := as.numeric(sub("d_", "", variable))]
result[year %in% yearLp & day >= 61, day := day - 1]
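
To check that this kind of pipeline really leaves 365 days in every year, it can be run end to end on generated files. The sketch below is an assumption-laden test harness, not part of the original answer: it writes two small hypothetical files (2 sites each, a single ID column) to a temporary directory, and it swaps `foreach` for `lapply()` plus `rbindlist()` so that only data.table is required.

```r
library(data.table)

# Write tiny stand-in files to a temporary folder (2 sites per year)
tmp <- tempfile("precp_")
dir.create(tmp)
make_year <- function(yr, ndays) {
  days <- as.data.table(matrix(runif(2 * ndays, 0, 4), nrow = 2))
  setnames(days, paste0("d_", seq_len(ndays)))
  dt <- cbind(data.table(ID1 = 1:2, year = yr), days)
  fwrite(dt, file.path(tmp, paste0("Precp_", yr, ".csv")))
}
make_year(1980, 366)  # leap year
make_year(1981, 365)  # regular year

# Load and melt every file, then stack
files <- list.files(tmp, pattern = "^Precp_[0-9]{4}\\.csv$", full.names = TRUE)
result <- rbindlist(lapply(files, function(f) {
  melt(fread(f), id.vars = c("ID1", "year"))
}))

# Same filtering and renumbering as in the answer
yearLp <- 1980
result <- result[!(year %in% yearLp & variable == "d_60")]
result[, day := as.numeric(sub("d_", "", variable))]
result[year %in% yearLp & day >= 61, day := day - 1]

# Every year should now have 365 distinct days, ending at day 365
check <- result[, .(n_days = uniqueN(day), max_day = max(day)), by = year]
```

Running the same `check` summary on the real `result` table is a cheap way to confirm that no leap year still carries 366 days.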