R writeRaster to NetCDF parallelization

r parallel-processing maps time-series raster

I have a large RasterStack with the following details:

class       : RasterStack
dimensions  : 510, 1068, 544680, 19358  (nrow, ncol, ncell, nlayers)
resolution  : 0.08333333, 0.08333333  (x, y)
extent      : -141, -52, 41, 83.5  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0
names       : Jan.1961.1, Jan.1961.2, Jan.1961.3, Jan.1961.4, Jan.1961.5, Jan.1961.6, Jan.1961.7, Jan.1961.8, Jan.1961.9, Jan.1961.10, Jan.1961.11, Jan.1961.12, Jan.1961.13, Jan.1961.14, Jan.1961.15, ...
time        : 1961-01-01 - 2013-12-31 (range)

Running something like:

writeRaster( s,"PP", overwrite=TRUE, format="CDF", varname="P", varunit="mm", 
             longname="totals", xname="lon", yname="lat",zname="time",
             zunit="numeric")

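takes more than two weeks to complete on my computer. How can I run this in parallel (perhaps with a foreach loop and %dopar%) to get the same result in less processing time?

Sample data: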

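library(raster)

# nl must equal the number of daily dates below (19358), as in the real stack
s <- brick(nrows=510, ncols=1068, xmn=-180, xmx=180, ymn=-90, ymx=90, crs="+proj=longlat +datum=WGS84", nl=19358)
dates <- seq(as.Date("1961-01-01"), as.Date("2013-12-31"), by="day")
s <- setZ(s, dates)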
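As a quick sanity check on the sample data, the daily date sequence can be counted in base R and compared with the layer count shown in the RasterStack summary. This is only a sketch: `n_layers_reported` is a name introduced here for illustration, and it assumes the sample brick is meant to have one layer per day, which is what setZ() appears to require.

```r
# Daily dates covering the period reported by the stack
dates <- seq(as.Date("1961-01-01"), as.Date("2013-12-31"), by = "day")
n_days <- length(dates)
print(n_days)

# nlayers value taken from the RasterStack summary in the question
n_layers_reported <- 19358

# The brick's nl should match the number of dates passed to setZ()
stopifnot(n_days == n_layers_reported)
```

If the two numbers differ, the nl argument of brick() is the first thing to check.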

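You can try this code, but I have not really tested it on a large dataset. I have not tested the ncecat part... I will update it later, but you can try it in the meantime.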
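library(raster)
library(foreach)
library(doParallel)

wd <- path.expand("~/Bureau/Tmp")  # output directory; must already exist

# small test stack with 16 layers (the real data has 19358)
nl <- 16
s <- brick(nrows = 510, ncols = 1068,
           xmn = -180, xmx = 180, ymn = -90, ymx = 90,
           crs = "+proj=longlat +datum=WGS84",
           nl = nl)
# dummy values so writeRaster() has something to write (an empty brick fails)
s <- setValues(s, matrix(runif(ncell(s) * nl), ncol = nl))
# one date per layer: setZ() needs as many dates as there are layers
dates <- seq(as.Date("1961-01-01"), by = "day", length.out = nl)
s <- setZ(s, dates)

cl <- makeCluster(4)
registerDoParallel(cl)

# write each layer to its own NetCDF file, named PP_000001, PP_000002, ...
tmp <- foreach(i = 1:nlayers(s)) %dopar% {
  r <- raster::raster(s, i)
  raster::writeRaster(r,
                      filename = file.path(wd, paste0("PP_", formatC(i, width = 6, flag = "0"))),
                      overwrite = TRUE, format = "CDF", varname = "P", varunit = "mm",
                      longname = "totals", xname = "lon", yname = "lat", zname = "time",
                      zunit = "numeric")
  rm(r)
}
stopCluster(cl)

# concatenate the per-layer files with ncecat from NCO (untested)
ppfiles <- list.files(wd, pattern = "^PP_", full.names = TRUE)
system(paste("ncecat", paste(shQuote(ppfiles), collapse = " "),
             shQuote(file.path(wd, "output.nc"))))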
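The merge step depends on passing the actual file names to ncecat, in layer order. A minimal base-R sketch of how that command string could be assembled is shown below. `build_ncecat_cmd` is a hypothetical helper introduced here, not part of raster or NCO, and the `.nc` extension on the file names is an assumption about what writeRaster produces. As far as I know, ncecat joins its input files along a new record dimension, so the order of the names determines the order of the layers.

```r
# Hypothetical helper: build the ncecat call for a set of per-layer files.
build_ncecat_cmd <- function(files, out_file) {
  files <- sort(files)  # zero-padded names sort in layer order
  paste("ncecat", paste(shQuote(files), collapse = " "), shQuote(out_file))
}

# Zero-padded names in the style of formatC(i, width = 6, flag = "0");
# deliberately given out of order to show the effect of sort()
files <- paste0("PP_", formatC(c(10L, 2L, 1L), width = 6, flag = "0"), ".nc")
cmd <- build_ncecat_cmd(files, "output.nc")
print(cmd)

# system(cmd)  # run only on a machine where NCO is installed
```

Zero-padding matters here: without it, a plain alphabetical sort would place layer 10 before layer 2.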


I'm not very familiar with NetCDF files, but when I run the "sample data" script, writeRaster() throws an error saying that all values are NA. Using setValues() to set all values to 1 (for example) also throws an error: "Error: cannot allocate vector of size 78.6 Gb". With data this large on an ordinary laptop, I'm not sure how much faster parallelization would be.

@ken Thanks for adding values to the sample data. I get the same error when running with my actual data. I thought parallel computing might help.

You could save the layers in separate NetCDF files, using writeRaster in parallel on different layers. Then, if you are working on Linux, you can use the nco package in the terminal (or use R with system and some paste for the file names) and its ncecat function. I don't know whether this will be faster, but it is an alternative to the parallel writing you asked for...

@StatnMap I would appreciate it if you could show me how to unstack the layers and write each one separately to NetCDF. Of course, I can then use mergetime in CDO or *cat to put the NetCDF files together.

Thank you very much. It works well for me. Perhaps you could add stopCluster(cl) at the end of the code.
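The 78.6 Gb figure in the allocation error above can be reproduced with simple arithmetic, which also shows why holding the whole stack in memory is not realistic on a laptop. This is a rough sketch that assumes every cell is stored as an 8-byte double, which is how R stores numeric values.

```r
n_cells  <- 510 * 1068               # nrow * ncol = 544680 cells per layer
n_layers <- 19358                    # one layer per day, 1961 to 2013
bytes    <- n_cells * n_layers * 8   # 8 bytes per double
gib      <- bytes / 1024^3           # R reports "Gb" in units of 1024^3 bytes
print(round(gib, 1))
```

Writing layer by layer, as in the answer, keeps each worker's memory use to roughly one layer at a time instead.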