Predicting a multinomial gbm model to a raster in R

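At present it does not seem possible to predict a multinomial gbm model directly to a raster. Note, however, that for relatively small raster grids there is a simple workaround, outlined below. The process is very slow, though, and it runs into difficulties with large rasters, many classes (vegetation communities, in my case) and many predictor variables. I hope the information below is useful to anyone facing the same challenge.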

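Below, I try to predict the probability of occurrence of 36 vegetation communities using a multinomial gbm model with 20 predictor variables. My study area is a raster grid of 30 x 30 m cells with 213 million pixels, but the code below relates to a small grid of 1221 cells that I used to develop and test the process.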
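To give a rough sense of why the full grid is so much harder than the test grid, the sketch below estimates the memory needed to hold the predictor values and the predicted probabilities entirely in RAM. It assumes every value is stored as an 8-byte double (R's default numeric type) and ignores any overhead from R or the packages; the `gb()` helper is hypothetical and defined only for this example.

```r
# Back-of-envelope memory estimate for table-based prediction.
# Assumption: every value is held as an 8-byte double, with no overhead.
n_cells_full    <- 213e6  # pixels in the full study area
n_cells_test    <- 1221   # cells in the small test grid
n_predictors    <- 20
n_classes       <- 36
bytes_per_value <- 8

# Hypothetical helper: size in gigabytes (10^9 bytes) of a cells x columns table
gb <- function(n_cells, n_cols) n_cells * n_cols * bytes_per_value / 1e9

full_predictors_gb <- gb(n_cells_full, n_predictors)
full_probs_gb      <- gb(n_cells_full, n_classes)
test_probs_gb      <- gb(n_cells_test, n_classes)

cat(sprintf("Full grid, predictor table:   %.2f GB\n", full_predictors_gb))
cat(sprintf("Full grid, probability table: %.2f GB\n", full_probs_gb))
cat(sprintf("Test grid, probability table: %.6f GB\n", test_probs_gb))
```

Under these assumptions the two tables for the full grid come to roughly 95 GB, against a fraction of a megabyte for the test grid, so the full raster would most likely have to be processed in blocks of rows rather than as one table.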

> require (gbm)
> require (raster)
> require (rgdal)

> load("gbmmodel_p20.Rda") 

> print(gbmmodel)

gbm(formula = as.formula(Nclustal_1 ~ tcd_coast_disa_f + tce_raddq_f + 
tce_radwq_f + tct_temp_minwin_f + tct_tempdq_f + tcw_clim_etaaann_f + 
tcw_precipseas_f + tcw_precipwq_f + tcw_rain1mm_f + tdd_strmdstge6_i + 
tlf_logre10_f + tlf_rough0500_f + trs_land_pfc_2008 + trs88_sspr_g_50p + 
trs88_ssum_b_50p + trs88_ssum_d_50p + tsp_bd200_f + tsp_cly200a_f + 
tsp_ph200_f + tsp_tn060a_f), distribution = "multinomial", 
data = gbmdata, n.trees = 2500, interaction.depth = 2, n.minobsinnode = 3, 
shrinkage = 0.003, bag.fraction = 0.75, train.fraction = 1, 
cv.folds = 8, keep.data = TRUE, verbose = TRUE, class.stratify.cv = TRUE, 
n.cores = 8)

A gradient boosted model with multinomial loss function.
2500 iterations were performed.
The best cross-validation iteration was 2500.
There were 20 predictors of which 20 had non-zero influence.
I stacked the predictor variables into a raster stack as follows:

> img.files <- list.files("/mnt/scratch/mcilwea/R/TSG/inmodel20_test",
pattern='\\.img$', full.names=TRUE)
> rasStack <- stack(img.files)
> NAvalue(rasStack) <- -9999
> projection(rasStack)
"+proj=longlat +ellps=GRS80 +towgs84=-16.237,3.51,9.939,1.4157e-006,2.1477e-006,1.3429e-006,1.91e-007 +no_defs"
Before running predict.gbm, I determined the best iteration of the model:

> best.iter <- gbm.perf(gbmmodel, method = "cv", plot.it = TRUE)
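The output of predicting to the raster stack is a single raster layer, representing only the first of the vegetation communities I want to predict: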

|=========================================================| 100%

class       : RasterLayer
dimensions  : 33, 37, 1221  (nrow, ncol, ncell)
resolution  : 0.0002777778, 0.0002777778  (x, y)
extent      : 149.1268, 149.1371, -35.65473, -35.64556  (xmin, xmax, ymin, ymax)
coord. ref. : NA
data source : /mnt/scratch/mcilwea/R/TSG/multiclass_BRT_20p_test_idrisi.rdc
names       : layer
values      : 3.762369e-06, 0.9337785  (min, max)
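The IDRISI file format does not support multi-band images, so I cannot add index=1:36 to the mix to produce a multi-band raster brick as output. If I try anyway, setting format="GTiff" or "HFA" (or any other format that requires rgdal), I get the error message: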

Error in rgdal::putRasterData(x@file@transient, v, band = 1, offset = off) :
  Failure during raster IO

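However, if I set format="raster" I do get a RasterBrick as output, but it does not let me read or write any data other than what is already in the IDRISI image (the first output column of the predict.gbm model):

Warning message:
In .rasterFromRasterFile(grdfile, band = band, objecttype, ...) :
  size of values file does not match the number of cells (given the data type)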

None of these images make any sense.
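One avenue that might be worth trying, offered as an untested sketch rather than a confirmed fix: raster's predict() accepts a `fun` argument, so the prediction function can be replaced by a wrapper. For a multinomial model, predict.gbm appears to return a three-dimensional array (cells x classes x number of n.trees values) rather than a plain matrix, which may be why only one column ends up usable. The wrapper below reduces such an array to a cells x classes matrix. To keep the example runnable without gbm or raster, it uses a stand-in model class, `mock_multinomial`, whose predict method merely mimics that array shape; `mock_multinomial` and `predict_as_matrix` are hypothetical names, not part of either package.

```r
# Stand-in for a fitted multinomial model (hypothetical; NOT a gbm object).
mock_model <- structure(list(classes = c("e24", "e26", "e59")),
                        class = "mock_multinomial")

# predict method that mimics the assumed shape of multinomial predict.gbm
# output: an array of cells x classes x length(n.trees).
predict.mock_multinomial <- function(object, newdata, n.trees = 2500, ...) {
  k <- length(object$classes)
  # Arbitrary scores built from the first column, turned into probabilities
  scores <- outer(newdata[[1]], seq_len(k))
  probs <- exp(scores) / rowSums(exp(scores))
  array(probs,
        dim = c(nrow(newdata), k, length(n.trees)),
        dimnames = list(NULL, object$classes, as.character(n.trees)))
}

# Wrapper taking (model, data, ...), the argument order a custom `fun` is
# assumed to receive; it always returns a plain cells x classes matrix.
predict_as_matrix <- function(model, data, ...) {
  p <- predict(model, data, ...)
  if (length(dim(p)) == 3L) {
    # Keep the first n.trees slice and rebuild the matrix explicitly, so a
    # block containing a single cell still comes back as a matrix.
    p <- matrix(p[, , 1L], nrow = dim(p)[1L], dimnames = dimnames(p)[1:2])
  }
  p
}

cells <- data.frame(x1 = c(0.1, 0.5, 0.9, 1.3))
m <- predict_as_matrix(mock_model, cells, n.trees = 2500)
print(dim(m))      # 4 cells by 3 classes
print(rowSums(m))  # each row of probabilities sums to 1
```

With the real packages, the idea would be to pass fun=predict_as_matrix together with index=1:36 to raster's predict(); whether that avoids the write errors reported in this post has not been tested.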

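Also somewhat puzzling: if I try to write the output as a multi-band CDF image instead, I get a different set of error messages, this time raised by ncdf rather than rgdal: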

    |   0%
Error in ncdf::put.var.ncdf(nc, x@title, v, start = c(1, start, lstart),  : 
  put.var.ncdf: error: you asked to write 11988 values, but the passed data array only has 11840 entries!
  |========                                                       |  25%
Error in ncdf::put.var.ncdf(nc, x@title, v, start = c(1, start, lstart),  : 
  put.var.ncdf: error: you asked to write 11988 values, but the passed data array only has 11840 entries!
  |==================                                              |  50%
Error in ncdf::put.var.ncdf(nc, x@title, v, start = c(1, start, lstart),  : 
  put.var.ncdf: error: you asked to write 11988 values, but the passed data array only has 11840 entries!
  |===============================================                   |  75%
Error in ncdf::put.var.ncdf(nc, x@title, v, start = c(1, start, lstart),  : 
  put.var.ncdf: error: you asked to write 7992 values, but the passed data array only has 7955 entries!
  |=============================================================| 100%
Here I have no idea what is going on.
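The numbers in these messages can at least be decoded, which may help narrow the problem down. The worked check below uses only the grid dimensions reported earlier (33 rows, 37 columns) and the 36 classes. The block sizes are inferred from the arithmetic, not read from raster's internals, so this is an observation rather than a diagnosis.

```r
# Grid and model dimensions taken from the post
n_col     <- 37
n_row     <- 33
n_classes <- 36

# Values requested ("asked to write") and supplied ("only has") per block
asked  <- c(11988, 11988, 11988, 7992)
passed <- c(11840, 11840, 11840, 7955)

# Rows per block implied by the requested counts, assuming 36 layers
rows_per_block <- asked / (n_col * n_classes)
print(rows_per_block)       # 9 9 9 6
print(sum(rows_per_block))  # 33, the full height of the grid

# The supplied counts are whole multiples of the row width (37) ...
print(passed / n_col)            # 320 320 320 215
# ... and fall short of the request by a whole number of row widths
print((asked - passed) / n_col)  # 4 4 4 1
```

So the writer expects complete 36-layer blocks of 9, 9, 9 and 6 rows, while the data it receives is short by four row-widths of values in each full block and by one in the last. Why the prediction step delivers fewer values is not something the messages themselves reveal.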

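If anyone knows how to work with the authors of the gbm package so that it can predict directly to a RasterBrick without running into any of the problems above, that would be fantastic.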

If anyone would like the code I used on the full raster, please leave a comment below and I will be happy to provide it.

Cheers, Allen

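# Predict on a table of cell values rather than on the raster stack.
# outTable (predictor values, one row per cell) and coords (the matching x/y
# coordinates) are not defined in this excerpt and must exist before this runs.
predtable <- as.data.frame(predict.gbm(gbmmodel, outTable, n.trees=best.iter, type="response"))
predout <- cbind(coords, predtable)
# First row: x, y and the 36 class probabilities
predout[1, 1:38]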

             x        y    e24.2500     e26.2500    e59.2500 g152.2500    g157.2500     g94.2500   m31.2500
    149.1269 -35.6457 0.001286283 0.0006473167 0.002043077 0.4973372 8.686316e-05 0.0006710651 0.01067058
         m36.2500    m68.2500    MU11.2500    MU45.2500 OTHER.2500  p14.2500     p15.2500     p17.2500
    0.004314056 0.007128109 0.0005012718 0.0006254022  0.1727706 0.1411112 0.0009099294 0.0002520156
         p19.2500    p20.2500     p22.2500   p220.2500    p23.2500   p24.2500    p27.2500   p338.2500
    0.003205936 0.002534798 0.0001474091 0.001214219 0.008455798 0.01701965 0.001879607 0.002238932
        p420.2500  p520.2500     p54.2500    p9.2500    u118.2500  u179.2500  u21.2500    u22.2500
   0.001456685 0.00108458 0.0003695966 0.02501649 0.0005977814 0.01711885 0.0558054 0.002357498
        u23.2500    u27.2500     u28.2500   u78.2500   Unit5.2500
   0.00040357 0.001422519 0.0002764237 0.01699094 4.835942e-05

write.csv(predout, "Predout.csv", row.names=TRUE)

# Write one HFA (.img) raster per class column of predtable
class.names <- names(predtable)
for (i in seq_along(class.names)) {
  # Promote the i-th probability column to a gridded spatial object
  SpatialPointspredTable <- SpatialPointsDataFrame(coords=coords, data=predtable[i])
  gridded(SpatialPointspredTable) <- TRUE
  rasValues <- raster(SpatialPointspredTable)
  projection(rasValues) <- "+proj=longlat +ellps=GRS80 +towgs84=-16.237,3.51,9.939,1.4157e-006,2.1477e-006,1.3429e-006,1.91e-007 +no_defs"
  plot(rasValues)
  writeRaster(rasValues, filename=class.names[i], format="HFA", overwrite=TRUE)
}
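The column names in the table above (e24.2500, e26.2500 and so on) look like a class label joined to the number of trees. That is what as.data.frame() produces when it flattens a three-dimensional array whose second and third dimensions are named, which is the shape multinomial predict.gbm output is assumed to have here. A small base-R sketch with made-up probabilities, also showing one way to strip the suffix before the names are used as file names (`predtable_demo` and `class_labels` are hypothetical variables):

```r
# Made-up probabilities for 2 cells x 3 classes x 1 n.trees value
# (assumed to mirror the shape of multinomial predict.gbm output)
classes <- c("e24", "e26", "e59")
p <- array(c(0.2, 0.1, 0.5, 0.3, 0.3, 0.6),
           dim = c(2, 3, 1),
           dimnames = list(NULL, classes, "2500"))

# Flattening to a data frame joins the class and n.trees names with a dot
predtable_demo <- as.data.frame(p)
print(names(predtable_demo))

# Remove the trailing ".<digits>" to recover the plain class labels
class_labels <- sub("\\.[0-9]+$", "", names(predtable_demo))
print(class_labels)
```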
predict(rasStack,
         gbmmodel,
         n.trees=best.iter,
         filename="multiclass_BRT_20p_test_idrisi",
         format="IDRISI",
         na.rm=FALSE,
         type="response",
         overwrite=TRUE,
         progress="text",
         cores=8)
predrast <- predict(object=rasStack,
        model=gbmmodel,
        n.trees=best.iter,
        filename="multi_test",
        fun=predict.gbm,
        format="raster",
        index=1:5,
        bandorder="BIL",
        ext=extent(rasStack[[1:20]]), 
        na.rm=FALSE,
        type="response",
        datatype="FLT4S",
        overwrite=TRUE,
        progress="text",
        cores=8) 
|=====================================================================100%

predrast

class       : RasterBrick 
dimensions  : 33, 37, 1221, 5  (nrow, ncol, ncell, nlayers)
resolution  : 0.0002777778, 0.0002777778  (x, y)
extent      : 149.1268, 149.1371, -35.65473, -35.64556  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +ellps=GRS80 +towgs84=-16.237,3.51,9.939,1.4157e-006,2.1477e-006,1.3429e-006,1.91e-007 +no_defs 
data source : C:\Data\FINAL_TSG\test\multi_test.grd 
names       :      layer.1,      layer.2,      layer.3,      layer.4,      layer.5 
min values  : 3.762369e-06, 3.762369e-06, 3.762369e-06, 3.762369e-06, 3.762369e-06 
max values  :    0.9337785,    0.9337785,    0.9337785,    0.9337785,    0.9337785 
writeRaster(predrast, filename="multi_test.img", format="HFA", bylayer=TRUE, suffix="numbers", overwrite=TRUE)
sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252    LC_MONETARY=English_Australia.1252
[4] LC_NUMERIC=C                       LC_TIME=English_Australia.1252    

attached base packages:
[1] parallel  splines   stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ncdf_1.6.8      rgdal_0.9-1     gbm_2.1         lattice_0.20-30 survival_2.37-7 raster_2.3-24   sp_1.0-17      

loaded via a namespace (and not attached):
[1] grid_3.1.2  tools_3.1.2

# Traceback error for
Error in rgdal::putRasterData(x@file@transient, v, band = 1, offset = off) : 
  Failure during raster IO

> traceback()
7: .Call("RGDAL_PutRasterData", raster, rasterData, as.integer(offset), 
       PACKAGE = "rgdal")
6: rgdal::putRasterData(x@file@transient, v, band = 1, offset = off)
5: writeValues(predrast, predv, tr$row[i])
4: writeValues(predrast, predv, tr$row[i])
3: .local(object, ...)
2: predict(object = rasStack, model = gbmmodel, n.trees = best.iter, 
       filename = "multi_img", format = "HFA", na.rm = FALSE, type = "response", 
       datatype = "FLT4S", overwrite = TRUE, progress = "text")
1: predict(object = rasStack, model = gbmmodel, n.trees = best.iter, 
       filename = "multi_img", format = "HFA", na.rm = FALSE, type = "response", 
       datatype = "FLT4S", overwrite = TRUE, progress = "text")