Predict function in R's mlr yields results inconsistent with predict

r machine-learning

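I am using the mlr package's framework to build an SVM model that predicts land cover types in an image. I used the raster package's predict function, and I also converted the raster to a data frame and predicted on that data frame using "learner.model" as input. Both methods gave me realistic results: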

> predict(raster, mod$learner.model)

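or by converting the raster to a data frame and predicting on that with mod$learner.model.

A reproducible example comparing mlr's predict with predict on the underlying kernlab model:

 > library(mlr)
 > library(kernlab)
 > x1 <- rnorm(50)
 > x2 <- rnorm(50, 3)
 > x3 <- rnorm(50, -20, 3)
 > C <- sample(c("a","b","c"), 50, T)
 > d <- data.frame(x1, x2, x3, C)
 > classif <- makeClassifTask(id = "example", data = d, target = "C")
 > lrn <- makeLearner("classif.ksvm", predict.type = "prob", fix.factors.prediction = T)
 > t <- train(lrn, classif)

 Using automatic sigma estimation (sigest) for RBF or laplace kernel

 > res1 <- predict(t, newdata = data.frame(x2,x1,x3))
 > res1

 Prediction: 50 observations
 predict.type: prob
 threshold: a=0.33,b=0.33,c=0.33
 time: 0.01
      prob.a    prob.b    prob.c response
 1 0.2110131 0.3817773 0.4072095        c
 2 0.1551583 0.4066868 0.4381549        c
 3 0.4305353 0.3092737 0.2601910        a
 4 0.2160050 0.4142465 0.3697485        b
 5 0.1852491 0.3789849 0.4357659        c
 6 0.5879579 0.2269832 0.1850589        a

 > res2 <- predict(t$learner.model, data.frame(x2,x1,x3))
 > res2
  [1] c c a b c a b a c c b c b a c b c a a b c b c c a b b b a a b a c b a c c c
 [39] c a a b c b b b b a b b
 Levels: a b c
 > res1$data$response == res2
  [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE
 [13]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
 [25]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
 [37]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
 [49]  TRUE FALSE

The predictions are not exactly the same. Following the mlr tutorial page on prediction, I don't understand why the results differ. Thanks for your help.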

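----- Update: When I do the same thing with a random forest model, the two vectors are equal. Could this be because SVMs are scale-dependent while random forests are not?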

 > library(randomForest)

 > classif <- makeClassifTask(id = "example", data = d, target = "C")
 > lrn <- makeLearner("classif.randomForest", predict.type = "prob", fix.factors.prediction = T)
 > t <- train(lrn, classif)
 >
 > res1 <- predict(t, newdata = data.frame(x2,x1,x3))
 > res1
 Prediction: 50 observations
 predict.type: prob
 threshold: a=0.33,b=0.33,c=0.33
 time: 0.00
   prob.a prob.b prob.c response
 1  0.654  0.228  0.118        a
 2  0.742  0.090  0.168        a
 3  0.152  0.094  0.754        c
 4  0.092  0.832  0.076        b
 5  0.748  0.100  0.152        a
 6  0.680  0.098  0.222        a
 >
 > res2 <- predict(t$learner.model, data.frame(x2,x1,x3))
 > res2
  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
  a  a  c  b  a  a  a  c  a  b  b  b  b  c  c  a  b  b  a  c  b  a  c  c  b  c
 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
  a  a  b  a  c  c  c  b  c  b  c  a  b  c  c  b  c  b  c  a  c  c  b  b
 Levels: a b c
 >
 > res1$data$response == res2
  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
 [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
 [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
 [46] TRUE TRUE TRUE TRUE TRUE

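---- Another update: If I change predict.type from "prob" to "response", the two SVM prediction vectors agree with each other. I'll look into the differences between these types. I had assumed that "prob" gives the same results and simply adds the probabilities. Maybe that's not the case?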

The answer is here:

In short, kernlab's ksvm gives different results with type = "probabilities" than with type = "response".

If I run

 > res2 <- predict(t$learner.model, data.frame(x2,x1,x3), type = "probabilities")
 > res2

then I get the same results as res1 above (type = "response" is the default).
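Why the labels line up with type = "probabilities": as I understand kernlab, its default response comes from the SVM's own decision rule, whereas the probabilities come from a separately fitted probability model, so the most probable class is not always the class the decision rule picks. A minimal base-R sketch of deriving labels from a probability matrix, assuming one column per class and equal thresholds (as in the a=0.33, b=0.33, c=0.33 line of the mlr output). The probs values are made up and prob_to_response is a hypothetical helper, not an mlr or kernlab function; mlr's own thresholding also handles unequal thresholds and ties, which this sketch ignores.

```r
# Made-up probability matrix shaped like kernlab's type = "probabilities"
# output: one row per observation, one column per class
probs <- matrix(c(0.21, 0.38, 0.41,
                  0.43, 0.31, 0.26,
                  0.22, 0.41, 0.37),
                ncol = 3, byrow = TRUE,
                dimnames = list(NULL, c("a", "b", "c")))

# Hypothetical helper: with equal thresholds, the predicted class is the
# column holding the largest probability in each row (ties go to the first)
prob_to_response <- function(probs) {
  factor(colnames(probs)[max.col(probs, ties.method = "first")],
         levels = colnames(probs))
}

prob_to_response(probs)
```

For the three rows of probs this returns c, a, b, with levels a, b, c.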

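Unfortunately, classifying the image based on the probabilities doesn't seem to work as well as using "response". Perhaps this is still the best way to estimate the certainty of a classification?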
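Two common heuristics for per-pixel certainty are the probability of the winning class and the margin between the two most probable classes. A minimal base-R sketch, assuming a probability matrix with one column per class (for example from getPredictionProbabilities in mlr or type = "probabilities" in kernlab). The numbers are made up and certainty is a hypothetical helper; whether these scores are well calibrated for a particular image is a separate question.

```r
# Made-up probability matrix: one row per pixel, one column per class
probs <- matrix(c(0.21, 0.38, 0.41,
                  0.59, 0.23, 0.18,
                  0.34, 0.33, 0.33),
                ncol = 3, byrow = TRUE,
                dimnames = list(NULL, c("a", "b", "c")))

# Hypothetical helper returning two per-row certainty scores:
#   max_prob: probability of the most probable class
#   margin:   gap between the most and second most probable class
certainty <- function(probs) {
  sorted <- t(apply(probs, 1, sort, decreasing = TRUE))
  data.frame(max_prob = sorted[, 1], margin = sorted[, 1] - sorted[, 2])
}

certainty(probs)
```

The third row (0.34 vs 0.33) gets a margin of about 0.01, flagging a pixel where the classifier is close to guessing.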

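As you have found, the source of the "error" is that mlr and kernlab have different defaults for the prediction type: your mlr learner was created with predict.type = "prob", whereas kernlab's predict uses type = "response" unless told otherwise.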
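To see the mlr side of this, a minimal sketch that inspects and changes a learner's prediction type. As far as I can tell from mlr's classif.ksvm wrapper, predict.type = "prob" makes mlr fit kernlab's probability model and request probabilities at prediction time, which is why mlr's labels match type = "probabilities" rather than kernlab's default. The learner settings mirror the question; lrn_resp is just an illustrative name.

```r
library(mlr)

# The learner from the question: the mlr-level prediction type is "prob"
lrn <- makeLearner("classif.ksvm", predict.type = "prob")
print(lrn$predict.type)
# [1] "prob"

# kernlab's predict() on the raw model (mod$learner.model) does not see
# this setting. To get plain class labels from mlr instead, change the type:
lrn_resp <- setPredictType(lrn, "response")
print(lrn_resp$predict.type)
# [1] "response"
```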

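mlr maintains quite a lot of internal "state" and checks the parameters of each learner as well as how training and testing are handled. You can get the type of prediction a learner will make with lrn$predict.type, which in your case gives "prob". If you want to know all the gory details, have a look at: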

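Mixing mlr-wrapped learners and "raw" learners, as you do in your example, is not recommended, and it shouldn't be necessary. If you mix them, things like what you found will happen, so when using mlr, use only mlr constructs to train models, make predictions, and so on.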
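A minimal sketch of the mlr-only route, reusing the question's simulated data. A fixed seed and an explicitly factor-typed target are assumptions for this sketch. Prediction goes through mlr's predict, and labels and probabilities are extracted with mlr's getPredictionResponse and getPredictionProbabilities, so both come from the same Prediction object.

```r
library(mlr)

# Simulated data in the shape of the question's example
set.seed(1)
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50, 3), x3 = rnorm(50, -20, 3),
                C = factor(sample(c("a", "b", "c"), 50, replace = TRUE)))
task <- makeClassifTask(id = "example", data = d, target = "C")
lrn <- makeLearner("classif.ksvm", predict.type = "prob")
mod <- train(lrn, task)

# Predict through mlr instead of calling kernlab on mod$learner.model
newdata <- d[, c("x2", "x1", "x3")]
pred <- predict(mod, newdata = newdata)

# Labels and probabilities both come from the same mlr Prediction object
resp <- getPredictionResponse(pred)        # factor of predicted classes
probs <- getPredictionProbabilities(pred)  # data.frame, one column per class
head(data.frame(probs, response = resp))
```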

mlr does have tests to make sure that "raw" and wrapped learners produce the same output; see, for example:

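Please provide a sample input so that we can test the code and see what might be going on.

Sure, I'll update in a few minutes.

Thanks for the answer and the advice for the future, Lars. The reason I mix "raw" and wrapped learners is that I tuned the model parameters in mlr and want to use the raster::predict function, which requires mod$learner.model. I could convert the raster to a data frame and then use mlr's normal predict, but that is not efficient.

Ah, I see. You should be able to do this directly with the wrapped learner, though, shouldn't you?

I briefly tried using the wrapped learner but couldn't use it directly in raster's predict function. The error is: Assertion on 'task' failed: Must have class 'Task', but has class 'data.frame'. This happens when I run raster::predict(r, mod) instead of raster::predict(r, mod$learner.model), where r is a raster object and mod is the trained mlr model. It's not a big problem, since adding "$learner.model" is easy. But for integrating the mlr and raster packages, it would be nice to know a way that doesn't require it.

Ah, of course: mlr's predict expects a task as the second argument. The key is the definition of predict.WrappedModel in predict.R. As a quick and dirty "fix", you could simply swap task and newdata in that method's signature and see whether that works for you.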
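An alternative that avoids editing mlr's source: raster::predict accepts a fun argument, which it calls as fun(model, data, ...), so a small wrapper can route each block of cell values into mlr's newdata argument. A minimal sketch, assuming the raster layers are named exactly like the training features (x1, x2, x3 here). mlr_raster_fun is a hypothetical helper, and the toy data only mirror the question's example. The wrapper returns the class index because raster cells hold numbers.

```r
library(raster)
library(mlr)

# Hypothetical helper for raster::predict(..., fun = ). raster::predict()
# calls fun(model, data, ...), where `data` is a data.frame of cell values
# with one column per layer; passing it as `newdata` keeps it out of mlr's
# `task` argument.
mlr_raster_fun <- function(model, data, ...) {
  pred <- predict(model, newdata = data)
  # Raster cells hold numbers, so return the class index
  # (1 = first factor level of the target)
  as.integer(getPredictionResponse(pred))
}

# Toy model and raster; layer names must match the training feature names
set.seed(1)
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50, 3), x3 = rnorm(50, -20, 3),
                C = factor(sample(c("a", "b", "c"), 50, replace = TRUE)))
mod <- train(makeLearner("classif.ksvm", predict.type = "prob"),
             makeClassifTask(data = d, target = "C"))

r <- stack(raster(matrix(rnorm(25), 5, 5)),
           raster(matrix(rnorm(25, 3), 5, 5)),
           raster(matrix(rnorm(25, -20, 3), 5, 5)))
names(r) <- c("x1", "x2", "x3")

classes <- raster::predict(r, mod, fun = mlr_raster_fun)
classes
```

For per-class probability layers instead of labels, the wrapper could return as.matrix(getPredictionProbabilities(pred)) and be called with index = 1:3, giving one layer per class.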
