R: Why does e1071 give NAs for this naive Bayes classifier prediction?
The following code returns NA for every prediction:
library(e1071)
train.x <- data.frame(
B=c(0,1,0),
C=c(0,0,0),
D=c(0,0,1),
Z=c(1,0,0)
)
classifier <- naiveBayes(x=train.x, y=factor(c(TRUE, TRUE, FALSE)), laplace=1) # Laplace smoothing (i.e. alpha) of 1
predict(classifier, train.x, type="raw")
     FALSE TRUE
[1,]    NA   NA
[2,]    NA   NA
[3,]    NA   NA
With a fourth training example added, the predictions are no longer NA:
train.x <- data.frame(
B=c(0,1,0,1),
C=c(0,0,0,1),
D=c(0,0,1,1),
Z=c(1,0,0,1)
)
classifier <- naiveBayes(x=train.x, y=factor(c(TRUE, TRUE, FALSE, FALSE)), laplace=1) # Laplace smoothing (i.e. alpha) of 1
predict(classifier, train.x, type="raw")
              FALSE           TRUE
[1,] 0.000000002761 0.999999997239
[2,] 0.000000002761 0.999999997239
[3,] 0.997729292055 0.002270707945
[4,] 0.999999994295 0.000000005705
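For numeric variables, naiveBayes uses the mean and standard deviation of each variable, computed separately per class, to estimate the probability of each variable given the class. Because you have only three training examples, at least one class contains a single example, and the sample standard deviation of a single value is undefined (the class for which you supplied two training examples is fine). You can see this in the classifier's tables attribute, which holds the mean in column [,1] and the standard deviation in column [,2]: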
> classifier$tables
$B
                            B
factor(c(TRUE, TRUE, FALSE)) [,1]      [,2]
                       FALSE  0.0        NA
                       TRUE   0.5 0.7071068
$C
                            C
factor(c(TRUE, TRUE, FALSE)) [,1] [,2]
                       FALSE    0   NA
                       TRUE     0    0
$D
                            D
factor(c(TRUE, TRUE, FALSE)) [,1] [,2]
                       FALSE    1   NA
                       TRUE     0    0
$Z
                            Z
factor(c(TRUE, TRUE, FALSE)) [,1]      [,2]
                       FALSE  0.0        NA
                       TRUE   0.5 0.7071068
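The NA entries for the FALSE class can be reproduced in base R without e1071. The sketch below is only an illustration of the mechanism, not e1071's actual code: it assumes a Gaussian likelihood via dnorm, and the names class_mean, class_sd and dens_false are made up for this example.

```r
# Class labels and one numeric predictor (B) from the three-row example.
y <- factor(c(TRUE, TRUE, FALSE))
B <- c(0, 1, 0)

# Per-class mean and sample standard deviation of B.
class_mean <- tapply(B, y, mean)
class_sd   <- tapply(B, y, sd)
print(class_sd)  # the FALSE class has a single observation, so its sd is NA

# A Gaussian density evaluated with an NA standard deviation is NA too,
# so any product of likelihoods that includes it also becomes NA.
dens_false <- dnorm(0, mean = class_mean["FALSE"], sd = class_sd["FALSE"])
print(dens_false)
```

Since the posterior for each row is normalised over both classes, a single NA likelihood is enough to turn the whole row of predictions into NA.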
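naiveBayes distinguishes between numeric and categorical variables, and the probabilities for categorical variables are computed from counts, so they do not need a standard deviation. Therefore, if you convert your data to logical, it works: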
train.x <- sapply(train.x, as.logical)
classifier <- naiveBayes(x=train.x, y=factor(c(TRUE, TRUE, FALSE)), laplace=1)
predict(classifier, train.x, type="raw")
         FALSE       TRUE
[1,] 0.4705882 0.52941176
[2,] 0.4705882 0.52941176
[3,] 0.9142857 0.08571429
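To see why categorical predictors never hit the standard-deviation problem, here is a minimal base-R sketch of count-based conditional probabilities with additive (Laplace) smoothing. It uses the textbook formula (count + laplace) / (class size + laplace * number of levels) and is not claimed to match e1071's internals exactly. Storing B as a factor with explicit levels is an assumption made for this example, and cond_prob is a made-up name.

```r
# Class labels and predictor B from the three-row example, with B stored
# as a factor whose levels are declared explicitly, so a value that never
# occurs within a class still gets its own column in the count table.
y <- factor(c(TRUE, TRUE, FALSE))
B <- factor(c(0, 1, 0), levels = c(0, 1))

laplace <- 1
counts <- table(y, B)  # rows: class, columns: value of B

# Additive smoothing: every cell gets `laplace` pseudo-counts, and each
# row is divided by its class size plus laplace * number of levels.
cond_prob <- (counts + laplace) / (rowSums(counts) + laplace * nlevels(B))
print(cond_prob)
```

Every cell is a ratio of counts, so even a class with a single training example gets a finite, non-zero probability for each value of B.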
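My guess is that, in the first case, it may have to do with the number of independent variables being greater than the number of training examples.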