Python 2.7: comparing Keras' metrics with the metrics of sklearn.classification_report
While evaluating a neural network, I am struggling with the different metrics. My investigation shows that Keras (version 1.2.2) calculates different values for specific metrics (using the function evaluate) compared to sklearn.classification_report. Specifically, the values of the metric 'precision' (i.e. 'precision' of Keras != 'precision' of sklearn) and of the metric 'recall' (i.e. 'recall' of Keras != 'recall' of sklearn) differ. For the following working example the differences seem random, but evaluating larger networks shows that 'precision' of Keras equals (almost) 'recall' of sklearn, whereas both 'recall' metrics differ clearly. I appreciate your help:
from __future__ import print_function
import numpy as np
np.random.seed(1337) # for reproducibility
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils # numpy utils for to_categorical()
from keras import backend as K # abstract backend API (in order to generate compatible code for Theano and Tf)
from sklearn.metrics import classification_report
batch_size = 128
nb_classes = 10
nb_epoch = 30
# input image dimensions
img_rows, img_cols = 28, 28
# number of convolutional filters to use
nb_filters = 32
# size of pooling area for max pooling
pool_size = (2, 2)
# convolution kernel size
kernel_size = (3, 3)
# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
if K.image_dim_ordering() == 'th':
    X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255 # range [0,1]
X_test /= 255 # range [0,1]
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes) # necessary for use of categorical_crossentropy
Y_test = np_utils.to_categorical(y_test, nb_classes) # necessary for use of categorical_crossentropy
# create model
model = Sequential()
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode='valid',
                        input_shape=input_shape))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
# configure model
model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy', 'precision', 'recall'])
# train model
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
          verbose=1, validation_data=(X_test, Y_test))
# evaluate model with keras
score = model.evaluate(X_test, Y_test, verbose=0)
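# score = [loss] + compiled metrics, in compile order: [loss, accuracy, precision, recall]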
print('Test score:', score[0])
print('Test accuracy:', score[1])
print('Test precision:', score[2])
print('Test recall:', score[3])
# evaluate model with sklearn
predictions_last_epoch = model.predict(X_test, batch_size=batch_size, verbose=1)
target_names = ['class 0', 'class 1', 'class 2', 'class 3', 'class 4',
                'class 5', 'class 6', 'class 7', 'class 8', 'class 9']
predicted_classes = np.argmax(predictions_last_epoch, axis=1)
print('\n')
print(classification_report(y_test, predicted_classes,
                            target_names=target_names, digits=6))
EDIT
The script given above produces this output:
Test score: 0.0271549037314
Test accuracy: 0.9916
Test precision: 0.992290322304
Test recall: 0.9908
9728/10000 [============================>.] - ETA: 0s
precision recall f1-score support
class 0 0.987867 0.996939 0.992382 980
class 1 0.993860 0.998238 0.996044 1135
class 2 0.990329 0.992248 0.991288 1032
class 3 0.991115 0.994059 0.992585 1010
class 4 0.994882 0.989817 0.992343 982
class 5 0.991041 0.992152 0.991597 892
class 6 0.993678 0.984342 0.988988 958
class 7 0.992180 0.987354 0.989761 1028
class 8 0.989754 0.991786 0.990769 974
class 9 0.991054 0.988107 0.989578 1009
avg / total 0.991607 0.991600 0.991597 10000
For another model:
val/test loss: 0.231304548573
val/test categorical_accuracy: **0.978500002956**
val/test precision: *0.995103668976*
val/test recall: 0.941900001907
val/test fbeta_score: 0.967675107574
val/test mean_squared_error: 0.0064611148566
10000/10000 [==============================] - 0s
precision recall f1-score support
class 0 0.989605 0.971429 0.980433 980
class 1 0.985153 0.993833 0.989474 1135
class 2 0.988154 0.969961 0.978973 1032
class 3 0.981373 0.991089 0.986207 1010
class 4 0.968907 0.983707 0.976251 982
class 5 0.997633 0.945067 0.970639 892
class 6 0.995690 0.964509 0.979852 958
class 7 0.987230 0.977626 0.982405 1028
class 8 0.945205 0.991786 0.967936 974
class 9 0.951429 0.990089 0.970374 1009
avg / total *0.978964* **0.978500** 0.978522 10000
Definition of the desired metrics (for model.compile):
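A minimal sketch of a compile call consistent with the metrics_names output below (loss and optimizer are assumptions carried over from the first model, not taken from the post):

model.compile(loss='categorical_crossentropy',   # assumed, as in the first model
              optimizer='adadelta',              # assumed, as in the first model
              metrics=['categorical_accuracy', 'precision', 'recall',
                       'fbeta_score', 'mean_squared_error'])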
Output of model.metrics_names:
['loss', 'categorical_accuracy', 'precision', 'recall', 'fbeta_score', 'mean_squared_error']
Yes, it is different, because the sklearn classification report gives you the weighted average based on the support. Try this small experiment:
from sklearn.metrics import classification_report
y_true = [0, 1, 2, 1]
y_pred = [0, 0, 2, 0]
target_names = ['class 0', 'class 1', 'class 2']
print(classification_report(y_true, y_pred, target_names=target_names))
which gives you:
precision recall f1-score support
class 0 0.33 1.00 0.50 1
class 1 0.00 0.00 0.00 2
class 2 1.00 1.00 1.00 1
avg / total 0.33 0.50 0.38 **4**
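Both averaging modes can be reproduced directly with sklearn's precision_score (a quick sketch using sklearn.metrics.precision_score):

from sklearn.metrics import precision_score

y_true = [0, 1, 2, 1]
y_pred = [0, 0, 2, 0]

# unweighted (arithmetic) mean of the per-class precisions: (0.33 + 0.00 + 1.00) / 3
print(precision_score(y_true, y_pred, average='macro'))     # ~0.444
# support-weighted mean, as in the "avg / total" row: (0.33*1 + 0.00*2 + 1.00*1) / 4
print(precision_score(y_true, y_pred, average='weighted'))  # ~0.333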
The plain arithmetic mean of the per-class precisions would be (1 + 0 + 0.33)/3 = 0.44(3); however, following the support column, sklearn returns the weighted mean (0.33*1 + 0*2 + 1*1)/4 = 0.3325.

From the comments:

- "Could you share the evaluation results?" - "Sure, I have edited my initial post."
- "@D.Laupheimer, can you confirm that it produces the classification report only for the last batch?"
- "Thank you very much for your answer. Now I understand how the sklearn classification report calculates its values. Keras' metrics do not use the weighted average, only the arithmetic mean, right? But I still do not understand why recall (sklearn metric) equals precision (Keras metric)... that still seems a bit strange to me."
- "Haha, that is something I have thought about a lot as well :D, but most likely it is just a coincidence. If you look at the sklearn values for precision and recall, they actually differ by only 0.000007 :D, and they are similar to the accuracy value reported by Keras. That does not mean there is an implementation error, though, only that the problem you are solving leads to interesting but unrelated observations."
- "I do not believe in coincidences :D I got the same result (recall [sklearn metric] equals precision [Keras metric]) across experiments with different numbers of epochs and different models (see the EDIT section of the initial post). That cannot be a coincidence! There must be a deterministic reason for this behavior."
- ":D OK, you have my curiosity! Can you run model.metrics_names? It gives you the names of the score values. My assumption is that score[1] is actually recall."
- "Well, at first I thought so too. For that reason I not only checked the definition of the desired metrics (for model.compile()), but also executed model.metrics_names (in case of some incomprehensible reordering inside the training method, which is definitely not the case: see the initial post for the definition of the desired metrics and the output of model.metrics_names)."
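For reference: in Keras 1.2.2, precision and recall are computed per batch, element-wise on the raw one-hot targets and softmax outputs (predictions are rounded at 0.5), and the per-batch values are then averaged; there is no argmax and no per-class aggregation. A NumPy sketch of those two metric functions, paraphrased from the keras.metrics module of that era (an approximation, not verbatim source):

import numpy as np

def keras_precision(y_true, y_pred):
    # y_true: one-hot targets of one batch, y_pred: softmax outputs of one batch
    true_positives = np.sum(np.round(np.clip(y_true * y_pred, 0, 1)))
    predicted_positives = np.sum(np.round(np.clip(y_pred, 0, 1)))
    return true_positives / (predicted_positives + 1e-07)

def keras_recall(y_true, y_pred):
    true_positives = np.sum(np.round(np.clip(y_true * y_pred, 0, 1)))
    possible_positives = np.sum(np.round(np.clip(y_true, 0, 1)))
    return true_positives / (possible_positives + 1e-07)

Because a prediction only counts as positive where its probability rounds to 0.5 or more, and the per-batch values are averaged without support weighting, these numbers generally cannot coincide with sklearn's argmax-based, class-wise report; the near-equality of Keras' precision and sklearn's recall observed above is therefore plausibly coincidental.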