Keras GridSearchCV loss doesn't equal model.fit() loss values


It is not clear to me which metric GridSearchCV uses in its parameter search. My understanding was that my model object supplies it with a metric, and that metric is what determines the "best parameters". But that does not appear to be the case. I believe scoring=None is the default, so the first metric given in the metrics option of model.compile() is used. In my case, then, the scoring function should be the mean squared error. My walkthrough of the issue follows.

Here is what I am doing. I simulated some regression data with sklearn: 10,000 observations with 10 features. I am playing around with Keras because I have typically used PyTorch in the past and had not really dabbled in Keras until now. After obtaining the best parameter set, I noticed a discrepancy between the loss reported by GridSearchCV and the loss reported by model.fit(). I know I could set refit=True and not re-fit the model again, but I am trying to understand the outputs of Keras and sklearn's GridSearchCV.
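As an aside on refit=True: it is in fact the GridSearchCV default, and the refit model is exposed as best_estimator_. A minimal sklearn-only sketch (with Ridge standing in for the Keras wrapper, purely for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=1000, n_features=10,
                       n_informative=2, noise=3, random_state=7)

# refit=True is the GridSearchCV default: after the search, the best
# parameter set is refit on the *full* training data automatically.
grid = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]}, cv=3)
grid.fit(X, y)

print(grid.best_params_)
preds = grid.best_estimator_.predict(X)  # no manual re-fit needed
```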

To make the discrepancy explicit, here is what I am seeing. I simulated the data with sklearn as follows:

# Setting some data basics
N = 10000
feats = 10

# generate regression dataset
X, y = make_regression(n_samples=N, n_features=feats, n_informative=2, noise=3)

# training data and testing data #
X_train = X[:int(N * 0.8)]
y_train = y[:int(N * 0.8)]
X_test = X[int(N * 0.8):]
y_test = y[int(N * 0.8):]
I created a create_model function that looks to tune the activation function I use (again, this is a simple proof-of-concept example).

When I perform the grid search, I get the following output:

model = KerasRegressor(build_fn=create_model, epochs=50, batch_size=200, verbose=0)
activations = ['linear','relu']
param_grid = dict(activation_fn = activations)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1, cv=3)
grid_result = grid.fit(X_train, y_train, verbose=1)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
Best: -21.163454 using {'activation_fn': 'linear'}
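(The negative sign on best_score_ comes from sklearn's scorer convention, where higher is always better, so error metrics are negated. A minimal sklearn-only sketch, with LinearRegression standing in for the Keras model:)

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import get_scorer, mean_squared_error

X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0
y[0] += 1.0  # perturb one point so the fit error is nonzero
reg = LinearRegression().fit(X, y)

scorer = get_scorer("neg_mean_squared_error")
neg_mse = scorer(reg, X, y)                   # scorer: higher is better
mse = mean_squared_error(y, reg.predict(X))   # metric: lower is better
print(neg_mse, mse)                           # neg_mse == -mse
```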
OK, so the best score is a mean squared error of 21.16 (I understand that the sign is flipped to turn it into a maximization problem). But when I then fit the model with that activation function, I get a completely different mean squared error:

best_model = create_model('linear')
history = best_model.fit(X_train, y_train, epochs=50, batch_size=200, verbose=1)
.....
.....
Epoch 49/50
8000/8000 [==============================] - 0s 48us/step - loss: 344.1636 - mean_squared_error: 344.1636 - mean_absolute_error: 12.2109
Epoch 50/50
8000/8000 [==============================] - 0s 48us/step - loss: 326.4524 - mean_squared_error: 326.4524 - mean_absolute_error: 11.9250
history.history['mean_squared_error']
Out[723]: 
[10053.778002929688,
 9826.66806640625,
  ......
  ......
 344.16363830566405,
 326.45237121582034]
That is 326.45 versus 21.16. Any insight into my misunderstanding would be greatly appreciated. I would be more comfortable if they were in a reasonable neighborhood of each other, since one is the error on a single CV fold and the other is on the whole training dataset, but 21 is nowhere near 326. Thanks.

The whole code is here:

import pandas as pd
import numpy as np
from keras import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasClassifier, KerasRegressor
from keras.constraints import maxnorm
from sklearn import preprocessing 
from sklearn.preprocessing import scale
from sklearn.datasets import make_regression
from matplotlib import pyplot as plt

# Setting some data basics
N = 10000
feats = 10

# generate regression dataset
X, y = make_regression(n_samples=N, n_features=feats, n_informative=2, noise=3)

# training data and testing data #
X_train = X[:int(N * 0.8)]
y_train = y[:int(N * 0.8)]
X_test = X[int(N * 0.8):]
y_test = y[int(N * 0.8):]

def create_model(activation_fn):
    # create model
    model = Sequential()
    model.add(Dense(30, input_dim=feats, activation=activation_fn,
                 kernel_initializer='normal'))
    model.add(Dropout(0.2))
    model.add(Dense(10, activation=activation_fn))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation='linear'))
    # Compile model
    model.compile(loss='mean_squared_error',
                  optimizer='adam',
                  metrics=['mean_squared_error','mae'])
    return model

# fix random seed for reproducibility
seed = 7
np.random.seed(seed)

# create model
model = KerasRegressor(build_fn=create_model, epochs=50, batch_size=200, verbose=0)

# define the grid search parameters
activations = ['linear','relu']
param_grid = dict(activation_fn = activations)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1, cv=3)
grid_result = grid.fit(X_train, y_train, verbose=1)

best_model = create_model('linear')
history = best_model.fit(X_train, y_train, epochs=50, batch_size=200, verbose=1)

history.history.keys()
plt.plot(history.history['mean_absolute_error'])

# summarize results
grid_result.cv_results_
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

The larger loss reported in your output (326.45237121582034) is the training loss. If you need a metric to compare with grid_result.best_score_ (from GridSearchCV), you have to request the validation loss in best_model.fit (see the code below).

Now to the question: why is the validation loss lower than the training loss? In your case it is essentially because of dropout, which is applied during training but not during validation/testing; that is why the difference between training and validation loss disappears when you remove dropout. You can find a detailed explanation of the possible reasons for a lower validation loss here.
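The mechanism can be sketched with a toy NumPy version of inverted dropout (an illustration of the idea, not Keras internals): during training, units are zeroed at random and the survivors rescaled, which adds noise to the loss; at inference the layer is the identity.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, rate, training):
    if not training:
        return x                      # inference: layer is the identity
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)    # training: drop units, rescale the rest

x = np.ones(100_000)
train_out = dropout(x, 0.2, training=True)
eval_out = dropout(x, 0.2, training=False)

# The mean activation is preserved, but training outputs are noisy;
# that noise is what inflates the training loss relative to validation.
print(train_out.mean(), train_out.var(), eval_out.var())
```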

In short, the performance of the model (MSE) is given by grid_result.best_score_ (21.163454 in your example).
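To compare like with like, put the cross-validated score next to a held-out validation MSE: both are out-of-sample, so they should land in the same neighborhood. A sklearn-only sketch of that comparison (Ridge stands in for the Keras model, same synthetic data as the question):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_regression(n_samples=10_000, n_features=10,
                       n_informative=2, noise=3, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=7)

# Mean CV error on the training set (negated back to a plain MSE) ...
cv_mse = -cross_val_score(Ridge(), X_tr, y_tr, cv=3,
                          scoring="neg_mean_squared_error").mean()
# ... versus the error on a held-out validation set.
holdout_mse = mean_squared_error(y_te, Ridge().fit(X_tr, y_tr).predict(X_te))
print(cv_mse, holdout_mse)  # comparable magnitudes, unlike loss-with-dropout
```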


David, thank you for taking the time to read my question and answer it. Forgive me if this is a silly question, but are you saying that grid_result.best_score_ is computed without the dropout layers? I wanted to regularize the model, so wouldn't I want to keep them? Or are you saying that after I have identified the best parameters, I should re-fit the model without dropout? Thanks.

Dropout is needed when your model is overfitting. As the plot you added shows, that is not the case here, so you do not need regularization; on the contrary, your example shows that keeping dropout worsens model performance. So yes, the model can be run (and grid_result.best_score_ computed) without the dropout layers; see also the explanation at the link I provided.

Hi David. I apologize for the ambiguity. I am not interested in the best model at the moment; I am only doing this as practice to understand the output of GridSearchCV and how it relates to the output of best_model.fit(). In the grid search I see a best score of about 21, while running best_model.fit() yields an MSE of about 300. I am trying to understand why these two differ by an order of magnitude (regardless of whether I use dropout, etc.). They should be closer than they are. Does that make sense?

Yes; I have modified my answer accordingly. A couple more tips: 1) you also need to set the TensorFlow seed (tf.random.set_seed(42)) to get a fully deterministic model; 2) creating the training/test sets is easier with train_test_split.
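The splitting tip can be sketched as follows (same 80/20 split as the manual slicing in the question, plus shuffling with a fixed seed):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=10_000, n_features=10,
                       n_informative=2, noise=3, random_state=7)

# One call replaces the four manual slicing lines, and shuffles the rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=7)
print(X_train.shape, X_test.shape)  # (8000, 10) (2000, 10)
```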
import numpy as np
from keras import Sequential
from keras.layers import Dense, Dropout
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.datasets import make_regression
import tensorflow as tf

# fix random seed for reproducibility
seed = 7
np.random.seed(seed)
tf.random.set_seed(42)

# Setting some data basics
N = 10000
feats = 10

# generate regression dataset
X, y = make_regression(n_samples=N, n_features=feats, n_informative=2, noise=3)

# training data and testing data #
X_train = X[:int(N * 0.8)]
y_train = y[:int(N * 0.8)]
X_test = X[int(N * 0.8):]
y_test = y[int(N * 0.8):]

def create_model(activation_fn):
    # create model
    model = Sequential()
    model.add(Dense(30, input_dim=feats, activation=activation_fn,
                 kernel_initializer='normal'))
    model.add(Dropout(0.2))
    model.add(Dense(10, activation=activation_fn))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation='linear'))
    # Compile model
    model.compile(loss='mean_squared_error',
                  optimizer='adam',
                  metrics=['mean_squared_error','mae'])
    return model

# create model
model = KerasRegressor(build_fn=create_model, epochs=50, batch_size=200, verbose=0)

# define the grid search parameters
activations = ['linear','relu']
param_grid = dict(activation_fn = activations)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1, cv=3)
grid_result = grid.fit(X_train, y_train, verbose=1, validation_data=(X_test, y_test))

best_model = create_model('linear')
history = best_model.fit(X_train, y_train, epochs=50, batch_size=200, verbose=1, validation_data=(X_test, y_test))

history.history.keys()
# plt.plot(history.history['mae'])

# summarize results
print(grid_result.cv_results_)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))