Tensorflow: trying to save a Keras model after every iteration fails

Instead of using sklearn's RandomizedSearchCV in one go (where I would have to wait for the entire search to finish before seeing the best result), I am trying to generate all hyperparameter combinations up front and run them one at a time, so that I can stop at any point and resume whenever the server has nothing else to do.

Here is my code:

First, the function that creates the model:

# imports implied by the code below (standalone Keras with the scikit-learn wrapper)
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.wrappers.scikit_learn import KerasRegressor
from keras import backend as K
import numpy as np
from sklearn.model_selection import RandomizedSearchCV


def create_model(neurons=2000, activation1='tanh', dropout_rate=0.0, activation2='sigmoid'):
    model = Sequential()
    model.add(Dense(neurons, input_dim=10000, activation=activation1))
    model.add(Dropout(dropout_rate))
    # model.add(Dense(390, activation='relu'))
    model.add(Dense(61, activation=activation2))
    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model


model = KerasRegressor(build_fn=create_model, verbose=4)
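
For context, my understanding (assuming the legacy keras.wrappers.scikit_learn wrapper) is that keyword arguments matching create_model's signature are routed to model construction, while fit-level arguments such as batch_size and nb_epoch are routed to model.fit, which is why both kinds of parameters can sit together in one parameter dict below. A minimal sketch of a single configuration:

# Sketch only (legacy keras.wrappers.scikit_learn assumed): neurons, activation1,
# dropout_rate and activation2 are passed to create_model; batch_size and
# nb_epoch are passed to model.fit when .fit() is called.
reg = KerasRegressor(build_fn=create_model,
                     neurons=500, activation1='relu', dropout_rate=0.1, activation2='sigmoid',
                     batch_size=32, nb_epoch=10, verbose=0)
# reg.fit(X, y)  # builds one model with these settings, then trains it
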
Second, I create all the hyperparameter combinations.

They will be stored in the params variable:

batch_size = np.arange(1, 400)  
nb_epoch = np.arange(100, 400)
activation1 = ['relu', 'tanh', 'sigmoid']  
activation2 = ['sigmoid']
dropout_rate = np.arange(0, 0.2, 0.01)
neurons = np.arange(250, 5000)  
param_distributions = dict(batch_size=batch_size, nb_epoch=nb_epoch, activation1=activation1, \
                           activation2=activation2, dropout_rate=dropout_rate, neurons=neurons)
grid = RandomizedSearchCV(estimator=model, param_distributions=param_distributions, n_iter=1000, n_jobs=-1,
                          random_state=42)


x = grid._get_param_iterator()
params = list(x)
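
Side note: _get_param_iterator() is a private method. As far as I understand, sklearn's public ParameterSampler yields the same kind of pre-sampled configuration list; a minimal sketch under that assumption:

from sklearn.model_selection import ParameterSampler

# Assumption: roughly equivalent to the private grid._get_param_iterator() call
# above -- pre-sample 1000 configurations from the same distributions.
params = list(ParameterSampler(param_distributions, n_iter=1000, random_state=42))
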
Finally, I run the models, each time with the next hyperparameter configuration taken from params:

for p in params:
    model = KerasRegressor(build_fn=create_model, verbose=4)
    # wrap every value in a single-element list so that RandomizedSearchCV
    # with n_iter=1 trains exactly this one configuration
    batch_size = [p['batch_size']]
    nb_epoch = [p['nb_epoch']]
    activation1 = [p['activation1']]
    activation2 = [p['activation2']]
    dropout_rate = [p['dropout_rate']]
    neurons = [p['neurons']]
    curr_param_distributions = dict(batch_size=batch_size, nb_epoch=nb_epoch, activation1=activation1,
                                    activation2=activation2, dropout_rate=dropout_rate, neurons=neurons)
    curr_grid = RandomizedSearchCV(estimator=model, param_distributions=curr_param_distributions,
                                   n_iter=1, n_jobs=-1)
    curr_grid_result = curr_grid.fit(X, y)
    curr_score = curr_grid.best_score_
    # save the trained model, with the configuration and its score in the file name
    curr_grid.best_estimator_.model.save(
        '/data/models/model_{}_score_{}.h5'.format(str(curr_param_distributions), curr_score))
    # release everything before the next configuration
    del model
    del curr_grid
    del curr_grid_result
    K.clear_session()
The first iteration runs without any problem. The second iteration gets stuck after the first epoch (i.e. it keeps printing 1/257 1/257 1/257, with nb_epoch=257 as an example), and then it just stops.

Why is this happening? Any help would be much appreciated.