Keras model.fit and model.predict give wildly different results on the exact same data in binary classification
Using a fixed random seed, I shuffle my training data and split it into x_train, x_valid, y_train, y_valid, then call model.fit() with these newly split datasets.
The AUC and accuracy on the y_valid data look quite good in the per-epoch progress output:
# final validation loss, AUC, accuracy respectively (x_valid, y_valid):
Validation Scores: [0.23666608333587646, 0.9644553661346436, 0.8915975689888]
However, I wanted to plot a confusion matrix with a different library, so I called y_preds = model.predict(x_valid), expecting the results to match what I saw from model.fit(). I was very wrong:
y_true = pd.Series(y_valid)
y_preds = pd.Series(model.predict(x_valid).squeeze())
print(y_true.value_counts(), '\n')
print(y_preds.value_counts())
which produces
# True label value counts
1.0 140000
0.0 101120
dtype: int64
# Predicted label value counts
0.0 241119
1.0 1
dtype: int64
Clearly the validation scores from model.fit() are not based on these terrible predictions, yet both are computed on exactly the same data. What is going on?
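The comparison above is easier to reason about with a toy example (hypothetical numbers, not my actual outputs): model.predict on a sigmoid head returns probabilities, and value_counts() only collapses them to exactly 0.0 and 1.0 when the outputs are fully saturated, which is what the counts above suggest happened here.

```python
import numpy as np
import pandas as pd

# Hypothetical sigmoid outputs: most strictly inside (0, 1), two saturated.
probs = np.array([0.02, 0.97, 0.5001, 0.0, 1.0, 0.3])

# value_counts() on the raw probabilities treats every distinct float separately...
raw_counts = pd.Series(probs).value_counts()

# ...so class labels are normally obtained by thresholding at 0.5 first.
labels = (probs >= 0.5).astype(float)
label_counts = pd.Series(labels).value_counts()

print(len(raw_counts))  # 6 distinct raw values
print(label_counts)     # both classes appear 3 times after thresholding
```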
Full code for the model:
class ModelWrapper():
    def __init__(self, name, transformer, loss=keras.losses.BinaryCrossentropy(), auc=True):
        self.name = name
        self.loss = loss
        self.metrics = [keras.metrics.AUC(), 'accuracy'] if auc else ['accuracy']
        self.transformer = transformer
        self.model = Sequential([
            BatchNormalization(input_shape=input_shape),
            Dense(150, activation='relu'), BatchNormalization(),  # Dropout(0.5),
            Dense(150, activation='relu'), BatchNormalization(),  # Dropout(0.5),
            Dense(100, activation='relu'), BatchNormalization(),  # Dropout(0.5),
            Dense(100, activation='relu'), BatchNormalization(),  # Dropout(0.5),
            Dense(100, activation='relu'), BatchNormalization(),  # Dropout(0.5),
            Dense(50, activation='relu'), BatchNormalization(),   # Dropout(0.5),
            Dense(50, activation='relu'), BatchNormalization(),   # Dropout(0.5),
            Dense(1, activation='sigmoid')])
        self.x_train, self.x_valid, self.y_train, self.y_valid = transformer(x_train, x_valid, y_train, y_valid)
        self.model.compile(loss=self.loss, metrics=self.metrics, optimizer=keras.optimizers.Adam(lr=3e-4))

    def fit(self):
        np.random.seed(124)
        print(f"Training {self.name}")
        history = self.model.fit(self.x_train, self.y_train,
                                 batch_size=batch_size,
                                 epochs=epochs,
                                 verbose=1,
                                 validation_data=(self.x_valid, self.y_valid),
                                 callbacks=[keras.callbacks.EarlyStopping(monitor='val_loss', patience=6, verbose=0),
                                            keras.callbacks.ReduceLROnPlateau(monitor='loss', factor=0.15, patience=3)],
                                 workers=16, use_multiprocessing=True)
        score = self.model.evaluate(self.x_valid, self.y_valid, verbose=0)
        print('Validation Scores:', score)
        model_dir = "data"
        # save training history
        hist_df = pd.DataFrame(history.history)
        with open('data/history_99e.json', 'w') as f:
            hist_df.to_json(f)
        # save model structure
        with open(f"{model_dir}/{self.name}.json", "w") as json_file:
            json_file.write(self.model.to_json())
        # save model parameters
        self.model.save(f"{model_dir}/{self.name}.h5")
        return history

    def load(self):
        model_dir = "data"
        with open(f"{model_dir}/{self.name}.json", "r") as json_file:
            self.model = model_from_json(json_file.read())
        self.model.load_weights(f"{model_dir}/{self.name}.h5")

    def predict(self, x):
        return self.model.predict(x=self.transformer.x_scaler.transform(x), verbose=0)
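For context on how the two code paths differ (a debugging sketch with made-up data, not part of my pipeline): inside fit(), evaluate() runs on self.x_valid, which transformer(...) has already scaled, whereas predict() applies self.transformer.x_scaler.transform to its argument, so calling predict(x_valid) on already-scaled data would standardize it a second time. A NumPy stand-in for the scaler shows that scaling once and scaling twice yield different model inputs:

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))  # hypothetical raw features

# Stand-in for transformer.x_scaler.transform (fit on the raw data).
mean, std = raw.mean(axis=0), raw.std(axis=0)
scale = lambda a: (a - mean) / std

once = scale(raw)    # what evaluate() saw inside fit(): transformed once
twice = scale(once)  # what predict() would see if handed pre-scaled data

print(np.allclose(once, twice))  # False: the model receives different inputs
```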