TensorFlow model.evaluate() gives different results each time it is run on the same dataset

python tensorflow keras

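When I run model.evaluate in TensorFlow several times on the same validation set, I get different results.

The model consists of data augmentation layers, an EfficientNetB0 baseline, and a GlobalAveragePooling layer (see below). I load the validation dataset with a tf.data pipeline built from tensor slices of a DataFrame. It is not shuffled, so the order is always the same.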

from tensorflow.keras import Input, Model
from tensorflow.keras.applications import EfficientNetB0
from tensorflow.keras.layers import (
    Dense, GlobalAveragePooling2D, RandomCrop, RandomFlip, RandomRotation, RandomZoom,
)


def get_custom_model(input_shape, saved_model_path=None, training_base_model=True):
    input_layer = Input(shape=input_shape)

    # Augmentation layers are called with training=False (inference mode)
    data_augmentation = RandomFlip('horizontal')(input_layer, training=False)
    data_augmentation = RandomRotation(factor=(-0.2, 0.2))(data_augmentation, training=False)
    data_augmentation = RandomZoom(height_factor=(-0.2, 0.2))(data_augmentation, training=False)
    data_augmentation = RandomCrop(width=input_shape[0], height=input_shape[1])(data_augmentation, training=False)

    baseline_model = EfficientNetB0(include_top=False, weights='imagenet')
    baseline_model.trainable = training_base_model  # Added for bsg hypertuning

    # The same flag is also passed as the call-time training argument of the base model
    baseline_output = baseline_model(data_augmentation, training=training_base_model)
    baseline_output = GlobalAveragePooling2D()(baseline_output)
    attributes_output = Dense(units=228, activation='sigmoid', name='attributes_output')(baseline_output)

    model = Model(inputs=[input_layer], outputs=[attributes_output])

    # Load previously saved weights, if a path was given
    if saved_model_path is not None:
        model.load_weights(saved_model_path)  # .expect_partial()

    return model

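I know that the results could differ if I trained the model again, because some layers are initialized with random weights, but I would expect evaluations of the same model to be identical. I call get_custom_model with the same saved_model_path every time, so the model always loads the same (previously saved) weights.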
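One way to confirm that two calls really load identical weights is to hash them. The sketch below is only an illustration: the helper name weights_fingerprint is hypothetical, and it assumes the weights are array-like objects with a tobytes() method, such as the NumPy arrays returned by model.get_weights().

```python
import hashlib


def weights_fingerprint(weights):
    """Return a SHA-256 hex digest over a list of weight arrays.

    `weights` is assumed to be a list of array-like objects exposing
    `.tobytes()`, such as the NumPy arrays from `model.get_weights()`.
    """
    digest = hashlib.sha256()
    for w in weights:
        # Include the shape (when available) so reshaped tensors hash differently.
        digest.update(repr(getattr(w, "shape", None)).encode("utf-8"))
        digest.update(w.tobytes())
    return digest.hexdigest()


# Hypothetical usage with the function from the question:
# model_a = get_custom_model(input_shape, saved_model_path)
# model_b = get_custom_model(input_shape, saved_model_path)
# same = weights_fingerprint(model_a.get_weights()) == weights_fingerprint(model_b.get_weights())
```

If the two fingerprints match, the loaded weights are byte-for-byte identical, and any difference between evaluations has to come from somewhere other than weight loading.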

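The metrics I compare are loss, accuracy, and recall, in case that is relevant. The optimizer is rmsprop and the loss is binary cross-entropy. I also tried setting training_base_model to False, and the metrics were much worse (almost as if the weights were random).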

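PS: During training I also used the same validation set to compute the validation metrics and save the best weights, but when I load those best weights again the results are not the same. For example, I can get 81.28% precision during validation in the training phase, and then 57% precision after loading those weights and running model.evaluate().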

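Your data augmentation functions all have the word "random" in their names, so your model may be running on different data each time, which could explain the different results.

Thanks for your comment, but by default TensorFlow's random layers are only applied during training. I set training to False so that they run in inference mode. (Source: )

You need to include your evaluation code, including the multiple calls to evaluate and the results they produce.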
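A minimal sketch of the kind of repeated-evaluation check the last comment asks for. The helper name compare_runs and the tolerance are assumptions for illustration. It expects a zero-argument callable that returns a dict of metric values, which is what model.evaluate produces when called with return_dict=True.

```python
import math


def compare_runs(evaluate_fn, n_runs=3, rel_tol=1e-6):
    """Call `evaluate_fn` several times and report whether the metrics agree.

    `evaluate_fn` is assumed to take no arguments and return a dict of
    metric name -> float, for example
    `lambda: model.evaluate(val_ds, return_dict=True, verbose=0)`.
    """
    runs = [evaluate_fn() for _ in range(n_runs)]
    first = runs[0]
    # Every later run must match the first run on every metric.
    consistent = all(
        math.isclose(run[name], first[name], rel_tol=rel_tol)
        for run in runs[1:]
        for name in first
    )
    return runs, consistent


# Hypothetical usage (val_ds stands for the unshuffled validation dataset):
# runs, consistent = compare_runs(
#     lambda: model.evaluate(val_ds, return_dict=True, verbose=0)
# )
# print(consistent, runs)
```

Posting the returned runs alongside the code makes it clear how large the variation between calls actually is.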