Python Keras: implementing a custom loss function with multiple outputs

python tensorflow keras

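I am trying to replicate the AlphaGo Zero system (a much smaller version of it). However, I have run into a problem with the network model. The loss function I need to implement is the one from the AlphaGo Zero paper:

l = (z - v)^2 - pi^T log(p) + c * ||theta||^2

where:

z is the label for one of the two network heads (a real value between -1 and 1), and v is the network's prediction of that value
pi is the label probability distribution over all actions, and p is the probability distribution over all actions predicted by the network
c is the L2 regularization parameter (theta denotes the network weights)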
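For a single training example, the terms defined above can be evaluated directly. The following is a minimal NumPy sketch, assuming the loss has the AlphaGo Zero form l = (z - v)^2 - pi^T log(p) + c * ||theta||^2. The function name `alphazero_loss`, the `eps` guard against log(0), and all toy values are hypothetical and only serve as illustration.

```python
import numpy as np

def alphazero_loss(z, v, pi, p, theta, c=1e-4, eps=1e-12):
    """Combined loss for one example: (z - v)^2 - pi . log(p) + c * ||theta||^2."""
    value_term = (z - v) ** 2                    # squared error on the value head
    policy_term = -np.sum(pi * np.log(p + eps))  # cross-entropy between pi and p
    l2_term = c * np.sum(theta ** 2)             # L2 penalty on the weights
    return value_term + policy_term + l2_term

# Hypothetical toy values: 4 actions, 2 weights.
pi = np.array([0.5, 0.5, 0.0, 0.0])
p = np.array([0.25, 0.25, 0.25, 0.25])
theta = np.array([1.0, -2.0])

loss = alphazero_loss(z=1.0, v=0.5, pi=pi, p=p, theta=theta, c=0.1)
print(loss)
```

Note that the term -pi^T log(p) is itself the cross-entropy, so the formula adds a squared error and a cross-entropy rather than subtracting one from the other.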

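I pass the network a list of channels (representing the game state) and an array (the same size as pi and p) indicating which actions are actually valid (1 if valid, 0 otherwise).

As you can see, the loss function is computed from both the targets and the network predictions. But after extensive searching, I found that a custom loss function can only take y_true and y_pred as arguments, even though I have two "y_true" and two "y_pred". I tried using indexing to get at these values, but I am pretty sure it does not work.

The code for the network model and the custom loss function is shown below: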

from keras import backend as K
from keras import regularizers
from keras.layers import Conv2D, Dense, Flatten, Input, MaxPooling2D, concatenate
from keras.models import Model

def custom_loss(y_true, y_pred):

    # I am pretty sure this does not work

    output_prob_dist = y_pred[0]
    output_value = y_pred[1]
    label_prob_dist = y_true[0]
    label_value = y_true[1]

    mse_loss = K.mean(K.square(label_value - output_value), axis=-1)
    cross_entropy_loss = K.dot(K.transpose(label_prob_dist), output_prob_dist)

    return mse_loss - cross_entropy_loss

def define_model():
    """Neural Network model implementation using Keras + Tensorflow."""
    state_channels = Input(shape = (5,5,6), name='States_Channels_Input')
    valid_actions_dist = Input(shape = (32,), name='Valid_Actions_Input')

    conv = Conv2D(filters=10, kernel_size=2, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='Conv_Layer')(state_channels)
    pool = MaxPooling2D(pool_size=(2, 2), name='Pooling_Layer')(conv)
    flat = Flatten(name='Flatten_Layer')(pool)

    # Merge of the flattened channels (after pooling) and the valid action
    # distribution. Used only as input in the probability distribution head.
    merge = concatenate([flat, valid_actions_dist])

    #Probability distribution over actions
    hidden_fc_prob_dist_1 = Dense(100, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='FC_Prob_1')(merge)
    hidden_fc_prob_dist_2 = Dense(100, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='FC_Prob_2')(hidden_fc_prob_dist_1)
    output_prob_dist = Dense(32, kernel_regularizer=regularizers.l2(0.0001), activation='softmax', name='Output_Dist')(hidden_fc_prob_dist_2)

    #Value of a state
    hidden_fc_value_1 = Dense(100, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='FC_Value_1')(flat)
    hidden_fc_value_2 = Dense(100, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='FC_Value_2')(hidden_fc_value_1)
    output_value = Dense(1, kernel_regularizer=regularizers.l2(0.0001), activation='tanh', name='Output_Value')(hidden_fc_value_2)

    model = Model(inputs=[state_channels, valid_actions_dist], outputs=[output_prob_dist, output_value])

    model.compile(loss=custom_loss, optimizer='adam', metrics=['accuracy'])

    return model



# In the main method
model = define_model()
# ...
# MCTS routine to collect the data for the network input
# ...

x_train = [channels_input, valid_actions_dist_input]
y_train = [dist_probs_label, who_won_label]

model.fit(x_train, y_train, epochs=10)

In short, my question is: how do I correctly implement this custom loss function, which uses both the network outputs and the label values?

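I checked their Git repository and found there is a lot going on. As the equation shows, the final loss is a combination of three different losses, and the three networks minimize that final loss. Their loss code is as follows: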

    # train ops
    policy_cost = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(
            logits=logits, labels=tf.stop_gradient(labels['pi_tensor'])))

    value_cost = params['value_cost_weight'] * tf.reduce_mean(
        tf.square(value_output - labels['value_tensor']))

    reg_vars = [v for v in tf.trainable_variables()
                if 'bias' not in v.name and 'beta' not in v.name]
    l2_cost = params['l2_strength'] * \
        tf.add_n([tf.nn.l2_loss(v) for v in reg_vars])

    combined_cost = policy_cost + value_cost + l2_cost

You can use this as a reference and adapt it accordingly.
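One possible way to carry this structure over to Keras is sketched below. It relies on the fact that Keras calls a loss function once per output, with only that output's y_true and y_pred, and then minimizes the weighted sum of the per-output losses plus any regularization penalties. The helper names `policy_loss`, `value_loss` and `build_model`, the simplified layer stack, the clipping constant and the random training data are all assumptions made for illustration; this is not the asker's final model or the referenced repository's code.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def policy_loss(pi, p):
    """Cross-entropy -sum(pi * log p) per sample; p is the softmax output."""
    p = tf.clip_by_value(p, 1e-7, 1.0)  # avoid log(0)
    return -tf.reduce_sum(pi * tf.math.log(p), axis=-1)

def value_loss(z, v):
    """Squared error (z - v)^2 per sample."""
    return tf.reduce_mean(tf.square(z - v), axis=-1)

def build_model(c=1e-4):
    states = keras.Input(shape=(5, 5, 6), name="States_Channels_Input")
    valid = keras.Input(shape=(32,), name="Valid_Actions_Input")
    reg = regularizers.l2(c)  # adds c * sum(w^2) for each regularized kernel
    x = layers.Conv2D(10, 2, activation="relu", kernel_regularizer=reg)(states)
    x = layers.Flatten()(layers.MaxPooling2D(2)(x))
    merged = layers.concatenate([x, valid])
    p = layers.Dense(32, activation="softmax", kernel_regularizer=reg,
                     name="Output_Dist")(merged)
    v = layers.Dense(1, activation="tanh", kernel_regularizer=reg,
                     name="Output_Value")(x)
    model = keras.Model(inputs=[states, valid], outputs=[p, v])
    # One loss per output, in the same order as `outputs`; Keras sums them.
    model.compile(optimizer="adam", loss=[policy_loss, value_loss],
                  loss_weights=[1.0, 1.0])
    return model

# Hypothetical random data, only to show the expected shapes.
rng = np.random.default_rng(0)
n = 8
states = rng.random((n, 5, 5, 6), dtype=np.float32)
valid = np.ones((n, 32), dtype=np.float32)
pi = rng.random((n, 32), dtype=np.float32)
pi /= pi.sum(axis=1, keepdims=True)  # normalize rows into distributions
z = rng.choice([-1.0, 1.0], size=(n, 1)).astype(np.float32)

model = build_model()
history = model.fit([states, valid], [pi, z], epochs=1, verbose=0)
print(history.history["loss"])
```

In this sketch the L2 term comes from the `kernel_regularizer` arguments rather than from the loss functions, so the total that is minimized has the same three parts as `combined_cost` above. The `loss_weights` argument would play the role of `value_cost_weight` if the two heads should be weighted differently.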

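Have you tried implementing two different loss functions for the different targets and then calling model.compile(loss=[custom_loss1, custom_loss2], optimizer='adam', metrics=['accuracy'])? By the way, the image is not showing.

@Abdirahman The image shows up fine here (on three different devices). As for your suggestion, I have already considered it. However, the loss function given in the paper is the mean squared error minus the cross-entropy. If I implement them separately, the weight updates will surely be different. Or am I wrong?

Can I get a link to the paper?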
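On the question of whether separate losses change the weight updates: Keras adds the per-output losses into one scalar before differentiating, and the gradient of a sum equals the sum of the gradients, so shared weights receive the same update either way. The following NumPy check illustrates this on a toy model. The two "heads", the shared weight vector `w`, and every numeric value are hypothetical; gradients are approximated with central differences rather than taken from Keras.

```python
import numpy as np

x = np.array([0.5, -1.0, 2.0])   # hypothetical input features
pi = np.array([0.2, 0.3, 0.5])   # hypothetical policy target
z = 1.0                          # hypothetical value target

def value_loss(w):
    v = np.tanh(np.sum(w * x))   # value head built on the shared weights
    return (z - v) ** 2

def policy_loss(w):
    logits = w * x               # policy head built on the same weights
    log_p = logits - np.log(np.sum(np.exp(logits)))  # log-softmax
    return -np.sum(pi * log_p)   # cross-entropy -pi^T log p

def numerical_grad(f, w, h=1e-6):
    """Central-difference approximation of the gradient of f at w."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = h
        grad[i] = (f(w + step) - f(w - step)) / (2 * h)
    return grad

w = np.array([0.1, -0.3, 0.7])
grad_combined = numerical_grad(lambda u: value_loss(u) + policy_loss(u), w)
grad_separate = numerical_grad(value_loss, w) + numerical_grad(policy_loss, w)
print(np.allclose(grad_combined, grad_separate, atol=1e-6))
```

The sign in the paper's formula is also worth noting: the subtracted term is pi^T log(p), and its negative is the cross-entropy, so the combined loss is a squared error plus a cross-entropy.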