Python: NaNs in expected values introduce NaNs into the weight matrices even when masked
To avoid this, I wrote the following model and ran it. The output is given below. Why does the training step based on NaN expected values, which are masked by loss_0_where_nan (the history shows that the loss is indeed evaluated as 0.0), nevertheless introduce NaN weights into the weight matrices of both hidden and max_min_pred? I first thought it might be some weighting of the individual parameter updates by the output value, which I assumed might be specific to the Adadelta optimizer. But the same thing happens with SGD.
import keras
from keras.models import Model
from keras.optimizers import Adadelta
from keras.losses import mean_squared_error
from keras.layers import Input, Dense
import tensorflow as tf
import numpy

def loss_0_where_nan(loss_function, msg=""):
    def filtered_loss_function(y_true, y_pred):
        with_nans = loss_function(y_true, y_pred)
        nans = tf.is_nan(with_nans)
        filtered = tf.where(nans, tf.zeros_like(with_nans), with_nans)
        filtered = tf.Print(filtered,
                            [y_true, y_pred, nans, with_nans, filtered],
                            message=msg)
        return filtered
    return filtered_loss_function

input = Input(shape=(3,))
hidden = Dense(2)(input)
min_pred = Dense(1)(hidden)
max_min_pred = Dense(1)(hidden)
model = Model(inputs=[input],
              outputs=[min_pred, max_min_pred])
model.compile(
    optimizer=Adadelta(),
    loss=[loss_0_where_nan(mean_squared_error, "aux: "),
          loss_0_where_nan(mean_squared_error, "main: ")],
    loss_weights=[0.2, 1.0])

def random_values(n, missing=False):
    for i in range(n):
        x = numpy.random.random(size=(2, 3))
        _min = numpy.minimum(x[..., 0], x[..., 1])
        if missing:
            _max_min = numpy.full((len(x), 1), numpy.nan)
        else:
            _max_min = numpy.maximum(_min, x[..., 2]).reshape((-1, 1))
        # print(x, numpy.array(_min).reshape((-1, 1)), numpy.array(_max_min), sep="\n", end="\n\n")
        yield x, [numpy.array(_min).reshape((-1, 1)), numpy.array(_max_min)]

model.fit_generator(random_values(2, False),
                    steps_per_epoch=2,
                    verbose=False)
print("With missing")
history = model.fit_generator(random_values(1, True),
                              steps_per_epoch=1,
                              verbose=False)
print("Normal")
model.fit_generator(random_values(2, False),
                    steps_per_epoch=2,
                    verbose=False)
print(history.history)
Output:
main: [[0.29131493][0.769406676]][[-1.38235903][-3.32388687]][0 0][2.80118465 16.7550526][2.80118465 16.7550526]
aux: [[0.0422333851][0.0949674547]][[1.01466811][0.648737907]][0 0][0.945629239 0.306661695][0.945629239 0.306661695]
main: [[0.451149166][0.671600938]][[-2.46504498][-2.74316335]][0 0][8.50418854 11.6606159][8.50418854 11.6606159]
aux: [[0.451149166][0.355992794]][[0.893445313][0.917516708]][0 0][0.195625886 0.315309107][0.195625886 0.315309107]
With missing
aux: [[0.406784][0.44401589]][[0.852455556][1.23527527]][0 0][0.198623136 0.62609148][0.198623136 0.62609148]
main: [[nan][nan]][[-3.2140317][-2.22139478]][1 1][nan nan][0 0]
Normal
aux: [[0.490041673][0.00489727268]][[nan][nan]][1 1][nan nan][0 0]
main: [[0.867286][0.949406743]][[nan][nan]][1 1][nan nan][0 0]
aux: [[0.630184174][0.391073674]][[nan][nan]][1 1][nan nan][0 0]
main: [[0.630184174][0.391073674]][[nan][nan]][1 1][nan nan][0 0]
{'loss': [0.08247146010398865], 'dense_1_loss': [0.41235730051994324], 'dense_2_loss': [0.0]}
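This looks similar to an issue reported about tf.where().
When y_true is nan, the gradient of filtered = tf.where(nans, tf.zeros_like(with_nans), with_nans) is computed along the lines of d/dw(filtered) = 1 * d/dw(tf.zeros_like) + 0 * d/dw(with_nans). Because d/dw(with_nans) is nan in this case, the final gradient is 1 * 0 + 0 * nan = nan. The forward value is masked to 0, which is why the history reports a loss of 0.0, but the nan still reaches the weights through the backward pass.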
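The arithmetic above can be checked without TensorFlow. The following is a minimal pure-Python sketch of the chain rule for a single squared-error term, not the code TensorFlow actually runs; the function name masked_loss_gradient is hypothetical and only serves the illustration.

```python
import math

def masked_loss_gradient(y_true, y_pred):
    """Hand-written chain rule for where(isnan(loss), 0, loss), loss = (y_pred - y_true)**2."""
    loss = (y_pred - y_true) ** 2
    # Gradient that the select passes back to the loss branch:
    # 0 where the loss was masked, 1 where it was kept.
    upstream = 0.0 if math.isnan(loss) else 1.0
    # Local gradient d(loss)/d(y_pred); it is nan whenever y_true is nan.
    local = 2.0 * (y_pred - y_true)
    # Multiplying by zero does not remove a nan.
    return upstream * local

print(masked_loss_gradient(0.5, 1.5))
print(math.isnan(masked_loss_gradient(math.nan, 1.5)))
```

The second call shows that a zero upstream factor is not enough to hide a nan local gradient, which matches the 0 * nan = nan term in the explanation.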
To avoid this problem, instead of setting the nan loss values to 0, you can set y_true to y_pred wherever y_true is nan, so that the loss evaluates to 0 for those entries:
def filtered_loss_function(y_true, y_pred):
    nans = tf.is_nan(y_true)
    masked_y_true = tf.where(nans, y_pred, y_true)
    filtered = loss_function(masked_y_true, y_pred)
    return filtered
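Since filtered no longer depends on any nan values (they are masked before they enter the loss function), the gradients no longer contain nans. After training with this loss, the weights stay finite: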
>>> model.get_weights()
[array([[ 0.9761261 , -0.7472908 ],
[-0.12295872, 0.39413464],
[-0.16676795, 0.30844116]], dtype=float32),
array([-0.00581209, 0.00300716], dtype=float32),
array([[-0.31789184],
[-0.87912357]], dtype=float32),
array([0.00628144], dtype=float32),
array([[-1.0932552 ],
[ 0.11788104]], dtype=float32),
array([0.00575602], dtype=float32)]
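Why the masked-target version stays finite can also be checked by hand. The sketch below mirrors the idea for a single squared-error term in pure Python; it is an illustration under that simplification, and the function name masked_target_gradient is hypothetical.

```python
import math

def masked_target_gradient(y_true, y_pred):
    """Hand-written gradient of (y_pred - masked_y_true)**2 with respect to y_pred."""
    missing = math.isnan(y_true)
    # Replace a missing target with the prediction before computing the loss.
    masked_y_true = y_pred if missing else y_true
    diff = y_pred - masked_y_true
    # Direct dependence of the loss on y_pred.
    direct = 2.0 * diff
    # Dependence through masked_y_true, which equals y_pred only where the target is missing.
    via_mask = -2.0 * diff * (1.0 if missing else 0.0)
    return direct + via_mask

print(masked_target_gradient(0.5, 1.5))
print(masked_target_gradient(math.nan, 1.5))
```

Because the nan is replaced before any arithmetic touches it, every intermediate value is finite, and the gradient for a missing target comes out as zero rather than nan.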