TensorFlow L1 regularization doesn't force weight parameters to zero


My goal is to train an autoencoder with SGD. Using TensorFlow 1.x, I added L1 regularization to my loss function as follows:

    ........
    ........

    beta = 10e-3

    n_inputs = X_Train.shape[1]
    n_outputs = n_inputs

    X = tf.placeholder(tf.float32, shape=[None, n_inputs])

    weights1 = tf.get_variable("weights1", shape=[n_inputs, n_hidden], dtype=tf.float32, initializer = tf.contrib.layers.variance_scaling_initializer())  
    weights2 = tf.get_variable("weights2", shape=[n_hidden, n_outputs], dtype=tf.float32, initializer = tf.contrib.layers.variance_scaling_initializer())

    biases1 = tf.get_variable("biases1", shape=[n_hidden], initializer = tf.zeros_initializer())
    biases2 = tf.get_variable("biases2", shape=[n_outputs], initializer = tf.zeros_initializer())

    hidden = activation(tf.matmul(X, weights1) + biases1)
    outputs = tf.matmul(hidden, weights2) + biases2 

    reconstruction_loss = tf.reduce_mean(tf.square(outputs - X))
    reg_loss = beta * (tf.reduce_sum(tf.abs(weights1)) + tf.reduce_sum(tf.abs(weights2))) 

    loss = reconstruction_loss + reg_loss 

    training_op = tf.train.AdamOptimizer(learning_rate).minimize(loss)

    init = tf.global_variables_initializer()
    .......
    .......

After training, I counted the number of zeros in the weight matrices and found that every weights1[i][j] ≠ 0. What am I doing wrong?
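As a side note on the counting itself (a hypothetical NumPy sketch, not part of the code above): with float weights it is usually more informative to count entries below a small tolerance than entries that are exactly `0.0`. The sample values mimic the magnitudes reported in the comments below.

```python
import numpy as np

# Illustrative values of the order reported after L1 training (~1e-5).
w1 = np.array([[9.1e-06, -1.8e-05],
               [2.7e-05, -6.1e-06]])

print("exact zeros:    ", np.sum(w1 == 0.0))       # counts entries equal to 0.0
print("near-zero (<1e-4):", np.sum(np.abs(w1) < 1e-4))  # counts entries below a tolerance
```

With an exact-equality check, weights that L1 has driven to ~1e-5 still count as "nonzero", which is consistent with what the asker observed.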

L1 regularization pushes the network's useless weights toward 0, but it does not assign them exactly 0 by brute force. They end up close to 0, not at 0.

See:

for a detailed answer.

Thanks. I understood it the other way around: L2 regularization forces the weight parameters toward zero (but never exactly zero), while L1 regularization forces weight parameters to become exactly zero.

It is a regularization that modifies the loss (and hence the gradients); it doesn't "force" anything. Have you tried increasing beta? That makes the regularization stronger and may lead to the desired result.

It's complicated, but in general the weights are pushed toward zero, yet it is hard for them to land exactly on 0. I edited my answer and added a link to a question explaining why.

With L1 regularization (beta = 10e-2), w1: [[ 9.1070933e-06 -1.8132891e-05 2.7440405e-05 ... 9.9110139e-06 -1.9736017e-06 -6.1299079e-06] [-3.8850612e-06 -8.6495111e-06 5.6615418e-06 ... 1.3068163e-05 -1.1937200e-06 1.9667727e-05] ...]. Without L1 regularization, w1: [[ 0.04745267 0.04075645 0.61595 ... 0.002738] [ 0.25440112 -0.40187 -0.22634292 ... 0.16455233 0.0246956 -0.24465105] ...]. How can I encourage sparsity, given that L1 doesn't guarantee it in this case? And how can I use activity regularization with tensorflow 1.x?

I believe that by adding/subtracting delta = learning_rate * regularization constant, your parameters will oscillate around zero: if a parameter is positive and smaller than delta, it becomes negative and again smaller than delta, and then flips back.
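The oscillation described in the final comment above can be traced with a toy Python loop (hypothetical numbers; a pure L1 subgradient step with no data gradient, just to isolate the effect). Once |w| < delta = learning_rate * beta, each step overshoots zero and flips the sign.

```python
lr, beta = 0.1, 0.5
delta = lr * beta    # step size contributed by the L1 term: 0.05
w = 0.02             # already smaller than delta
history = []
for _ in range(6):
    w -= delta * (1 if w > 0 else -1)   # pure L1 subgradient step: move by delta toward 0
    history.append(round(w, 4))
print(history)       # alternates between -0.03 and 0.02, never settling at 0
```

Because the update always moves by a fixed delta, the weight jumps over zero every iteration instead of converging to it, which is exactly why counting exact zeros after training finds none.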