Python Keras autoencoder: tying weights from encoder to decoder not working


python tensorflow deep-learning


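I created an autoencoder as part of a full model for one of my Kaggle competitions. I am trying to tie the encoder's weights, transposed, to the decoder. Before the first epoch the weights are correctly synchronized; after that, the decoder weights simply freeze and do not keep up with the encoder weights being updated by gradient descent.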

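I spent 12 hours going through almost every post on this problem I could find on Google, and none of them seems to answer my question. The closest one is this one, but there the problem was solved by not using a variable tensor as the kernel, and I already don't use that type of tensor as my decoder kernel, so it was no help.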

I am using the DenseTied Keras custom layer class defined in this article, exactly as written; I only changed how Keras is referenced to fit my import style.

import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os

This is the custom layer definition:

class DenseTied(tf.keras.layers.Layer):

    def __init__(self, units,
                 activation=None,
                 use_bias=True,
                 kernel_initializer='glorot_uniform',
                 bias_initializer='zeros',
                 kernel_regularizer=None,
                 bias_regularizer=None,
                 activity_regularizer=None,
                 kernel_constraint=None,
                 bias_constraint=None,
                 tied_to=None,
                 **kwargs):
        self.tied_to = tied_to
        if 'input_shape' not in kwargs and 'input_dim' in kwargs:
            kwargs['input_shape'] = (kwargs.pop('input_dim'),)
        super().__init__(**kwargs)
        self.units = units
        self.activation = tf.keras.activations.get(activation)
        self.use_bias = use_bias
        self.kernel_initializer = tf.keras.initializers.get(kernel_initializer)
        self.bias_initializer = tf.keras.initializers.get(bias_initializer)
        self.kernel_regularizer = tf.keras.regularizers.get(kernel_regularizer)
        self.bias_regularizer = tf.keras.regularizers.get(bias_regularizer)
        self.activity_regularizer = tf.keras.regularizers.get(activity_regularizer)
        self.kernel_constraint = tf.keras.constraints.get(kernel_constraint)
        self.bias_constraint = tf.keras.constraints.get(bias_constraint)
        self.input_spec = tf.keras.layers.InputSpec(min_ndim=2)
        self.supports_masking = True

    def build(self, input_shape):
        assert len(input_shape) >= 2
        input_dim = input_shape[-1]

        if self.tied_to is not None:
            self.kernel = tf.keras.backend.transpose(self.tied_to.kernel)
            self.non_trainable_weights.append(self.kernel)
        else:
            self.kernel = self.add_weight(shape=(input_dim, self.units),
                                          initializer=self.kernel_initializer,
                                          name='kernel',
                                          regularizer=self.kernel_regularizer,
                                          constraint=self.kernel_constraint)
        if self.use_bias:
            self.bias = self.add_weight(shape=(self.units,),
                                        initializer=self.bias_initializer,
                                        name='bias',
                                        regularizer=self.bias_regularizer,
                                        constraint=self.bias_constraint)
        else:
            self.bias = None
        self.input_spec = tf.keras.layers.InputSpec(min_ndim=2, axes={-1: input_dim})
        self.built = True

    def compute_output_shape(self, input_shape):
        assert input_shape and len(input_shape) >= 2
        output_shape = list(input_shape)
        output_shape[-1] = self.units
        return tuple(output_shape)

    def call(self, inputs):
        output = tf.keras.backend.dot(inputs, self.kernel)
        if self.use_bias:
            output = tf.keras.backend.bias_add(output, self.bias, data_format='channels_last')
        if self.activation is not None:
            output = self.activation(output)
        return output

The model is trained and tested on a dummy dataset:

rand_samples = np.random.rand(16, 51)
dummy_ds = tf.data.Dataset.from_tensor_slices((rand_samples, rand_samples)).shuffle(16).batch(16)

encoder = tf.keras.layers.Dense(1, activation="linear", input_shape=(51,), use_bias=True)
decoder = DenseTied(51, activation="linear", tied_to=encoder, use_bias=True)

autoencoder = tf.keras.Sequential()
autoencoder.add(encoder)
autoencoder.add(decoder)

autoencoder.compile(metrics=['accuracy'],
                    loss='mean_squared_error',
                    optimizer='sgd')

autoencoder.summary()

print("Encoder Kernel Before 1 Epoch", encoder.kernel[0])
print("Decoder Kernel Before 1 Epoch", decoder.kernel[0][0])

autoencoder.fit(dummy_ds, epochs=1)

print("Encoder Kernel After 1 Epoch", encoder.kernel[0])
print("Decoder Kernel After 1 Epoch", decoder.kernel[0][0])

The expected output is that both kernels have exactly the same first element (for simplicity, only one weight is printed).
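Printing a single element can hide a mismatch elsewhere in the kernel. A minimal sketch of a full-kernel check follows; the helper name kernels_tied is hypothetical, and it only assumes that both layers expose a .kernel that np.asarray() can convert (true for a tf.Variable or an eager tensor).

```python
import numpy as np


def kernels_tied(encoder, decoder, atol=0.0):
    """Return True if decoder.kernel equals encoder.kernel transposed, element by element.

    Hypothetical helper: only assumes both objects expose a .kernel attribute
    that np.asarray() can convert (e.g. a tf.Variable or an eager tensor).
    """
    enc = np.asarray(encoder.kernel)
    dec = np.asarray(decoder.kernel)
    # A tied decoder kernel must have the transposed shape of the encoder kernel.
    if enc.T.shape != dec.shape:
        return False
    # With rtol=0 and the default atol=0 this is an exact comparison.
    return bool(np.allclose(enc.T, dec, rtol=0.0, atol=atol))
```

Called as kernels_tied(encoder, decoder) right after autoencoder.fit(...) in the script above, it should return False for the run shown in the output, since even the single printed element differs.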

The current output shows that the decoder kernel is not updated along with the (transposed) encoder kernel:

2019-09-06 14:55:42.070003: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library nvcuda.dll
2019-09-06 14:55:42.984580: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.733
pciBusID: 0000:01:00.0
2019-09-06 14:55:43.088109: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.        
2019-09-06 14:55:43.166145: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-09-06 14:55:43.203865: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-09-06 14:55:43.277988: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.733
pciBusID: 0000:01:00.0
2019-09-06 14:55:43.300888: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.        
2019-09-06 14:55:43.309040: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-09-06 14:55:44.077814: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-06 14:55:44.094542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0
2019-09-06 14:55:44.099411: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N
2019-09-06 14:55:44.103424: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4712 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060, pci bus id: 0000:01:00.0, compute capability: 6.1)
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 1)                 52
_________________________________________________________________
dense_tied (DenseTied)       (None, 51)                103
=================================================================
Total params: 103
Trainable params: 103
Non-trainable params: 0
_________________________________________________________________
Encoder Kernel Before 1 Epoch tf.Tensor([0.20486075], shape=(1,), dtype=float32)
Decoder Kernel Before 1 Epoch tf.Tensor(0.20486075, shape=(), dtype=float32)
1/1 [==============================] - 1s 657ms/step - loss: 0.3396 - accuracy: 0.0000e+00
Encoder Kernel After 1 Epoch tf.Tensor([0.20530733], shape=(1,), dtype=float32)
Decoder Kernel After 1 Epoch tf.Tensor(0.20486075, shape=(), dtype=float32)

I can't see what I am doing wrong.
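A minimal sketch of what the tied_to branch of build() does under TensorFlow 2 eager execution (assumptions: TF 2.x with eager execution on; kernel is a hypothetical stand-in for encoder.kernel). An eager tf.transpose copies the variable's current values into a new tensor, while the same op inside a tf.function is re-executed against the variable every time the function runs:

```python
import tensorflow as tf

# Hypothetical stand-in for encoder.kernel: shape (51, 1), as in the question.
kernel = tf.Variable(tf.ones((51, 1)))

# What the tied_to branch of build() does: one eager transpose.
# The result is a new tensor holding a copy of the values at this moment.
frozen_t = tf.transpose(kernel)


@tf.function
def live_transpose():
    # Inside a traced function the transpose is a graph op that reads
    # the variable each time the function is executed.
    return tf.transpose(kernel)


before = live_transpose()  # traces the function; values are still 1.0

# Simulate one optimizer update of the encoder kernel.
kernel.assign_add(tf.fill((51, 1), 0.5))

after = live_transpose()   # same traced graph, re-reads the updated variable

print(float(frozen_t[0, 0]))  # 1.0: the snapshot never changes
print(float(after[0, 0]))     # 1.5: follows the variable
```

If this is the cause, it would also explain why the same layer appears to work for others: in graph-mode Keras (TF 1.x), K.transpose adds an op that reads the variable at every training step. This is an inference from the behavior above, not something the thread confirms.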

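The weights are not tied. You are just initializing the tied layer's weights with the transposed weights of the first layer and then never training them. transpose returns a new tensor (a different object), and add_weight creates a new variable, so after build any relationship between the two layers is lost. I think it is better to do it like this: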

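def call(self, inputs):
    # Transpose the encoder kernel on every call, so the decoder always uses its current value
    output = tf.keras.backend.dot(inputs, tf.keras.backend.transpose(self.tied_to.kernel))
    if self.use_bias:
        # Use the decoder's own bias: the encoder's bias has the encoder's output size, not the decoder's
        output = tf.keras.backend.bias_add(output, self.bias, data_format='channels_last')
    if self.activation is not None:
        output = self.activation(output)
    return output

Here the tied layer always explicitly uses the first layer's kernel and has no kernel of its own (i.e., build no longer creates or stores a kernel). It keeps its own bias, because the encoder's bias has the encoder's output size rather than the decoder's.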
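A sketch of this approach as a complete layer, assuming TF 2.x. It swaps the tf.keras.backend helpers for tf.matmul, tf.transpose and tf.nn.bias_add (which also work when tf.keras is backed by Keras 3) and drops the regularizer and constraint options for brevity; it is an illustration, not the article's layer.

```python
import tensorflow as tf


class DenseTied(tf.keras.layers.Layer):
    """Decoder layer that reuses the transposed kernel of another Dense layer.

    Sketch only: the kernel is looked up in call(), never copied in build(),
    so it always reflects the current value of tied_to.kernel. The layer
    owns nothing but its bias. Regularizers and constraints are omitted.
    """

    def __init__(self, units, tied_to, activation=None, use_bias=True, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.tied_to = tied_to  # the encoder Dense layer that owns the kernel
        self.activation = tf.keras.activations.get(activation)
        self.use_bias = use_bias

    def build(self, input_shape):
        # No kernel is created or stored here.
        if self.use_bias:
            self.bias = self.add_weight(name="bias", shape=(self.units,),
                                        initializer="zeros", trainable=True)
        else:
            self.bias = None
        super().build(input_shape)

    def call(self, inputs):
        # (input_dim, units) -> (units, input_dim), recomputed on every forward pass.
        kernel = tf.transpose(self.tied_to.kernel)
        output = tf.matmul(inputs, kernel)
        if self.bias is not None:
            output = tf.nn.bias_add(output, self.bias)
        return self.activation(output)
```

Used as decoder = DenseTied(51, tied_to=encoder), gradient descent updates only the encoder kernel; because call() transposes it on every forward pass, the decoder follows it without any explicit synchronization.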
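To tie weights, I would suggest using the functional API, which allows shared layers. That said, here is an alternative implementation that ties the weights between the encoder and decoder: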
class TransposableDense(tf.keras.layers.Dense):

    def __init__(self, units, **kwargs):
        super().__init__(units, **kwargs)

    def build(self, input_shape):
        assert len(input_shape) >= 2
        input_dim = input_shape[-1]
        self.t_output_dim = input_dim

        self.kernel = self.add_weight(shape=(int(input_dim), self.units),
                                      initializer=self.kernel_initializer,
                                      name='kernel',
                                      regularizer=self.kernel_regularizer,
                                      constraint=self.kernel_constraint)
        if self.use_bias:
            self.bias = self.add_weight(shape=(self.units,),
                                        initializer=self.bias_initializer,
                                        name='bias',
                                        regularizer=self.bias_regularizer,
                                        constraint=self.bias_constraint)
            self.bias_t = self.add_weight(shape=(input_dim,),
                                          initializer=self.bias_initializer,
                                          name='bias_t',
                                          regularizer=self.bias_regularizer,
                                          constraint=self.bias_constraint)
        else:
            self.bias = None
            self.bias_t = None
        # self.input_spec = tf.keras.layers.InputSpec(min_ndim=2, axes={-1: input_dim})
        self.built = True

    def call(self, inputs, transpose=False):
        bs, input_dim = inputs.get_shape()

        kernel = self.kernel
        bias = self.bias
        if transpose:
            assert input_dim == self.units
            kernel = tf.keras.backend.transpose(kernel)
            bias = self.bias_t

        output = tf.keras.backend.dot(inputs, kernel)
        if self.use_bias:
            output = tf.keras.backend.bias_add(output, bias, data_format='channels_last')
        if self.activation is not None:
            output = self.activation(output)
        return output

    def compute_output_shape(self, input_shape):
        bs, input_dim = input_shape
        output_dim = self.units
        if input_dim == self.units:
            output_dim = self.t_output_dim
        return bs, output_dim

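The kernel of this dense layer can be transposed by calling the layer with transpose=True. Note that this may break some basic Keras principles (e.g., a layer having multiple output shapes), but it should work in your case.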

The following example shows how to define the model with it:
a = tf.keras.layers.Input((51,))
dense = TransposableDense(1, activation='linear', use_bias=True)
encoder_out = dense(a)
decoder_out = dense(encoder_out, transpose=True)
encoder = tf.keras.Model(a, encoder_out)
autoencoder = tf.keras.Model(a, decoder_out)

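I have already tried that and know this solution. But if so, why are there so many articles and posts proposing exactly the same custom layer I showed? Are they all wrong?

You are right, I somewhat misread your code; in the tied case you are in fact not creating new weights. I'm afraid I don't have time to look into this further right now, but I hope to update my answer later.

Have you tried running the code from the TDS article line by line? I ran the article's code, training one epoch at a time and checking whether the encoder and decoder weights were equal. They matched. I suggest a sanity test with an encoder of size greater than 1.

I have tested encoders of different sizes; I used 1 for simplicity. In the reproduction example I only train the model for one epoch. This is a minimal reproducible example; my full autoencoder is a bit more complex.

I will test this solution, adapt it to my full model, and let you know.

This is not exactly the original approach taken by the articles I read, but it is a very clever one: the autoencoder works, and the weights have a structure that should allow them to be saved once autoencoder training is done and loaded into a Keras Dense layer (this last part is still to be confirmed, but my intuition says it is possible). That will help get rid of this custom class when developing the full model in the next step.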
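One way the last point could be checked: a sketch that copies the tied weights into two ordinary Dense layers. The helper to_plain_dense is hypothetical, and it assumes the arrays arrive in the order TransposableDense.build() creates them (kernel, bias, bias_t), e.g. from dense.get_weights().

```python
import numpy as np
import tensorflow as tf


def to_plain_dense(kernel, bias, bias_t, activation="linear"):
    """Copy tied autoencoder weights into two ordinary Dense layers.

    Hypothetical helper. Assumes kernel has shape (input_dim, units), bias has
    shape (units,) and bias_t has shape (input_dim,), i.e. the order in which
    TransposableDense.build() creates its weights.
    """
    input_dim, units = kernel.shape
    encoder = tf.keras.layers.Dense(units, activation=activation)
    decoder = tf.keras.layers.Dense(input_dim, activation=activation)
    # Create the variables so set_weights() has something to assign to.
    encoder.build((None, input_dim))
    decoder.build((None, units))
    encoder.set_weights([kernel, bias])
    # The decoder gets the transposed kernel and the separate decoder bias.
    decoder.set_weights([np.ascontiguousarray(kernel.T), bias_t])
    return encoder, decoder
```

The two exported layers are independent copies: training them further would untie the weights, so this suits inference or initializing a larger model rather than continued tied training.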