Python neural network converges on the answer, then oscillates wildly

python machine-learning neural-network gradient-descent

To understand fully connected ANNs, I am starting with a simple 2D linear regression example.

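My network is trivial: one input layer and one output layer, with a single set of weights between them. If my understanding is correct, the weights should essentially learn the slope of the line of best fit for my data.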
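As a reference point for what that weight should converge to, here is a minimal sketch, assuming a bias-free model y ≈ w * x fitted by least squares. The function name `best_fit_slope_through_origin` and the NumPy random generator are illustrative choices, not part of the code in this question.

```python
import numpy as np


def best_fit_slope_through_origin(xs, ys):
    """Least-squares slope for the bias-free model y ~ w * x.

    Minimizing sum((w * x - y)**2) over w gives w = sum(x * y) / sum(x * x).
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    return float(np.dot(xs, ys) / np.dot(xs, xs))


# Noisy line with slope 0.5, similar in spirit to the training data described here.
rng = np.random.default_rng(0)
xs = np.arange(300)
ys = 0.5 * xs + rng.uniform(-1.0, 1.0, size=xs.size)

w_star = best_fit_slope_through_origin(xs, ys)
print(w_star)
```

Comparing the trained weight against `w_star` gives a quick check on whether gradient descent is heading toward the right value.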

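My training data is a somewhat noisy line, as shown below. It is just a fuzzy line with slope m=0.5. My code pulls a point, propagates it through the network, and then backpropagates to update the weights. The weight update happens either on every datum or averaged over every 5, 10, or 20 points.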
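Stripped of the layer classes, the procedure described above can be sketched for a single weight as follows. This is a simplified illustration, assuming an identity activation and the error E = 0.5 * (w*x - y)**2. The name `train_single_weight` and its parameters are hypothetical, and it is not the full implementation given further down.

```python
def train_single_weight(xs, ys, lr, batch_size=1, epochs=1, w=0.0):
    """Gradient descent on E = 0.5 * (w*x - y)**2 for one weight.

    Gradients are averaged over `batch_size` consecutive samples before each
    update; batch_size=1 gives a per-sample update. A trailing partial batch
    at the end of an epoch is dropped.
    """
    history = []
    for _ in range(epochs):
        grad_sum, count = 0.0, 0
        for x, y in zip(xs, ys):
            grad_sum += (w * x - y) * x  # dE/dw for this sample
            count += 1
            if count == batch_size:
                w -= lr * grad_sum / batch_size
                history.append(w)
                grad_sum, count = 0.0, 0
    return w, history


# Small, noise-free example: inputs 0..9 with slope 0.5.
xs = list(range(10))
ys = [0.5 * x for x in xs]

w_per_sample, hist_per_sample = train_single_weight(xs, ys, lr=0.005, epochs=20)
w_batched, hist_batched = train_single_weight(xs, ys, lr=0.005, batch_size=5, epochs=100)
print(w_per_sample, w_batched)
```

On this small input range both variants settle near 0.5. The only difference between them is how many samples contribute to each step.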

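My weights are random, but I have fixed the seed to stay sane while debugging. When I plot the squared error of each training example, I get the following: a hump, a settling period, then an explosion.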

Corresponding to the decrease in squared error, my algorithm finds the slope of the linear data... and then violently rejects it, lol.

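My first thought was that the oscillations I coded into my training data were too drastic and might be kicking the solution out of the local minimum. But then I tightened the distribution of my training data around y=.5x, to little effect. To wash those effects out, I implemented averaged stochastic weight updates, updating only after a batch of training samples. No love.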

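I am also using a very small learning rate (.0005), since I thought the noise might be bouncing me out of the trough. That initially helped, but these figures are the result of alpha=.005.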
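One way to reason about the learning rate here is sketched below. It assumes the per-sample rule reduces to w <- w - alpha * (w*x - y) * x for a single weight with identity activation. For noise-free labels y = 0.5 * x, each update multiplies the error (w - 0.5) by (1 - alpha * x**2), so a step shrinks the error only while alpha * x**2 < 2. The helper names are illustrative.

```python
import math


def error_multiplier(lr, x):
    """Factor applied to (w - slope) by one noise-free per-sample update."""
    return 1.0 - lr * x * x


def max_stable_input(lr):
    """Exclusive bound on |x| below which |1 - lr * x**2| < 1 holds."""
    return math.sqrt(2.0 / lr)


# Learning rates mentioned in the text and in the script.
for lr in (0.005, 0.0005, 0.00005):
    print(lr, max_stable_input(lr))
```

Under these assumptions the bound for alpha = 0.00005 is x = 200, while the inputs produced by `gen_linear_regression_data(300, ...)` run from 0 to 299. That makes the raw input scale, rather than the label noise, worth checking.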

Any suggestions for what I am missing? I would like to get this case handled so I can move on to multivariate regression.

import random
import matplotlib.pyplot as plt
import numpy as np

random.seed(0)
np.random.seed(0)  # the weights are drawn with np.random, so seed NumPy as well


def gen_linear_regression_data(num_points, slope=.5, var=1.0, plot=False, seed=None):
    if seed is not None:
        np.random.seed(seed)
    data = [idx for idx in range(num_points)]
    labels = [data[idx] * slope for idx in range(num_points)]
    # add noise
    labels = [l + np.random.uniform(-var, var) for l in labels]
    if plot:
        plt.scatter(data, labels)
        plt.show()
    return data, labels


class Sigmoid():
    def activate(self, x):
        return 1 / (1 + np.exp(-x))

    def backtivate(self, x):
        return np.multiply(x, (1 - x))


class Passive():
    def activate(self, x):
        return x

    def backtivate(self, x):
        return 1


class Layer():

    def __init__(self, values, activation="logistic"):
        if not (isinstance(values, list) or isinstance(values, np.ndarray)):
            values = [values]
        self.values = np.matrix(values)
        if self.values.shape[-1] > self.values.shape[0]:
            self.values = self.values.reshape((self.values.shape[-1], 1))
        self.set_activation(activation_str=activation)

    def __getitem__(self, item):
        return self.values[item]

    def __setitem__(self, key, value):
        if not (isinstance(value, int) or isinstance(value, float)):
            raise TypeError("Layer values must be int or float.")
        self.values[key] = value

    def __len__(self):
        return self.values.shape[0]

    def __str__(self):
        return "\n".join([str(val) for val in self.values])

    def __mul__(self, other):
        return np.dot(other, self.values)

    def set_activation(self, activation_str):
        if activation_str == "logistic":
            self.activation = Sigmoid()
        elif activation_str == "passive":
            self.activation = Passive()

    def transpose(self):
        return self.values.reshape(len(self), 1)

    def activate(self):
        return self.activation.activate(self.values)

    def backtivate(self):
        return self.activation.backtivate(self.values)


class DataSet():
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels
        self.data_dict = [{"data": d, "label": l} for (d, l) in zip(self.data, self.labels)]

    def __getitem__(self, item):
        return self.data_dict[item]


class Weights():

    def __init__(self, weights):
        if not isinstance(weights, list) and not all([isinstance(weight, np.ndarray) for weight in weights]):
            raise TypeError("Blah.")
        self.data = weights

    def __len__(self):
        return sum([w.shape[0] * w.shape[1] for w in self.data])

    def __getitem__(self, item):
        weight_idx = np.cumsum([w.shape[0] * w.shape[1] - 1 for w in self.data])
        desired_idx = 0
        for idx, w_idx in enumerate(weight_idx):
            if item < w_idx:
                desired_idx = idx
                break

        if idx > 0:
            cs = np.cumsum(weight_idx)
            another_idx = item - cs[idx - 1]
        else:
            another_idx = item

        return self.data[desired_idx][another_idx]


class Network():

    def __init__(self, network_config,
                 first_layer=None,
                 random_weights=True,
                 learning_rate=.0005):

        self.network_config = network_config

        if first_layer is None:
            first_layer = Layer(np.zeros(network_config[0]["layers"]))
        first_layer.set_activation(network_config[0]["activation"])

        # Initialize layers
        self.depth = len(network_config)
        self.layers = [first_layer]

        self.layers.extend(
                [Layer(np.zeros(config["layers"]),
                       activation=config["activation"]) for config in network_config[1:]])

        # Initialize learning rate
        self.learning_rate = learning_rate

        # Initialize weights
        self.weights = []
        for layer_idx in range(self.depth - 1):
            if random_weights:
                self.weights.append(2 * (np.random.rand(len(self[layer_idx + 1]), len(self[layer_idx])) - .5))
            else:
                self.weights.append(np.ones((len(self[layer_idx + 1]), len(self[layer_idx]))))

    def __getitem__(self, item):
        return self.layers[item]

    def __str__(self):
        max_elems = np.max([len(layer) for layer in self.layers])
        matrix = [[str(layer[elem]) if elem < len(layer) else None for layer in self.layers] for elem in
                  range(max_elems)]
        net_str = "\n".join([str(layer) for layer in matrix])
        weight_str = str(self.weights)
        try:
            deltas_str = " ".join([str(lay.delta) for lay in self.layers])
        except AttributeError:  # deltas do not exist until back_prop has run
            deltas_str = ""
        return "Net:\n%s\n\nWeights:\n%s\n\nDeltas:\n%s" % (net_str, weight_str, deltas_str)

    def forward_prop(self, debug=False):
        for layer_idx in range(1, self.depth):
            ww = self.weights[layer_idx - 1]
            layer = self.layers[layer_idx - 1]
            weighted_input = np.dot(ww, layer.values)
            self.layers[layer_idx].values = weighted_input
            self.layers[layer_idx].values = self.layers[layer_idx].activate()
            if debug:
                print("-------------")
                print(self)
        return self.layers[-1]

    def back_prop(self, answer, debug=False):
        def calc_deltas():
            for layer_idx, layer in enumerate(reversed(self.layers)):
                if layer_idx == 0:
                    # Calculate dE for Squared Error
                    outputs = self.layers[-1]
                    dE = self.layers[-1][0] - answer
                    square_error = dE ** 2
                    a = dE
                else:
                    a = np.dot(self.weights[-layer_idx].T,
                               self.layers[-layer_idx].delta)
                b = layer.backtivate()
                layer_delta = np.multiply(a, b)
                layer.delta = layer_delta
            return square_error

        def calc_dws():
            dws = []
            deltas = [l.delta for l in self.layers]
            values = [l.values for l in self.layers]
            for layer_idx, layer in enumerate(self.layers[:-1]):
                dws.append(np.multiply(deltas[layer_idx + 1], values[layer_idx].T))
            return dws

        print("Answer:\n%f" % answer)
        square_error = calc_deltas()
        dws = calc_dws()
        return dws, square_error

    def set_inputs(self, layer):
        self.layers[0] = layer
        self.layers[0].set_activation(self.network_config[0]["activation"])


def build_network_config(layers):
    network_config = []
    for n_idx, neurons in enumerate(layers):
        if n_idx != len(layers) - 1:
            network_config.append({"layers": neurons, "activation": "passive"})
        else:
            network_config.append({"layers": neurons, "activation": "passive"})
    return network_config


disp_el = 200
batch_size = 1
samples = 300
learning_rate = .00005

# Setup the network
network_config = build_network_config([1, 1])
net = Network(network_config=network_config,
              random_weights=True,
              learning_rate=learning_rate)

# Pull in the labeled data set.
dataset = DataSet(*gen_linear_regression_data(samples, seed=0, plot=True))
errs = []

cum_sum_dws = np.zeros_like(net.weights)
weights = []
for idx, dset in enumerate(dataset):
    initial_layer = Layer(dset["data"])
    net.set_inputs(initial_layer)

    # Feed forward
    net.forward_prop(True)

    # Back propagate error
    dws, sq_err = net.back_prop(dset["label"], debug=True)
    errs.append(sq_err)

    # Update weights
    if idx % batch_size == 0 and idx != 0:
        cum_sum_dws += dws
        cum_sum_dws /= batch_size
        new_weights = [net.weights[idx] - net.learning_rate * cum_sum_dws[idx] for idx in range(len(cum_sum_dws))]
        net.weights = new_weights
        weights.append(new_weights)
        print("dws:\n%s" % str([-net.learning_rate * cum_sum_dws[idx] for idx in range(len(dws))]))
        cum_sum_dws = np.zeros_like(dws)
    else:
        cum_sum_dws += dws

plt.scatter(range(len(errs[:disp_el])), errs[:disp_el])
plt.show()
plt.scatter(range(len(weights[:disp_el])), weights[:disp_el])
plt.show()
ww = Weights(net.weights)

# Validate data set
dataset = DataSet(*gen_linear_regression_data(5, seed=0))
for dset in dataset:
    initial_layer = Layer(dset["data"])
    net.set_inputs(initial_layer)
    prediction = net.forward_prop()
    print("data: %s, prediction: %s" % (str(dset["data"]), prediction))
