Numpy MLP outputs the average of all training outputs for any input


I have tried to implement a multilayer perceptron with sigmoid activation. The code is below:

import numpy as np

def sigmoid(x):
    return 1.0/(1.0 + np.exp(-x))
def sigmoid_derivative(x):
    return sigmoid(x) * (1.0 - sigmoid(x))

class MLP:
    def __init__(self, layers, x_train, y_train):
        self.layers = layers
        self.inputs = x_train
        self.outputs = y_train

    def forward(self, input):
        output = input
        for layer in self.layers:
            layer.activations = output  # cache this layer's input for backprop
            output = layer.feedforward(output)
        return output

    def backward(self, output, predicted):
        # error signal for the squared loss, scaled by the sigmoid derivative
        error = np.multiply(2 * np.subtract(output, predicted), sigmoid_derivative(predicted))
        for layer in self.layers[::-1]:
            # recursively backpropagate the error through each layer
            error = layer.backpropagate(error)
    def train(self):
        for i in range(1, 500):
            predicted = self.forward(self.inputs)
            self.backward(self.outputs, predicted)
    def test(self, input):
        return self.forward(input)



class Layer:
    def __init__(self, prevNodes, selfNodes):
        self.weights = np.random.rand(prevNodes,selfNodes)
        self.biases = np.zeros(selfNodes)
        self.activations = np.array([])

    def feedforward(self, input):
        return sigmoid(np.dot(input, self.weights) + self.biases)

    def backpropagate(self, error):
        # error to hand back to the previous layer
        delPropagate = np.dot(error, self.weights.transpose())

        dw = np.dot(self.activations.transpose(), error)  # weight gradient over the batch
        db = error.mean(axis=0) * self.activations.shape[0]  # bias gradient (sums over the batch)
        self.weights = self.weights + 0.1 * dw
        self.biases = self.biases + 0.1 * db
        return np.multiply(delPropagate, sigmoid_derivative(self.activations))

layer1 = Layer(3,4)
layer2 = Layer(4,1)

x_train = np.array([[0,0,1],[0,1,1],[1,0,1],[1,1,1]])
y_train = np.array([[0],[1],[1],[0]])
x_test = np.array([[0,0,1]])
mlp = MLP([layer1,layer2], x_train, y_train)
mlp.train()
mlp.test(x_test)
However, the problem is that the network saturates and outputs the average of all the training outputs for any input. For example, in the case above the mean of y_train is about 0.5, and no matter what x_test value I feed the network, the output always stays around 0.5.
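
To see this concretely, one can print the trained network's prediction for each training row (a small sketch of my own, not part of the original post):

# Print the prediction for every training input; in the failure mode
# described above, each row comes out near the mean of y_train (~0.5).
for x, y in zip(x_train, y_train):
    print(x, "->", mlp.test(x.reshape(1, -1)), "target:", y)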


Where is the problem in the code? What am I missing in the algorithm? Any help is appreciated.

The problem seems to be the low number of iterations: increasing them from 500 to 50,000 makes the network converge, and raising the learning rate to 0.5 also works with fewer iterations. The matrix operations and all the math appear to be consistent.
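
For concreteness, here is a minimal sketch of the two fixes; the epochs and lr parameters are my own additions for illustration (the posted code hard-codes 500 iterations and a 0.1 step):

# Sketch: the same two methods with the iteration count and learning
# rate exposed. 'epochs' and 'lr' are hypothetical parameters, not in
# the original code; everything else is unchanged.

class MLP:
    # ... __init__, forward, backward, test as above ...
    def train(self, epochs=50000):           # was: range(1, 500)
        for _ in range(epochs):
            predicted = self.forward(self.inputs)
            self.backward(self.outputs, predicted)

class Layer:
    # ... __init__, feedforward as above ...
    def backpropagate(self, error, lr=0.5):  # was: a fixed 0.1 step
        delPropagate = np.dot(error, self.weights.transpose())
        dw = np.dot(self.activations.transpose(), error)
        db = error.mean(axis=0) * self.activations.shape[0]
        self.weights = self.weights + lr * dw
        self.biases = self.biases + lr * db
        return np.multiply(delPropagate, sigmoid_derivative(self.activations))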