Numpy: Solving XOR with 3 data points using a multi-layer perceptron

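It is well known that the XOR problem can be solved by a multi-layer perceptron. Given all 4 boolean inputs and outputs, the network trains and memorizes the weights needed to reproduce the I/O: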

import numpy as np
np.random.seed(0)

def sigmoid(x):  # Squashes each value into the open interval (0, 1).
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(sx):
    # See https://math.stackexchange.com/a/1225116
    return sx * (1 - sx)

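# Error signal. For the loss 0.5 * (truth - predicted)**2 this is the
# negative gradient with respect to the prediction, so the updates below add it.
def cost(predicted, truth):
    return truth - predicted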

xor_input = np.array([[0,0], [0,1], [1,0], [1,1]])
xor_output = np.array([[0,1,1,0]]).T

X = xor_input
Y = xor_output

# Read the number of samples and the input dimension from the data.
num_data, input_dim = X.shape
# Let's set the dimension of the intermediate (hidden) layer.
hidden_dim = 5
# Initialize weights between the input layer and the hidden layer.
W1 = np.random.random((input_dim, hidden_dim))

# Define the dimension of the output vector.
output_dim = len(Y.T)
# Initialize weights between the hidden layer and the output layer.
W2 = np.random.random((hidden_dim, output_dim))

num_epochs = 10000
learning_rate = 1.0

for epoch_n in range(num_epochs):
    layer0 = X
    # Forward propagation.

    # Inside the perceptron, Step 2. 
    layer1 = sigmoid(np.dot(layer0, W1))
    layer2 = sigmoid(np.dot(layer1, W2))

    # Back propagation (Y -> layer2)

    # How much did we miss in the predictions?
    layer2_error = cost(layer2, Y)
    # In what direction is the target value?
    # Were we really close? If so, don't change too much.
    layer2_delta = layer2_error * sigmoid_derivative(layer2)


    # Back propagation (layer2 -> layer1)
    # How much did each layer1 value contribute to the layer2 error (according to the weights)?
    layer1_error = np.dot(layer2_delta, W2.T)
    layer1_delta = layer1_error * sigmoid_derivative(layer1)

    # update weights
    W2 +=  learning_rate * np.dot(layer1.T, layer2_delta)
    W1 +=  learning_rate * np.dot(layer0.T, layer1_delta)
We see that the network has been fully trained to memorize the outputs of XOR:

# On the training data
[int(prediction > 0.5) for prediction in layer2] 
[out]:

[0, 1, 1, 0]
If we feed the same inputs back in, we get the same outputs:

for x, y in zip(X, Y):
    layer1_prediction = sigmoid(np.dot(W1.T, x)) # Feed the input through the trained W1.
    prediction = layer2_prediction = sigmoid(np.dot(W2.T, layer1_prediction)) # Feed the hidden activations through the trained W2.
    print(int(prediction > 0.5), y)
[out]:

0 [0]
1 [1]
1 [1]
0 [0]
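
As a side note, the per-row loop above and a single batched matrix product should give the same predictions. The following is only a minimal sketch to illustrate that: `predict_rowwise`, `predict_batch`, and the `*_demo` weights are names introduced here for illustration (the weights are random placeholders, not the trained `W1` and `W2`).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_rowwise(X, W1, W2):
    """Per-row forward pass, mirroring the loop in the question."""
    outputs = []
    for x in X:
        hidden = sigmoid(np.dot(W1.T, x))    # shape (hidden_dim,)
        out = sigmoid(np.dot(W2.T, hidden))  # shape (1,)
        outputs.append(out)
    return np.array(outputs)                 # shape (num_data, 1)

def predict_batch(X, W1, W2):
    """Same forward pass for all rows at once."""
    return sigmoid(np.dot(sigmoid(np.dot(X, W1)), W2))

# Placeholder weights (random, NOT trained); used only to compare the two code paths.
rng = np.random.default_rng(0)
W1_demo = rng.random((2, 5))
W2_demo = rng.random((5, 1))
X_demo = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

rowwise = predict_rowwise(X_demo, W1_demo, W2_demo)
batch = predict_batch(X_demo, W1_demo, W2_demo)
print(np.allclose(rowwise, batch))
```

The batched form is the one the training loop already uses (`np.dot(layer0, W1)`), so either can be used for evaluation.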
But if we retrain the parameters (W1 and W2) without one of the data points, i.e. given

xor_input = np.array([[0,0], [0,1], [1,0], [1,1]])
xor_output = np.array([[0,1,1,0]]).T
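let's drop the last row of the data and use it as an unseen test point:

X = xor_input[:-1]
Y = xor_output[:-1]

With the rest of the code unchanged, no matter how I change the hyperparameters, the network is not able to learn the XOR function and reproduce the I/O: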

for x, y in zip(xor_input, xor_output):
    layer1_prediction = sigmoid(np.dot(W1.T, x)) # Feed the unseen input into trained W.
    prediction = layer2_prediction = sigmoid(np.dot(W2.T, layer1_prediction)) # Feed the unseen input into trained W.
    print(int(prediction > 0.5), y)
[out]:

0 [0]
1 [1]
1 [1]
1 [0]
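
For readers who want to rerun the held-out experiment described above, here is a minimal self-contained sketch. It assumes the same bias-free architecture and update rule as the question; `train_mlp` and `predict` are helper names introduced here, and seeding through `np.random.RandomState` is an assumption about how to make the run repeatable. No particular output is claimed.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_mlp(X, Y, hidden_dim=5, num_epochs=10000, learning_rate=1.0, seed=0):
    """Train the bias-free two-layer network from the question; return (W1, W2)."""
    rng = np.random.RandomState(seed)
    W1 = rng.random_sample((X.shape[1], hidden_dim))
    W2 = rng.random_sample((hidden_dim, Y.shape[1]))
    for _ in range(num_epochs):
        # Forward pass over the whole training batch.
        layer1 = sigmoid(np.dot(X, W1))
        layer2 = sigmoid(np.dot(layer1, W2))
        # Backward pass: error times sigmoid derivative at each layer.
        layer2_delta = (Y - layer2) * layer2 * (1 - layer2)
        layer1_delta = np.dot(layer2_delta, W2.T) * layer1 * (1 - layer1)
        W2 += learning_rate * np.dot(layer1.T, layer2_delta)
        W1 += learning_rate * np.dot(X.T, layer1_delta)
    return W1, W2

def predict(X, W1, W2):
    return sigmoid(np.dot(sigmoid(np.dot(X, W1)), W2))

xor_input = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
xor_output = np.array([[0, 1, 1, 0]]).T

# Hold out the last row ([1, 1] -> 0) and train only on the first three.
X_train, Y_train = xor_input[:-1], xor_output[:-1]
W1, W2 = train_mlp(X_train, Y_train)

predictions = predict(xor_input, W1, W2)
for i, (x, p, y) in enumerate(zip(xor_input, predictions, xor_output)):
    label = "held out" if i == len(xor_input) - 1 else "train"
    print(x, int(p[0] > 0.5), y, label)
```

Changing `seed`, `hidden_dim`, or `learning_rate` in the call to `train_mlp` is a quick way to check whether the prediction on the held-out row ever flips.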
Even if we shuffle the order of the inputs/outputs, we cannot fully train the XOR function:

for x, y in zip(xor_input, xor_output):
    layer1_prediction = sigmoid(np.dot(W1.T, x)) # Feed the unseen input into trained W.
    prediction = layer2_prediction = sigmoid(np.dot(W2.T, layer1_prediction)) # Feed the unseen input into trained W.
    print(x, int(prediction > 0.5), y)
[out]:

[0 0] 1 [0]
[0 1] 1 [1]
[1 0] 1 [1]
[1 1] 0 [0]
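
One plausible reason shuffling makes no difference here: the training loop uses full-batch updates, and a sum over all rows does not depend on the row order. The sketch below checks this for a single update step. `batch_updates` and the `*_demo` weights are names introduced for illustration, and the weights are random placeholders rather than trained values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def batch_updates(X, Y, W1, W2):
    """Return the unscaled full-batch updates for (W1, W2), as in the question's loop."""
    layer1 = sigmoid(np.dot(X, W1))
    layer2 = sigmoid(np.dot(layer1, W2))
    layer2_delta = (Y - layer2) * layer2 * (1 - layer2)
    layer1_delta = np.dot(layer2_delta, W2.T) * layer1 * (1 - layer1)
    return np.dot(X.T, layer1_delta), np.dot(layer1.T, layer2_delta)

rng = np.random.default_rng(0)
W1_demo = rng.random((2, 5))  # placeholder weights, NOT trained
W2_demo = rng.random((5, 1))

X = np.array([[0, 0], [0, 1], [1, 0]])
Y = np.array([[0, 1, 1]]).T

perm = rng.permutation(len(X))  # a random row order
dW1, dW2 = batch_updates(X, Y, W1_demo, W2_demo)
dW1_shuf, dW2_shuf = batch_updates(X[perm], Y[perm], W1_demo, W2_demo)

print(np.allclose(dW1, dW1_shuf), np.allclose(dW2, dW2_shuf))
```

Shuffling would only matter if the updates were computed per sample or per mini-batch instead of over the whole training set at once.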
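So, when the literature states that a multi-layer perceptron (a.k.a. basic deep learning) solves the XOR problem, does it mean that it can fully learn and memorize the weights given the full set of inputs/outputs, but cannot generalize to XOR if one of the data points is missing?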


Here is a link to the Kaggle dataset so that answerers can test the network themselves:

I think that learning (generalizing) XOR and memorizing XOR are different things.

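A two-layer perceptron can memorize XOR, as you have seen: there exists a combination of weights for which the loss is minimal and equal to 0 (the absolute minimum).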

If the weights are randomly initialized, you might end up in a situation where you have actually learned XOR and not merely memorized it.

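Note that the loss of a multi-layer perceptron is a non-convex function of the weights, so there can be multiple minima, and even multiple global minima. When one input is missing from the data, there are multiple minima that are all equal in value, and among them are minima that classify the missing point correctly. Hence an MLP can learn XOR. With one point missing, however, finding that particular combination of weights may be hard.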
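
To make the "several equally good minima" point concrete, here is a small illustrative check (not part of the original answer): the three training rows are consistent with both XOR and OR, so a network that fits them perfectly may have learned either function. The variable names are chosen here for illustration.

```python
import numpy as np

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
xor_labels = np.logical_xor(inputs[:, 0], inputs[:, 1]).astype(int)
or_labels = np.logical_or(inputs[:, 0], inputs[:, 1]).astype(int)

# The first three rows are the training set; the last row is the held-out point.
agree_on_train = bool(np.array_equal(xor_labels[:-1], or_labels[:-1]))
differ_on_heldout = bool(xor_labels[-1] != or_labels[-1])
print(agree_on_train, differ_on_heldout)  # prints: True True
```

Nothing in the three training rows distinguishes the two functions, so which one the network ends up representing depends on initialization and optimization, not on the data.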


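It is often argued that neural networks are universal function approximators and can approximate even nonsensical labels. From that perspective, you may want to look at the related post on this topic; I like Yoav's view there. It is more or less impossible.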