Python neural network XOR gate classification
python, neural-network, backpropagation

I wrote a simple neural network that should learn to predict the XOR gate function. I believe my calculations are correct, but the loss does not decrease and stays close to 0.6. Can anyone help me find out why?
import numpy as np
import matplotlib.pyplot as plt

train_X = np.array([[0,0],[0,1],[1,0],[1,1]]).T
train_Y = np.array([[0,1,1,0]])
test_X = np.array([[0,0],[0,1],[1,0],[1,1]]).T
test_Y = np.array([[0,1,1,0]])

learning_rate = 0.1
S = 5

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    return sigmoid(z) * (1 - sigmoid(z))

# Layer sizes: 2 inputs, 5 hidden units, 1 output; m = number of examples
S0, S1, S2 = 2, 5, 1
m = 4

w1 = np.random.randn(S1, S0) * 0.01
b1 = np.zeros((S1, 1))
w2 = np.random.randn(S2, S1) * 0.01
b2 = np.zeros((S2, 1))

for i in range(1000000):
    # Forward pass
    Z1 = np.dot(w1, train_X) + b1
    A1 = sigmoid(Z1)
    Z2 = np.dot(w2, A1) + b2
    A2 = sigmoid(Z2)
    # Cross-entropy cost
    J = np.sum(-train_Y * np.log(A2) + (train_Y - 1) * np.log(1 - A2)) / m
    # Backward pass
    dZ2 = A2 - train_Y
    dW2 = np.dot(dZ2, A1.T) / m
    dB2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = np.dot(w2.T, dZ2) * sigmoid_derivative(Z1)
    dW1 = np.dot(dZ1, train_X.T) / m
    dB1 = np.sum(dZ1, axis=1, keepdims=True) / m
    # Gradient descent step
    w1 = w1 - dW1 * 0.03
    w2 = w2 - dW2 * 0.03
    b1 = b1 - dB1 * 0.03
    b2 = b2 - dB2 * 0.03
    print(J)
I think your dZ2 is incorrect, because you did not multiply it by the derivative of the sigmoid.
For the XOR problem, if you check the outputs, the ones are only slightly above 0.5 and the zeros only slightly below it. I believe this is because the search has reached a plateau, so progress is very slow. I tried RMSProp and it quickly converged to almost zero. I also tried a pseudo-second-order algorithm, which converged almost immediately. The loss curve is plotted at the end of the RMSProp code below.
Also, the final output of the network is now

[[1.67096234e-06 9.99949419e-01 9.99994158e-01 6.87836337e-06]]

which rounds to

array([[0., 1., 1., 0.]])
However, I strongly recommend performing a gradient check, to make sure that the analytic gradients match the numerically computed ones.
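A minimal finite-difference gradient check might look like the sketch below (hypothetical, not from the original answer; it uses the same 2-5-1 shapes and cross-entropy cost as the code in the question, and only checks dW2):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Tiny 2-5-1 network with the same shapes as in the question
rng = np.random.default_rng(0)
X = np.array([[0,0],[0,1],[1,0],[1,1]], dtype=float).T
Y = np.array([[0,1,1,0]], dtype=float)
w1 = rng.standard_normal((5, 2)); b1 = np.zeros((5, 1))
w2 = rng.standard_normal((1, 5)); b2 = np.zeros((1, 1))

def cost(w1, b1, w2, b2):
    A1 = sigmoid(w1 @ X + b1)
    A2 = sigmoid(w2 @ A1 + b2)
    return float(np.sum(-Y * np.log(A2) - (1 - Y) * np.log(1 - A2)) / 4)

# Analytic gradient for w2 (cross-entropy + sigmoid output => dZ2 = A2 - Y)
A1 = sigmoid(w1 @ X + b1)
A2 = sigmoid(w2 @ A1 + b2)
dW2 = (A2 - Y) @ A1.T / 4

# Numerical gradient by central differences, one entry at a time
eps = 1e-6
dW2_num = np.zeros_like(w2)
for i in range(w2.shape[0]):
    for j in range(w2.shape[1]):
        wp = w2.copy(); wp[i, j] += eps
        wm = w2.copy(); wm[i, j] -= eps
        dW2_num[i, j] = (cost(w1, b1, wp, b2) - cost(w1, b1, wm, b2)) / (2 * eps)

rel_err = np.linalg.norm(dW2 - dW2_num) / (np.linalg.norm(dW2) + np.linalg.norm(dW2_num))
print(rel_err)  # relative error should be very small, e.g. < 1e-7
```

If the relative error is large (say above 1e-4), one of the analytic gradients is wrong.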
I add the modified code with an RMSProp implementation below:
#!/usr/bin/python3
import numpy as np
import matplotlib.pyplot as plt

train_X = np.array([[0,0],[0,1],[1,0],[1,1]]).T
train_Y = np.array([[0,1,1,0]])
test_X = np.array([[0,0],[0,1],[1,0],[1,1]]).T
test_Y = np.array([[0,1,1,0]])

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    return sigmoid(z) * (1 - sigmoid(z))

S0, S1, S2 = 2, 5, 1
m = 4

w1 = np.random.randn(S1, S0) * 0.01
b1 = np.zeros((S1, 1))
w2 = np.random.randn(S2, S1) * 0.01
b2 = np.zeros((S2, 1))

# RMSProp state: running averages of squared gradients
dWsqsum1 = np.zeros_like(w1)
dWsqsum2 = np.zeros_like(w2)
dBsqsum1 = np.zeros_like(b1)
dBsqsum2 = np.zeros_like(b2)
alpha = 0.9   # decay rate of the running average
lr = 0.01     # step size

err_vec = []

for i in range(20000):
    # Forward pass
    Z1 = np.dot(w1, train_X) + b1
    A1 = sigmoid(Z1)
    Z2 = np.dot(w2, A1) + b2
    A2 = sigmoid(Z2)
    J = np.sum(-train_Y * np.log(A2) + (train_Y - 1) * np.log(1 - A2)) / m
    # Backward pass
    dZ2 = (A2 - train_Y) * sigmoid_derivative(Z2)
    dW2 = np.dot(dZ2, A1.T) / m
    dB2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = np.dot(w2.T, dZ2) * sigmoid_derivative(Z1)
    dW1 = np.dot(dZ1, train_X.T) / m
    dB1 = np.sum(dZ1, axis=1, keepdims=True) / m
    # RMSProp update: accumulate squared gradients, then scale the step
    dWsqsum1 = alpha * dWsqsum1 + (1 - alpha) * np.square(dW1)
    dWsqsum2 = alpha * dWsqsum2 + (1 - alpha) * np.square(dW2)
    dBsqsum1 = alpha * dBsqsum1 + (1 - alpha) * np.square(dB1)
    dBsqsum2 = alpha * dBsqsum2 + (1 - alpha) * np.square(dB2)
    w1 = w1 - lr * dW1 / (np.sqrt(dWsqsum1) + 1e-9)
    w2 = w2 - lr * dW2 / (np.sqrt(dWsqsum2) + 1e-9)
    b1 = b1 - lr * dB1 / (np.sqrt(dBsqsum1) + 1e-9)
    b2 = b2 - lr * dB2 / (np.sqrt(dBsqsum2) + 1e-9)
    print(J)
    err_vec.append(J)

# Final predictions
Z1 = np.dot(w1, train_X) + b1
A1 = sigmoid(Z1)
Z2 = np.dot(w2, A1) + b2
A2 = sigmoid(Z2)
print("\n", A2)

# Plot the loss curve
plt.plot(np.array(err_vec))
plt.show()
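Distilled to a single parameter, the RMSProp rule can be illustrated on a toy quadratic (a sketch for illustration, not part of the original answer; `alpha` is the decay rate and `lr` the step size):

```python
import numpy as np

# Standard RMSProp update minimizing f(w) = (w - 3)^2
alpha, lr, eps = 0.9, 0.01, 1e-8
w, cache = 0.0, 0.0
for _ in range(2000):
    g = 2 * (w - 3)                             # gradient of f at w
    cache = alpha * cache + (1 - alpha) * g**2  # running average of squared gradients
    w -= lr * g / (np.sqrt(cache) + eps)        # step scaled by RMS of recent gradients
print(w)  # close to the minimum at w = 3
```

Because each step is normalized by the recent gradient magnitude, RMSProp keeps moving at roughly the same speed even on a plateau where raw gradients are tiny, which is why it escapes the XOR plateau much faster than plain gradient descent.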
Welcome to SO, and thanks for posting! You could refine the question a bit to help answerers see how to help. Can you point out why you think the math is correct? Showing what J prints would also help. What are the key components of backprop, and are they all in your code?

Thanks for your answer!! Actually, my problem was that the initialized weights were too small. Also, I think my implementation of dZ2 is correct: dZ2 is the derivative at the last layer, so with this cross-entropy cost you do not have to multiply by sigmoid_derivative.

Indeed, you need proper initialization of the weights. Say you are training a single perceptron with a sigmoid output. If you derive the gradient of the mean-squared-error cost function, you end up with the derivative of the sigmoid (or whatever threshold function you use), so the threshold derivative is needed. Also, as I mentioned, make sure to confirm the gradient computation using gradient checking.
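The disagreement about dZ2 can be settled numerically: with a sigmoid output and cross-entropy loss, the sigmoid' factor cancels and dJ/dz = a - y, while with mean squared error it does not cancel. A quick check (a sketch with toy vectors, not from the original thread):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.array([0.3, -1.2, 2.0])   # pre-activations of the output layer
y = np.array([0.0, 1.0, 1.0])    # targets

def ce(z):   # cross-entropy loss
    a = sigmoid(z)
    return np.sum(-y * np.log(a) - (1 - y) * np.log(1 - a))

def mse(z):  # mean-squared-error loss
    a = sigmoid(z)
    return 0.5 * np.sum((a - y) ** 2)

# Numerical derivatives dJ/dz by central differences
eps = 1e-6
num_ce = np.array([(ce(z + eps*e) - ce(z - eps*e)) / (2*eps) for e in np.eye(3)])
num_mse = np.array([(mse(z + eps*e) - mse(z - eps*e)) / (2*eps) for e in np.eye(3)])

a = sigmoid(z)
print(np.allclose(num_ce, a - y))                 # True: CE gives dJ/dz = a - y
print(np.allclose(num_mse, (a - y) * a * (1 - a)))  # True: MSE keeps the sigmoid' factor
```

So both commenters are right for their respective cost functions: the question's code uses cross-entropy, where dZ2 = A2 - Y is correct, while the extra sigmoid_derivative factor belongs to the mean-squared-error case.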