Implementing a simple neural network in Python

Tags: python, algorithm, numpy, matrix, derivative

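I have been banging my head against this brick wall for what feels like an eternity, and I just can't wrap my head around it. I'm trying to implement an autoencoder using only numpy and matrix multiplication. No theano or keras tricks allowed.

I'll describe the problem and all its details. It looks a bit complicated at first because there are a lot of variables, but it is really quite straightforward.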

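What we know

1) X is an m by n matrix, which holds our inputs. The inputs are the rows of this matrix: each input is an n-dimensional row vector, and we have m of them.

2) The number of neurons in our (single) hidden layer, which is k.

3) The activation function of our neurons (sigmoid, denoted g(x)) and its derivative g'(x).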

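What we don't know and want to find

Overall, our goal is to find 6 matrices: w1, which is n by k; b1, which is m by k; w2, which is k by n; b2, which is m by n; w3, which is n by n; and b3, which is m by n.

They are initialized randomly, and we find the best solution using gradient descent.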

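The process

The entire process looks like this:

First we compute z1 = Xw1 + b1. It is m by k and is the input to our hidden layer. We then compute h1 = g(z1), which simply applies the sigmoid function to every element of z1. Naturally it is also m by k, and it is the output of our hidden layer.

We then compute z2 = h1w2 + b2, which is m by n and is the input to the output layer of our neural network. We then compute h2 = g(z2), which is naturally also m by n and is the output of our neural network.

Finally, we take this output and apply a linear operation to it: Xhat = h2w3 + b3, which is also m by n and is our final result.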

Where I am stuck

The cost function I want to minimize is the mean squared error. I have already implemented it in numpy code:

def cost(x, xhat):
    return (1.0/(2 * m)) * np.trace(np.dot(x-xhat,(x-xhat).T))
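
As a side note on the cost above, the trace of (x - xhat)(x - xhat)^T is just the sum of the squared entries of x - xhat, so the same value can be computed without building the m by m product. The sketch below is only an illustration: the names cost_trace and cost_sum are made up here, and m is read from x.shape[0] instead of a global, which is an assumption rather than what the question's code does.

```python
import numpy as np

def cost_trace(x, xhat):
    # Form used in the question: trace of (x - xhat)(x - xhat)^T, scaled by 1/(2m)
    m = x.shape[0]
    d = x - xhat
    return (1.0 / (2 * m)) * np.trace(np.dot(d, d.T))

def cost_sum(x, xhat):
    # Equivalent form: sum of squared entries, no m x m intermediate matrix
    m = x.shape[0]
    d = x - xhat
    return np.sum(d ** 2) / (2.0 * m)

rng = np.random.default_rng(0)
x = rng.random((5, 2))
xhat = rng.random((5, 2))

# The two forms should agree up to floating point rounding
print(np.isclose(cost_trace(x, xhat), cost_sum(x, xhat)))
```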

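The problem is finding the derivatives of the cost with respect to w1, b1, w2, b2, w3, b3. Let's call the cost S.

After deriving them myself and checking them numerically, I have established the following facts:

1) dSdxhat = (1.0/m) * (xhat - x)
2) dSdw3 = np.dot(h2.T, dSdxhat)
3) dSdb3 = dSdxhat
4) dSdh2 = np.dot(dSdxhat, w3.T)

But I can't for the life of me figure out dSdz2. It's a brick wall.

By the chain rule, it should be dSdz2 = dSdh2 * dh2dz2, but the dimensions don't match.

What is the formula for the derivative of S with respect to z2?

Edit: here is my code for the entire feedforward operation of the autoencoder.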

import numpy as np

def g(x):
    # sigmoid activation function
    return 1/(1+np.exp(-x))  # same shape as x!

def gGradient(x):
    # gradient of sigmoid
    return g(x)*(1-g(x))  # same shape as x!

def cost(x, xhat):
    # mean squared error between x the data and xhat the output of the machine
    return (1.0/(2 * m)) * np.trace(np.dot(x-xhat,(x-xhat).T))

# Just small random numbers so we can test that it's working small scale
m = 5  # num of examples
n = 2  # num of features in each example
k = 2  # num of neurons in the hidden layer of the autoencoder
x = np.random.rand(m, n)  # the data, shape (m, n)

w1 = np.random.rand(n, k)  # weights from input layer to hidden layer, shape (n, k)
b1 = np.random.rand(m, k)  # bias term from input layer to hidden layer (m, k)
z1 = np.dot(x,w1)+b1  # output of the input layer, shape (m, k)
h1 = g(z1)  # input of hidden layer, shape (m, k)

w2 = np.random.rand(k, n)  # weights from hidden layer to output layer of the autoencoder, shape (k, n)
b2 = np.random.rand(m, n)  # bias term from hidden layer to output layer of autoencoder, shape (m, n)
z2 = np.dot(h1, w2)+b2  # output of the hidden layer, shape (m, n)
h2 = g(z2)  # Output of the entire autoencoder. The output layer of the autoencoder. shape (m, n)

w3 = np.random.rand(n, n)  # weights from output layer of autoencoder to entire output of the machine, shape (n, n)
b3 = np.random.rand(m, n)  # bias term from output layer of autoencoder to entire output of the machine, shape (m, n)
xhat = np.dot(h2, w3)+b3  # the output of the machine, which hopefully resembles the original data x, shape (m, n)

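OK, here's a suggestion. In the vector case, if x is a vector of length n, then g(x) is also a vector of length n. However, g'(x) is not a vector; it is the Jacobian matrix, of size n x n. Similarly, in the minibatch case, where X is a matrix of size m x n, g(X) is m x n but g'(X) is n x n. Try: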

def gGradient(x):
    # gradient of sigmoid
    return np.dot(g(x).T, 1 - g(x))
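
One thing that may be worth checking about the vector case: because g is applied element by element, the n x n Jacobian for a single row has nonzero entries only on its diagonal, so multiplying by it gives the same result as an elementwise product with g(z) * (1 - g(z)). The sketch below is only an illustration under that assumption; the names jac and upstream are hypothetical, and upstream stands in for the incoming gradient of one row.

```python
import numpy as np

def g(x):
    # sigmoid, applied elementwise
    return 1 / (1 + np.exp(-x))

n = 4
rng = np.random.default_rng(0)
z = rng.standard_normal(n)         # one row of pre-activations
upstream = rng.standard_normal(n)  # hypothetical incoming gradient for that row

# Jacobian of elementwise g: entry (i, j) is d g(z_i) / d z_j, zero unless i == j
jac = np.diag(g(z) * (1 - g(z)))

via_matrix = np.dot(upstream, jac)              # row vector times n x n Jacobian
via_elementwise = upstream * g(z) * (1 - g(z))  # plain elementwise product

# Both routes should give the same length-n vector
print(np.allclose(via_matrix, via_elementwise))
```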
@Paul is right that the bias terms should be vectors, not matrices. You should have:

b1 = np.random.rand(k)  # bias term from input layer to hidden layer (k,)
b2 = np.random.rand(n)  # bias term from hidden layer to output layer of autoencoder, shape (n,)
b3 = np.random.rand(n)  # bias term from output layer of autoencoder to entire output of the machine, shape (n,)
Numpy's broadcasting means that you don't have to change the calculation of xhat.
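
To make the broadcasting point concrete, here is a small sketch, assuming the shapes from the question with k = 3 chosen only so that the dimensions are easy to tell apart. Adding a (k,) bias to an (m, k) product gives the same result as replicating the bias m times by hand; z1_tiled is a name introduced here for the comparison.

```python
import numpy as np

m, n, k = 5, 2, 3
rng = np.random.default_rng(0)
x = rng.random((m, n))
w1 = rng.random((n, k))
b1 = rng.random(k)  # bias as a vector, shape (k,)

# Broadcasting: (m, k) + (k,) adds b1 to every row of the product
z1 = np.dot(x, w1) + b1

# Explicit replication of b1 into an (m, k) matrix, for comparison only
z1_tiled = np.dot(x, w1) + np.tile(b1, (m, 1))

print(z1.shape)
print(np.array_equal(z1, z1_tiled))
```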
Then (I think!) you can compute the derivatives like this:

dSdxhat = (1/float(m)) * (xhat-x)
dSdw3 = np.dot(h2.T,dSdxhat)
dSdb3 = dSdxhat.mean(axis=0)
dSdh2 = np.dot(dSdxhat, w3.T)
dSdz2 = np.dot(dSdh2, gGradient(z2))
dSdb2 = dSdz2.mean(axis=0)
dSdw2 = np.dot(h1.T,dSdz2)
dSdh1 = np.dot(dSdz2, w2.T)
dSdz1 = np.dot(dSdh1, gGradient(z1))
dSdb1 = dSdz1.mean(axis=0)
dSdw1 = np.dot(x.T,dSdz1)
Does this work for you?

Edit

I've decided that I'm not at all sure gGradient should be a matrix. How about:

dSdxhat = (xhat-x) / m
dSdw3 = np.dot(h2.T,dSdxhat)
dSdb3 = dSdxhat.sum(axis=0)
dSdh2 = np.dot(dSdxhat, w3.T)
dSdz2 = h2 * (1-h2) * dSdh2
dSdb2 = dSdz2.sum(axis=0)
dSdw2 = np.dot(h1.T,dSdz2)
dSdh1 = np.dot(dSdz2, w2.T)
dSdz1 = h1 * (1-h1) * dSdh1
dSdb1 = dSdz1.sum(axis=0)
dSdw1 = np.dot(x.T,dSdz1)
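
One way to gain some confidence in the elementwise version is a finite-difference check. The sketch below is not part of the original answer: it assumes vector biases of shape (k,) and (n,), writes the cost as a sum of squares (equal to the trace form in the question), and uses a hypothetical params dictionary plus helper names forward, backward and numerical_grad chosen here for illustration.

```python
import numpy as np

def g(x):
    # sigmoid, applied elementwise
    return 1 / (1 + np.exp(-x))

def forward(x, p):
    # Feedforward pass from the question, with vector biases
    h1 = g(np.dot(x, p["w1"]) + p["b1"])
    h2 = g(np.dot(h1, p["w2"]) + p["b2"])
    xhat = np.dot(h2, p["w3"]) + p["b3"]
    return h1, h2, xhat

def cost(x, xhat):
    # Same value as (1/(2m)) * trace((x - xhat)(x - xhat)^T)
    return np.sum((x - xhat) ** 2) / (2.0 * x.shape[0])

def backward(x, p):
    # Analytic gradients, following the elementwise formulas above
    m = x.shape[0]
    h1, h2, xhat = forward(x, p)
    dSdxhat = (xhat - x) / m
    dSdz2 = h2 * (1 - h2) * np.dot(dSdxhat, p["w3"].T)
    dSdz1 = h1 * (1 - h1) * np.dot(dSdz2, p["w2"].T)
    return {
        "w3": np.dot(h2.T, dSdxhat), "b3": dSdxhat.sum(axis=0),
        "w2": np.dot(h1.T, dSdz2), "b2": dSdz2.sum(axis=0),
        "w1": np.dot(x.T, dSdz1), "b1": dSdz1.sum(axis=0),
    }

def numerical_grad(x, p, name, eps=1e-6):
    # Central differences: perturb one entry at a time and re-evaluate the cost
    num = np.zeros_like(p[name])
    for idx in np.ndindex(p[name].shape):
        old = p[name][idx]
        p[name][idx] = old + eps
        plus = cost(x, forward(x, p)[2])
        p[name][idx] = old - eps
        minus = cost(x, forward(x, p)[2])
        p[name][idx] = old  # restore the original value
        num[idx] = (plus - minus) / (2 * eps)
    return num

m, n, k = 5, 2, 3
rng = np.random.default_rng(0)
x = rng.random((m, n))
params = {
    "w1": rng.random((n, k)), "b1": rng.random(k),
    "w2": rng.random((k, n)), "b2": rng.random(n),
    "w3": rng.random((n, n)), "b3": rng.random(n),
}

grads = backward(x, params)
max_err = {name: np.max(np.abs(grads[name] - numerical_grad(x, params, name)))
           for name in params}
for name in sorted(max_err):
    print(name, max_err[name])  # each value should be very small
```

If the analytic and numerical gradients disagree for one parameter only, that usually narrows the mistake down to the corresponding line of the backward pass.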

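Are you sure your dimensions aren't misaligned simply because you added a bias unit to the list of h2 units? The derivative looks fine to me.

The difference in shapes is greater than 1, so it can't be the bias terms. I accounted for the bias terms in my derivatives too.

Speaking of the bias terms, usually you apply a constant bias (and learn its value) rather than using a different one each time (that is, they should have shapes (k) and (n)). When it changes for every input, there is nothing to generalize. What confuses me is that you apparently have two hidden layers but say you have only one. I think it would help a lot if you provided the code of the full implementation; then we could see what you are doing.

Ah, I'll explain. As the diagram says, I have the original input layer (x), I use some projection and a sigmoid to get to the hidden layer (h1), then I do the same again to get to the output layer of the autoencoder (h2), and then I take the output of the autoencoder and apply a linear transformation to it to get the output of the entire machine (xhat). So h2 is the last layer of the autoencoder, and xhat is just a linearly transformed h2, but we need to learn that transformation too. I'll share my code.

Question 1: keeping the bias terms constant. I'll try, but I think there is a problem. Look at the line z1 = np.dot(x, w1) + b1. The dot product of x and w1 is m by k, but b1 is (according to you) k by 1. Adding them is impossible. Maybe you mean that I should replicate this k by 1 vector m times?

Well, broadcasting means that numpy replicates this vector for you automatically.

Oh, I see. Thank you very much, I didn't know that. I'll definitely check it and post the results.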