
Python: Why is my backpropagation algorithm's performance stuck?

Tags: python, performance, neural-network, backpropagation

I am learning how to write neural networks and am currently working on a backpropagation algorithm with one input layer, one hidden layer, and one output layer. The algorithm runs, and when I throw some test data such as

x_train = np.array([[1., 2., -3., 10.], [0.3, -7.8, 1., 2.]])
y_train = np.array([[10, -3, 6, 1], [1, 1, 6, 1]])
into my algorithm, using the default of 3 hidden units and the default learning rate of 10e-4,

Backprop.train(x_train, y_train, tol = 10e-1)
x_pred = Backprop.predict(x_train)
I get good results:

Tolerances: [10e-1, 10e-2, 10e-3, 10e-4, 10e-5]
Iterations: [2678, 5255, 7106, 14270, 38895]
Mean absolute error: [0.42540, 0.14577, 0.04264, 0.01735, 0.00773]
Sum of squared errors: [1.85383, 0.21345, 0.01882, 0.00311, 0.00071].
Each time, the sum of squared errors decreases by about an order of magnitude, just as I expected. However, when I use test data like this

X_train = np.random.rand(20, 7)
Y_train = np.random.rand(20, 2)

I get only these results:

Tolerances: [10e+1, 10e-0, 10e-1, 10e-2, 10e-3]
Iterations: [11, 19, 63, 80, 7931]
Mean absolute error: [0.30322, 0.25076, 0.25292, 0.24327, 0.24255]
Sum of squared errors: [4.69919, 3.43997, 3.50411, 3.38170, 3.16057]
and nothing really changes. I have checked my hidden units, gradients, and weight matrices, and they are all different, and the gradients really are shrinking, just as I set it up in my backprop algorithm:

if ( np.sum(E_hidden**2) + np.sum(E_output**2) ) < tol: 
   learning = False
where E_hidden and E_output are my gradient matrices. My question is: how is it possible that, even though the gradients are shrinking, the metrics stay practically the same for some data? And what can I do about it?
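
To make the comparison concrete, here is a small, self-contained sketch (plain numpy, with made-up values) of the two quantities involved; grad_norm and sse are just illustrative names:

import numpy as np

# Made-up arrays with the same roles and shapes as in my training loop (J=3, I=7, K=2, N=20)
E_hidden = np.full((3, 8), 1e-3)   # a "small" hidden-layer gradient, shape (J, I+1)
E_output = np.full((2, 4), 1e-3)   # a "small" output-layer gradient, shape (K, J+1)
y_pred   = np.zeros((2, 20))       # predictions, shape (K, N)
y_train  = np.random.rand(20, 2)   # targets, shape (N, K)

grad_norm = np.sum(E_hidden**2) + np.sum(E_output**2)  # what the while loop tests against tol
sse       = np.sum((y_pred.T - y_train)**2)            # what "Sum of squared errors" reports

print(grad_norm)  # ~3.2e-05, so learning would stop
print(sse)        # still large; the stopping rule and the error metric measure different things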

My backprop looks like this:

import numpy as np

class Backprop:


    def sigmoid(r):
        return (1 + np.exp(-r)) ** (-1)

    def train(x_train, y_train, hidden_units = 3, learning_rate = 10e-4, tol = 10e-3):
        # We need y_train to be 2D. There should be as many rows as there are x_train vectors
        N = x_train.shape[0]
        I = x_train.shape[1]
        J = hidden_units 
        K = y_train.shape[1] # Number of output units

            # Add the bias units to x_train
        bias = -np.ones(N).reshape(-1,1) # Make it 2D so we can stack it
            # Make the row vector a column vector for easier use when applying matrices. Afterwards, x_train.shape = (N, I+1)
        x_train = np.hstack((x_train, bias)).T # x_train.shape = (I+1, N) -> N column vectors of respective length I+1
        
            # Create our weight matrices
        W_input = np.random.rand(J, I+1) # W_input.shape = (J, I+1)
        W_hidden = np.random.rand(K, J+1) # W_hidden.shape = (K, J+1)
        m = 0
        learning = True
        while learning:

            ##### ----- Phase 1: Forward Propagation ----- #####

                # Create the total input to the hidden units
            u_hidden = W_input @ x_train # u_hidden.shape = (J, N) -> N column vectors of respective length J.
                                         # For every training vector we get J hidden states
                # Create the hidden units 
           
            h = Backprop.sigmoid(u_hidden) # h.shape = (J, N)
                # Create the total input to the output units
            
            bias = -np.ones(N)
            h = np.vstack((h, bias)) # h.shape = (J+1, N)
            u_output = W_hidden @ h # u_output.shape = (K, N). For every training vector we get K output states. 
                # In the code itself the following is not necessary, because, as we remember from the above, the output activation function
                # is the identity function, but let's do it anyway for the sake of clarity
            y_pred = u_output.copy() # Now, y_pred has the same shape as y_train
            
            
            ##### ----- Phase 2: Backward Propagation ----- #####

                # We will calculate the delta terms now and begin with the delta term of the output unit
                
                # We will transpose several times now. Before, having column vectors was convenient, because matrix multiplication is 
                # more intuitive that way. But now, we need to work with indices and need the right dimensions. Yes, loops are inefficient,
                # but they provide much more clarity so that we can easily connect the theory above with our code.

                # We don't need the delta_output right now, because we will update W_hidden with a loop. But we need it for the delta term 
                # of the hidden unit.
            delta_output = y_pred.T - y_train 
                # Calculate our error gradient for the output units
            E_output = np.zeros((K, J+1))
            for k in range(K):
                for j in range(J+1):
                    for n in range(N):
                        E_output[k, j] += (y_pred.T[n, k] - y_train[n, k]) * h.T[n, j] 
                # Calculate our change in W_hidden
            W_delta_output = -learning_rate * E_output
                # Update the old weights
            W_hidden = W_hidden + W_delta_output

                # Let's calculate the delta term of the hidden unit
            delta_hidden = np.zeros((N, J+1))
            for n in range(N):
                for j in range(J+1):
                    for k in range(K):
                        delta_hidden[n, j] += h.T[n, j]*(1 - h.T[n, j]) * delta_output[n, k] * W_delta_output[k, j]

                # Calculate our error gradient for the hidden units, but exclude the hidden bias unit, because W_input and the hidden bias
                # unit don't share any relation at all
            E_hidden = np.zeros((J, I+1))
            for j in range(J):
                for i in range(I+1):
                    for n in range(N):
                        E_hidden[j, i] += delta_hidden[n, j]*x_train.T[n, i]
                # Calculate our change in W_input
            W_delta_hidden = -learning_rate * E_hidden
            W_input = W_input + W_delta_hidden
            
            if ( np.sum(E_hidden**2) + np.sum(E_output**2) ) < tol: 
               learning = False
            
            m += 1 # Iteration count
            
        Backprop.weights = [W_input, W_hidden]
        Backprop.iterations = m
        Backprop.errors = [E_hidden, E_output]


 ##### ----- #####


    def predict(x):
        N = x.shape[0]
            # x1 = Backprop.weights[1][:,:-1] @ Backprop.sigmoid(Backprop.weights[0][:,:-1] @ x.T)
            # Trying this, we see we really need to add a bias here as well if we also train using a bias

            # Add the bias units to x
        bias = -np.ones(N).reshape(-1,1) # Make it 2D so we can stack it
            # Make the row vector a column vector for easier use when applying matrices.
        x = np.hstack((x, bias)).T
        h = Backprop.weights[0] @ x
        u = Backprop.sigmoid(h) # We need to transform the data using the sigmoidal function
        h = np.vstack((u, bias.reshape(1, -1)))

        return (Backprop.weights[1] @ h).T
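
For reference, the loop-based gradients in train() correspond to the following matrix products (a minimal sketch using random stand-in arrays of the right shapes; it only restates the loops and is not part of the class):

import numpy as np

N, I, J, K = 20, 7, 3, 2                      # samples, inputs, hidden units, outputs
x_train = np.random.rand(I + 1, N)            # inputs with bias row, shape (I+1, N)
h       = np.random.rand(J + 1, N)            # hidden activations with bias row, shape (J+1, N)
y_pred  = np.random.rand(K, N)                # network outputs, shape (K, N)
y_train = np.random.rand(N, K)                # targets, shape (N, K)
W_delta_output = np.random.rand(K, J + 1)     # stand-in for -learning_rate * E_output

delta_output = y_pred.T - y_train                                   # shape (N, K)
E_output     = delta_output.T @ h.T                                 # shape (K, J+1), same as the k/j/n loops
delta_hidden = (h.T * (1 - h.T)) * (delta_output @ W_delta_output)  # shape (N, J+1)
E_hidden     = delta_hidden[:, :J].T @ x_train.T                    # shape (J, I+1), bias column dropped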