My own neural network implementation in Python severely underfits the data


python numpy machine-learning neural-network deep-learning


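I am fairly new to machine/deep learning. I have a lot of experience developing supervised learning models with APIs such as Scikit-Learn, TensorFlow and Keras, so I wanted to implement one myself to get a better understanding of how they work.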

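I tried to implement a basic deep neural network algorithm to solve a classification problem of my own. I used the iris dataset for this test, but my implementation gives me very poor results: it severely underfits the data. The best accuracy I have got is 66%, and the worst was as low as 0%. The results also vary a lot from one run of the algorithm to the next, even though I set a random seed.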
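For reference: the code below draws its weights with np.random.uniform, which uses NumPy's global generator, so the seed has to be set with np.random.seed (Python's random.seed does not affect it) before fit() draws the weights. A minimal sketch; init_weights is a hypothetical helper that mimics the question's draws:

```python
import numpy as np

def init_weights(seed, shape):
    # Hypothetical helper: seed NumPy's global generator, then draw weights
    # the same way the question's fit() does (uniform between 0.1 and 0.2)
    np.random.seed(seed)
    return np.random.uniform(low=0.1, high=0.2, size=shape)

run_1 = init_weights(42, (4, 10))
run_2 = init_weights(42, (4, 10))
print(np.array_equal(run_1, run_2))  # True: same seed, same initial weights
```

Since nothing else in the posted code is random (no shuffling, no dropout), seeding this way before calling fit should make every run print the same accuracies.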

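I chose the tanh activation function with a learning rate of 0.01, a softmax activation on the output layer, and input variables normalized with a standard scaler.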
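Because the hidden layers store tanh outputs, the _derivate_tanh helper in the code below receives a = tanh(z) and returns 1 - a², which is the derivative of tanh with respect to z. A quick finite-difference check of that identity (the grid of z values is arbitrary), which also shows that feeding z itself into the same formula would be wrong:

```python
import numpy as np

z = np.linspace(-3.0, 3.0, 13)  # arbitrary pre-activation values
a = np.tanh(z)                  # what the network stores in self.layers

analytic = 1 - np.power(a, 2)   # same formula as _derivate_tanh(a)
eps = 1e-6
numeric = (np.tanh(z + eps) - np.tanh(z - eps)) / (2 * eps)  # central difference

print(np.max(np.abs(analytic - numeric)))  # a very small number: the identity holds

wrong = 1 - np.power(z, 2)      # the same formula applied to z instead of a
print(np.allclose(wrong, numeric))  # False
```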

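So I would like to know whether I made a mistake somewhere in the math, or missed an essential part of the algorithm. I would be very grateful if someone could run this code and point me to possible changes. Many thanks in advance.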

Here is the code:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

data = load_iris()
X = data.data
y = data.target


class Neural_Network:

    def __init__(self, n_hlayers, n_nodes, lr):
        # No. of hidden layers
        self.n_layers = n_hlayers
        # No. of nodes in each of the hidden layers
        self.n_nodes = n_nodes
        # Learning rate of the algorithm
        self.lr = lr
        # Dictionary to hold the node values of all the layers
        self.layers = {}
        # Dictionary to hold the weight values of all the layers
        self.weights = {}

    def _softmax(self, values):
        '''Function to perform softmax activation on the node values

        returns probabilities of each feature'''
        exp_scores = np.exp(values)
        probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
        return probs

    def _derivate_tanh(self, values):
        '''Function that performs derivative of a tanh activation function'''
        # Derivative of tanh is 1 - tanh^2 x (values are already tanh outputs)
        return 1 - np.power(values, 2)

    def fit(self, X, y):
        '''This function constructs a Neural Network with given hyper parameters and then runs it for
        given no. of epochs. No. of nodes in all the hidden layers are the same for simplicity's sake.

        returns: None / NA'''
        print('Fitting the data ')

        try:
            X = np.array(X)
            y = np.array(y)
        except Exception:
            print('Could not make sense of the inputs')

        # No. of examples and the dimensions of each sample
        self.num_examples, self.features = X.shape

        # Setting default layers
        # Input layer
        self.layers['input'] = np.zeros(shape=[1, self.features])

        # Hidden layers
        for i in range(1, self.n_layers + 1):
            self.layers['layer-1' + str(i)] = np.zeros(shape=[1, self.n_nodes])

        # Output layer
        self.layers['output'] = np.zeros(shape=[1, len(np.unique(y))])

        # Setting random weights
        for i in range(1, self.n_layers + 2):
            # Weights for first layer
            if i == 1:
                self.weights['weight-1' + str(i)] = np.random.uniform(
                    low=0.1, high=0.2, size=[self.features, self.n_nodes])
            # Weights for hidden layer
            elif i < self.n_layers + 1:
                self.weights['weight-1' + str(i)] = np.random.uniform(
                    low=0.1, high=0.2, size=[self.n_nodes, self.n_nodes])
            # Weights for output layer
            else:
                self.weights['weight-1' + str(i)] = np.random.uniform(
                    low=0.1, high=0.2, size=[self.n_nodes, len(np.unique(y))])

        # No. of epochs taken from the user
        epochs = int(input('Please choose no.of epochs: '))

        # Standard Scaler to normalize the input data
        S_s = StandardScaler()
        self.X = S_s.fit_transform(X)
        self.y = y.reshape(self.num_examples, 1)

        for ep in range(epochs):
            # Forward propagating on the normalized inputs
            self._Forward_Propogate()

            if ep % 100 == 0:
                # Calculating the accuracy of the predictions
                self.acc = np.sum(self.y.flatten() == np.argmax(self.layers['output'], axis=1)) / self.num_examples
                print('Accuracy in epoch ', ep, ' is :', self.acc)

            # Backward propagating
            self._Backward_Propogation()

    def _Forward_Propogate(self):
        '''This function performs forward propagation on the input data through the hidden layers and on the output layer

        activations: tanh for all layers except the output layer

        returns: None/NA.'''
        # Feeding the input layer the normalized inputs
        self.layers['input'] = self.X

        # Forward propagating
        for i in range(1, len(self.layers.keys())):

            # Input layer dot-product with first set of weights
            if i == 1:
                dp = self.layers['input'].dot(self.weights['weight-1' + str(i)])
                # Storing the result in first hidden layer after performing tanh activation on values
                self.layers['layer-1' + str(i)] = np.tanh(dp)

            # Hidden layers dot-product with weights for the hidden layer
            elif i != (len(self.layers.keys()) - 1):
                dp = self.layers['layer-1' + str(i-1)].dot(self.weights['weight-1' + str(i)])
                # Storing the result in next hidden layer after performing tanh activation on values
                self.layers['layer-1' + str(i)] = np.tanh(dp)

            # Dot-product of last hidden layer with last set of weights
            else:
                dp = self.layers['layer-1' + str(i-1)].dot(self.weights['weight-1' + str(i)])
                # Storing the result in the output layer after performing softmax activation on the values
                self.layers['output'] = self._softmax(dp)

    def _Backward_Propogation(self):
        '''This function performs back propagation using normal/naive gradient descent algorithm on the weights of the output layer
        through the hidden layer until the input layer weights

        returns: None/NA'''
        # Dictionary to hold Delta / Error values of each layer
        self.delta = {}
        # Dictionary to hold Gradient / Slope values of each layer
        self.gradients = {}

        # Calculating the error
        error = self.y - self.layers['output']

        # Adjusting weights of the network starting from weights of the output layer
        for i in reversed(range(1, len(self.weights.keys()) + 1)):

            # Adjusting weights for the last layer
            if i == len(self.weights.keys()):
                # Delta for the output layer weights
                self.delta['delta_out'] = error * self.lr
                # Gradient or slope for the last layer's weights
                self.gradients['grad_out'] = self.layers['layer-1' + str(i-1)].T.dot(
                    self.delta['delta_out'])
                # Adjusting the original weights for the output layer
                self.weights['weight-1' + str(i)] = self.weights['weight-1' + str(i)] - (
                    self.lr * self.gradients['grad_out'])

            # Adjusting weights for last but one layer
            elif i == len(self.weights.keys()) - 1:
                # Delta / error values of the first hidden layer weights seen from the output layer
                self.delta['delta_1' + str(i)] = self.delta['delta_out'].dot(
                    self.weights['weight-1' + str(i+1)].T) * self._derivate_tanh(self.layers['layer-1' + str(i)])
                # Gradient / Slope for the weights of the first hidden layer seen from the output layer
                self.gradients['grad_1' + str(i)] = self.layers['layer-1' + str(i-1)].T.dot(
                    self.delta['delta_1' + str(i)])
                # Adjusting weights of the last but one layer
                self.weights['weight-1' + str(i)] = self.weights['weight-1' + str(i)] - (
                    self.lr * self.gradients['grad_1' + str(i)])

            # Adjusting weights for all other hidden layers
            elif i > 1:
                # Delta / Error values for the weights in the hidden layers
                self.delta['delta_1' + str(i)] = self.delta['delta_1' + str(i+1)].dot(
                    self.weights['weight-1' + str(i+1)]) * self._derivate_tanh(self.layers['layer-1' + str(i)])
                # Gradient / Slope values for the weights of hidden layers
                self.gradients['grad_1' + str(i)] = self.layers['layer-1' + str(i-1)].T.dot(
                    self.delta['delta_1' + str(i)])
                # Adjusting weights of the hidden layer
                self.weights['weight-1' + str(i)] = self.weights['weight-1' + str(i)] - (
                    self.lr * self.gradients['grad_1' + str(i)])

            # Adjusting weights which are matrix-multiplied with the input layer
            else:
                # Delta / Error values for the weights that come after the input layer
                self.delta['delta_inp'] = self.delta['delta_1' + str(i+1)].dot(
                    self.weights['weight-1' + str(i+1)]) * self._derivate_tanh(self.layers['layer-1' + str(i)])
                # Gradient / Slope values for the weights that come after the input layer
                self.gradients['grad_1' + str(i)] = self.layers['input'].T.dot(self.delta['delta_inp'])
                # Adjusting weights
                self.weights['weight-1' + str(i)] = self.weights['weight-1' + str(i)] - (
                    self.lr * self.gradients['grad_1' + str(i)])

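Most of the time the node values of my output layer (the probabilities from the softmax activation) are very extreme, going as low as e^-37. I know this should not happen. When I checked the weights of the output layer, they were not that extreme; in other words, they had not changed so much that they would overshoot the local minimum. So I don't know where the problem is. Again, I would be very grateful if someone could run this and help me find the problem.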
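A softmax probability near e^-37 only says that this class's logit sits about 37 below the largest logit in the same row, because softmax divides exponentials; it does not by itself mean that any single weight is extreme. Separately from the gradient problems discussed in the answer, the _softmax helper above exponentiates the raw logits, which overflows once a logit exceeds about 709. A common safeguard, sketched here with a hypothetical stable_softmax name, subtracts the row-wise maximum first; this leaves the probabilities unchanged:

```python
import numpy as np

def stable_softmax(values):
    # Softmax is unchanged by adding a constant to every logit in a row,
    # so shift each row so its largest logit is 0 before exponentiating
    shifted = values - np.max(values, axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

logits = np.array([[1.0, 2.0, 3.0],
                   [1001.0, 1002.0, 1003.0]])  # np.exp(1003.0) alone would overflow to inf
probs = stable_softmax(logits)
print(np.allclose(probs[0], probs[1]))  # True: same gaps, same probabilities

# A logit gap of 37 gives the smaller class a probability of roughly e^-37
tiny = stable_softmax(np.array([[0.0, 37.0]]))[0, 0]
print(np.log(tiny))  # approximately -37
```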

Thanks

I don't think this is about underfitting; you should check your code more carefully. Here are some suggestions:

1. Wrong delta at the output layer

error = self.y - self.layers['output']

It should be yHat - y, and I don't think you need to multiply it by the learning rate here:

self.delta['delta_out'] = error * self.lr
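A minimal sketch of suggestion 1, assuming a softmax output trained with mean cross-entropy: the gradient with respect to the output logits is yHat - y, which only makes sense if y is one-hot encoded. Note that the question's code subtracts integer labels of shape (n, 1) from probabilities of shape (n, 3), and NumPy broadcasts this silently instead of raising an error. The sizes and the one_hot helper are hypothetical; a finite-difference check confirms the gradient:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def one_hot(labels, n_classes):
    # Turn integer class labels (0, 1, 2, ...) into rows like [0, 1, 0]
    return np.eye(n_classes)[labels]

def cross_entropy(h, W, Y):
    # Mean cross-entropy of softmax(h @ W) against one-hot targets Y
    P = softmax(h @ W)
    return -np.mean(np.sum(Y * np.log(P), axis=1))

rng = np.random.default_rng(0)
h = np.tanh(rng.normal(size=(5, 4)))      # last hidden layer activations
W = rng.normal(scale=0.5, size=(4, 3))    # output-layer weights
Y = one_hot(np.array([0, 1, 2, 1, 0]), 3)

# Output delta: yHat - y, divided by the batch size because the loss is a mean.
# No learning rate here; it is applied once, in the update step.
delta_out = (softmax(h @ W) - Y) / h.shape[0]
grad_W = h.T @ delta_out

# Central finite differences on every weight to check the analytic gradient
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        numeric[i, j] = (cross_entropy(h, Wp, Y) - cross_entropy(h, Wm, Y)) / (2 * eps)

print(np.max(np.abs(grad_W - numeric)))  # should be tiny

lr = 0.01
W = W - lr * grad_W  # gradient descent: subtract lr times the gradient
```

Dropping the division by the batch size gives the gradient of the summed loss instead, which only rescales the effective learning rate.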

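2. Watch the shapes. It seems to me that you forgot to transpose the weights when propagating the deltas back through the hidden layers (and possibly elsewhere): in the elif i > 1 branch and in the else branch, self.weights['weight-1' + str(i+1)] is used without .T.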

Suggestion: try using a different n_nodes for each layer; in that case you will get a shape-mismatch error right away.
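A small sketch of that check, with hypothetical layer widths of 5 and 3: the untransposed product used in the question's hidden-layer branches raises a ValueError as soon as the widths differ, while the transposed product gives a delta with one column per node of the earlier layer.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_prev, n_next = 6, 5, 3   # hypothetical sizes: two hidden layers of different widths

W_next = rng.normal(size=(n_prev, n_next))              # weights from layer i (5 nodes) to layer i+1 (3 nodes)
delta_next = rng.normal(size=(n_samples, n_next))       # delta of layer i+1
a_prev = np.tanh(rng.normal(size=(n_samples, n_prev)))  # tanh activations of layer i

# Transposed: (6, 3) @ (3, 5) -> (6, 5), then multiply by tanh'(a) = 1 - a^2
delta_prev = delta_next.dot(W_next.T) * (1 - a_prev ** 2)
print(delta_prev.shape)  # (6, 5)

# Untransposed, as in the question's hidden-layer branches: (6, 3) @ (5, 3) fails
shape_error = False
try:
    delta_next.dot(W_next)
except ValueError as err:
    shape_error = True
    print("ValueError:", err)
```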

3. Update the weights only after all the deltas have been computed

self.delta['delta_1' + str(i)] = self.delta['delta_out'].dot( 
    self.weights['weight-1' + str(i+1)].T ) * self._derivate_tanh(self.layers['layer-1' + str(i)])

Here self.weights['weight-1' + str(i+1)] has already been updated in the previous iteration of the loop, which I don't think is correct.

Also try a smaller learning rate and more epochs in your tests.
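Putting suggestions 1 to 3 together, here is a minimal sketch of a full training step, assuming tanh hidden layers, a softmax output, mean cross-entropy and no bias terms (as in the question). It stores the weights in a plain list rather than the question's dictionaries; all function names and the toy data are hypothetical, not the asker's code:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(X, weights):
    # activations[0] is the input; tanh on hidden layers, softmax on the output
    activations = [X]
    for k, W in enumerate(weights):
        z = activations[-1] @ W
        is_output = k == len(weights) - 1
        activations.append(softmax(z) if is_output else np.tanh(z))
    return activations

def backward(activations, weights, Y):
    n = Y.shape[0]
    deltas = [None] * len(weights)
    # Suggestion 1: output delta is yHat - y (mean loss, no learning rate here)
    deltas[-1] = (activations[-1] - Y) / n
    # Suggestions 2 and 3: walk backwards through W.T using the current (old) weights
    for k in range(len(weights) - 2, -1, -1):
        a = activations[k + 1]  # tanh output produced by weights[k]
        deltas[k] = (deltas[k + 1] @ weights[k + 1].T) * (1 - a ** 2)
    # Every gradient is computed before any weight changes
    return [activations[k].T @ deltas[k] for k in range(len(weights))]

def train_step(X, Y, weights, lr):
    grads = backward(forward(X, weights), weights, Y)
    # Suggestion 3: update all weights only after all deltas exist
    return [W - lr * g for W, g in zip(weights, grads)]

def loss(X, Y, weights):
    P = forward(X, weights)[-1]
    return -np.mean(np.sum(Y * np.log(P), axis=1))

# Toy data: three 2-D Gaussian blobs (synthetic, not the iris set)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [3.0, 3.0], [0.0, 3.0]])
labels = np.repeat(np.arange(3), 50)
X = centers[labels] + rng.normal(scale=0.5, size=(150, 2))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize, like StandardScaler
Y = np.eye(3)[labels]                     # one-hot targets

sizes = [2, 8, 6, 3]  # hidden layers of different widths now work
weights = [rng.uniform(-1.0, 1.0, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]

start = loss(X, Y, weights)
for epoch in range(3000):
    weights = train_step(X, Y, weights, lr=0.2)
end = loss(X, Y, weights)
acc = np.mean(forward(X, weights)[-1].argmax(axis=1) == labels)
print(start, end, acc)
```

The learning rate and epoch count are arbitrary choices for this toy data; if the loss oscillates or grows, lowering the learning rate and training for more epochs is the first thing to try. The lr here multiplies a mean-over-samples gradient, so it is not directly comparable to the question's lr, which multiplies a sum.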

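Downvotes?? Well, if you downvoted my question, I assume you are experts in this area... then would you mind giving me some pointers on how to solve this problem instead of simply dismissing my question? FYI, the same question on datascience.stackexchange has about 4 upvotes and 8 comments.

I didn't downvote you, but your question is off-topic.

@Smallches Could you elaborate on why my question is off-topic?? I think my tags make it clear enough that this is related to machine learning, don't they?? I did my best to explain what I am trying to achieve with the code and summarized it in text. I can't understand why such a genuine attempt to understand a complex algorithm is discouraged... I could understand if people said it is not clear enough, or that asking others to run the code is a bit much, etc... but isn't that the essence of StackOverflow? Helping people who are not gifted in a field, or beginners?

Thank you very much for the detailed explanation and input... I wanted to hear all of this, and I will try all of your suggestions. I was confused about the delta of the output layer: a few tutorials I saw used "actual - predicted" at the output layer, so I used that. I tried a smaller learning rate (0.001), and on average I got better results with it. I also found that my weight initialization was limiting the algorithm's results: I only allowed uniform weights from 0.0 to 0.1, and when I changed them to -1.0 to 1.0 I got better results. I deliberately did not transpose the hidden-layer weights when computing the deltas, because in my case all the hidden layers have the same number of nodes, so all my hidden-layer weights have the same shape and I didn't need to transpose them. You are completely right that this strategy does not work for hidden layers with different numbers of nodes.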
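On skipping the transpose because every hidden layer has the same number of nodes: equal widths only make the shapes line up. For a square weight matrix W, delta @ W and delta @ W.T agree only when W is symmetric, which randomly initialized weights essentially never are, so the backpropagated deltas are still wrong. A quick check with arbitrary sizes, drawing W from the -1.0 to 1.0 uniform range mentioned above:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.uniform(-1.0, 1.0, size=(4, 4))  # square hidden-to-hidden weights
delta = rng.normal(size=(3, 4))          # delta of the next layer

same_shape = (delta @ W).shape == (delta @ W.T).shape
same_values = np.allclose(delta @ W, delta @ W.T)
print(same_shape, same_values)  # True False
```

With .T added, the same code also keeps working when the hidden layers have different widths.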
