Python: How can I avoid dying ReLUs and NaN output values?

I was recently given an assignment that requires me to write a dense neural network in Python from scratch. We are supposed to solve some regression problems using the Sigmoid, Tanh, and ReLU activation functions. However, even though my network does work for classification-based problems, I run into problems whenever I use it for regression.

First of all, if I use this dataset: (I have to test my network with this data), then whenever I try to train and then predict while using ReLU as the activation function, all of the outputs are 0. I talked to my professor, and he said I could perhaps try Leaky ReLU, but if I do that I get NaN values. I have also trained with a very small learning rate, such as 1*10^(-9), just as a test; in that case I don't get NaN values, but the error stays very high regardless. My network has one hidden layer, and I have tried several numbers of hidden nodes to see whether anything improves, but nothing does.

Here is how I defined my activation function (leaky ReLU):
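(The original snippet did not survive here. A minimal sketch consistent with how activation and activation_derivative are used in the class below would be the following; the leak slope alpha = 0.01 is an assumed value, not one from the original post.)

import numpy as np

alpha = 0.01  # assumed leak slope

def activation(x):
    # leaky ReLU: pass positive values through, scale negatives by alpha
    return np.where(x > 0, x, alpha * x)

def activation_derivative(x):
    # slope of leaky ReLU: 1 for positive inputs, alpha otherwise
    return np.where(x > 0, 1.0, alpha)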

And here is my neural network class definition:

import numpy as np

# inputSize, HiddenNodes, outputSize, learning1, learning2 and the
# activation / activation_derivative functions are module-level globals
# defined elsewhere in the original script.

class NeuralNetwork():

    def __init__(self, x, y, x_test, y_test):
        self.input      = x  # training inputs
        self.weights1   = np.random.uniform(-0.5, 0.5, (inputSize, HiddenNodes))   # input-to-hidden weights
        self.weights2   = np.random.uniform(-0.5, 0.5, (HiddenNodes, outputSize))  # hidden-to-output weights
        self.y          = y  # true outputs of the training set
        self.output     = np.zeros(self.y.shape)  # network output
        self.test_input = x_test
        self.outputTest = np.zeros(y_test.shape)
        self.lambd      = 0.01  # L2 penalty used in backprop(); never initialized in the original snippet, value assumed

    def feedforward(self):  # simple feedforward pass
        self.layer1 = activation(np.dot(self.input, self.weights1))
        self.output = activation(np.dot(self.layer1, self.weights2))
        return self.output

    def backprop(self):
        # chain rule: derivative of the loss function with respect to weights2 and weights1
        slopeOut = activation_derivative(self.output)
        slopeIn  = activation_derivative(self.layer1)

        ErrorOut         = 2 * (self.y - self.output) * slopeOut  # negative gradient at the output
        ErrorHiddenLayer = np.dot(ErrorOut, self.weights2.T)      # error propagated back to the hidden layer

        # note: with the += update below, the lambd terms grow the weights rather than decaying them
        d_weights2 = np.dot(self.layer1.T, ErrorOut) + self.lambd * self.weights2
        d_weights1 = np.dot(self.input.T, ErrorHiddenLayer * slopeIn) + self.lambd * self.weights1

        # update the weights with the derivative (slope) of the loss function
        self.weights1 += np.multiply(learning1, d_weights1)
        self.weights2 += np.multiply(learning2, d_weights2)

    def train(self, X, y):  # one training step (X and y are unused; feedforward reads self.input)
        self.output = self.feedforward()
        self.backprop()

    def test(self, X2, Y2):  # predict for given inputs (Y2 is unused)
        self.layer1test = activation(np.dot(X2, self.weights1))
        self.outputTest = activation(np.dot(self.layer1test, self.weights2))
        return self.outputTest
What should I do to use the ReLU activation function correctly?
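For reference, the usual remedies for these two symptoms are: keep the output layer linear (identity) for regression, so negative targets and gradients stay representable; scale the initial weights to the layer width (He initialization), so the ReLU units start out alive; and clip the gradient norm, so a single large update cannot drive the weights to inf/NaN. Below is a minimal sketch of those three changes against the class above; the clipping threshold and the reuse of the same globals (inputSize, HiddenNodes, outputSize) are assumptions, not values from the original post.

import numpy as np

# 1. He initialization: variance scaled to the fan-in keeps the ReLU
#    pre-activations in a healthy range at the start of training.
weights1 = np.random.randn(inputSize, HiddenNodes) * np.sqrt(2.0 / inputSize)
weights2 = np.random.randn(HiddenNodes, outputSize) * np.sqrt(2.0 / HiddenNodes)

# 2. Linear output for regression: apply the (leaky) ReLU only on the
#    hidden layer and return the raw affine output.
def feedforward(x):
    layer1 = activation(np.dot(x, weights1))
    return np.dot(layer1, weights2)

# 3. Gradient clipping: rescale an update whose norm exceeds a threshold
#    (1.0 here is an assumed value) so the weights cannot blow up to NaN.
def clip_gradient(g, max_norm=1.0):
    norm = np.linalg.norm(g)
    return g if norm <= max_norm else g * (max_norm / norm)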
