Python's built-in training function works better than my custom training functions


I am trying to train a linear regression model for house-price prediction. I have tried three approaches:

  • a custom gradient-descent algorithm
  • a custom normal-equation algorithm
  • the built-in function provided by sklearn

I evaluated all three models by their R-squared values, as below:

  • R-squared of the custom gradient-descent algorithm: -4.01
  • R-squared of the custom normal-equation algorithm: -95.38
  • R-squared of the built-in sklearn function: 0.54

Now I don't understand why the R-squared values of the two custom algorithms are negative (a short sketch of how R-squared can go negative follows the code). The code is shown below; all three methods were given the same data.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    class Model:
        # "Model" is a placeholder class name; the question shows only the
        # methods. Assumed attributes set elsewhere: train_x (1382, 22),
        # train_y (1382,), features (22,), train_cost_history (list).

        def hypothesis(self, input_data):
            # input_data: (1382, 22), features: (22,) -> prediction: (1382,)
            prediction = np.matmul(input_data, self.features)
            return prediction

        def cost(self, predicted, actual):
            # Halved mean squared error: sum((h - y)^2) / (2 * m).
            # predicted: (1382,), actual: (1382,) -> cost: float
            cost = np.sum(np.square(predicted - actual)) / len(predicted) / 2
            return cost

        def gradient_descent(self, alpha, prediction, actual):
            # Batch update: features -= (alpha / m) * X^T (h - y).
            # prediction: (1382,), actual: (1382,), alpha: 0.001, features: (22,)
            self.features -= alpha * np.matmul(self.train_x.T, (prediction - actual)) / len(prediction)

        def normal_equation(self, X, y):
            # Closed form: features = (X^T X)^{-1} X^T y.
            # X: (1382, 22), y: (1382,) -> features: (22,)
            self.features = np.matmul(np.matmul(np.linalg.inv(np.matmul(X.T, X)), X.T), y)

        def built_in_train(self):
            # sklearn fit; train_x: (1382, 22), train_y: (1382,)
            reg = LinearRegression().fit(self.train_x, self.train_y)
            return reg

        def train(self, alpha, epoch, normal_equation=False, builtin_function=False):
            if normal_equation:
                self.normal_equation(self.train_x, self.train_y)
                return None
            elif builtin_function:
                return self.built_in_train()
            else:
                for i in range(epoch):
                    prediction = self.hypothesis(self.train_x)
                    cost = self.cost(prediction, self.train_y)
                    self.train_cost_history.append(cost)
                    self.gradient_descent(alpha, prediction, self.train_y)
                return None
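
For context on the negative scores: R-squared compares the model's squared error to that of a baseline that always predicts the mean of the targets, so it drops below zero whenever the model fits worse than that constant baseline. A minimal sketch of the computation, using sklearn.metrics.r2_score and made-up arrays in place of the real predictions:

    import numpy as np
    from sklearn.metrics import r2_score

    # Made-up stand-ins for test targets and a badly fitted model's predictions.
    actual = np.array([200.0, 250.0, 300.0, 350.0])
    predicted = np.array([500.0, 100.0, 600.0, 50.0])

    # Manual definition: R^2 = 1 - SS_res / SS_tot.
    ss_res = np.sum(np.square(actual - predicted))          # model's squared error
    ss_tot = np.sum(np.square(actual - actual.mean()))      # mean-baseline error
    print(1.0 - ss_res / ss_tot)        # about -22.4: worse than predicting the mean
    print(r2_score(actual, predicted))  # same value from sklearn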
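
One difference between the custom methods and sklearn that may matter here: LinearRegression fits an intercept by default (fit_intercept=True), while the hypothesis above has no bias term, and explicitly inverting X^T X can be numerically fragile. Below is a hedged sketch of a normal equation applied to data augmented with a bias column, using np.linalg.lstsq instead of np.linalg.inv; normal_equation_with_bias and the random stand-in data are illustrative, not part of the original code:

    import numpy as np

    def normal_equation_with_bias(X, y):
        # Prepend a column of ones so the first coefficient acts as the
        # intercept, mirroring sklearn's fit_intercept=True behaviour.
        X_b = np.hstack([np.ones((X.shape[0], 1)), X])
        # lstsq solves the least-squares problem without forming (X^T X)^{-1},
        # which is safer when X^T X is ill-conditioned.
        theta, *_ = np.linalg.lstsq(X_b, y, rcond=None)
        return theta

    # Random stand-in data with the question's shapes.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1382, 22))
    y = rng.normal(size=1382)
    features = normal_equation_with_bias(X, y)  # shape (23,): intercept + 22 weights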