Python logistic regression only predicts 1 class (Python, Machine Learning, Logistic Regression)

python machine-learning

I am new to data science and machine learning. I tried to implement the code, but the prediction only returns 1 class. Here is my code:

import numpy as np

classification_data = data.drop([10], axis=1).values
classification_label = data[10].values

class LogisticRegression:
    def __init__(self, lr=0.01, num_iter=100000):
        self.lr = lr
        self.num_iter = num_iter
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        '''Build a logistic regression classifier from the training set (X, y)'''

        n_samples, n_features = X.shape

        # init parameters
        self.weights = np.zeros(n_features)
        self.bias = 0

        # gradient descent
        for _ in range(self.num_iter):
            # approximate y with linear combination of weights and x, plus bias
            linear_model = np.dot(X, self.weights) + self.bias
            # apply sigmoid function
            y_predicted = self._sigmoid(linear_model)

            # compute gradients
            dw = (1 / n_samples) * np.dot(X.T, (y_predicted - y))
            db = (1 / n_samples) * np.sum(y_predicted - y)
            # update parameters
            self.weights -= self.lr * dw
            self.bias -= self.lr * db
        #raise NotImplementedError()

    def predict_proba(self, X):
        return self._sigmoid(X)
        raise NotImplementedError()

    def predict(self, X, threshold=0.5): # default threshold is 0.5
        '''Predict class value for X'''
        '''hint: you can use predict_proba function to classify based on given threshold'''
        linear_model = np.dot(X, self.weights) + self.bias
        #print (linear_model)
        y_predicted = self._sigmoid(linear_model)
        #print (self.predict_proba(linear_model))
        y_predicted_cls = [2 if i > threshold else 1 for i in y_predicted]

        return np.array(y_predicted_cls)
        raise NotImplementedError()

    def _sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

When I try to call predict, it only returns one class:

model.predict(classification_data, threshold=0.5)

Result:

array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, etc])

And when I try to call predict_proba:

model.predict_proba(classification_data)

Result:

array([[0.58826319, 0.5       , 0.52721189, ..., 0.60211507, 0.64565631,
        0.62245933],
       [0.58586893, 0.73105858, 0.52944351, ..., 0.57793101, 0.62245933,
        0.61387647],
       [0.63513751, 0.73105858, 0.57590132, ..., 0.6357912 , 0.55971365,
        0.52497919]. etc ]])

Any help would be much appreciated.

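Your algorithm works fine in terms of classification, but you have not implemented predict_proba correctly.

The way you are using it now, self._sigmoid is applied to each predictor value separately. You want to apply it to the result of the linear model, the same way you apply it in the predict function.

As you can see from the output you provided for predict_proba, the result is a 2D tensor rather than the expected 1D array. A correct implementation of the function is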

def predict_proba(self, X):
    linear_model = np.dot(X, self.weights) + self.bias
    return self._sigmoid(linear_model)
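To make the shape difference concrete, here is a minimal sketch with made-up inputs (the array X, the weights, and the bias below are hypothetical, not the asker's data). It applies the sigmoid once to the raw features and once to the output of the linear model:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical toy inputs: 4 samples, 3 features
X = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0],
              [0.5, 0.5, 0.5]])
weights = np.array([0.1, -0.2, 0.3])  # made-up weights
bias = 0.05                           # made-up bias

# What the question's predict_proba does: sigmoid of every raw feature value
wrong = sigmoid(X)
# What it should do: sigmoid of one linear score per sample
right = sigmoid(np.dot(X, weights) + bias)

print(wrong.shape)  # (4, 3)
print(right.shape)  # (4,)
```

Note that sigmoid(0) = 0.5 and sigmoid(1) is about 0.73105858. Both values appear in the 2D output posted in the question, which is consistent with the sigmoid having been applied to raw feature values.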

I ran this algorithm on the iris dataset just to see whether it works, and it classifies everything correctly. You can test it yourself

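from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
iris = load_iris()
X = iris.data
y = iris.target
y[y == 2] = 1  # turn the problem into a binary classification
log_reg = LogisticRegression()
log_reg.fit(X, y)
yproba = log_reg.predict_proba(X)
ypred = log_reg.predict(X)
cm = confusion_matrix(y, ypred)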

The confusion matrix in this case is

50  |  0
----------
0   |  100

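In the example above, the model is trained on the full dataset, but the same result (everything classified correctly) is obtained even with a train/test split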

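from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)
ypred = log_reg.predict(X_test)  # predict on the held-out split
cm = confusion_matrix(y_test, ypred)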

In this case, the confusion matrix is

8   |  0
----------
0   |  22

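The conclusion is that your algorithm works fine. The odd behavior (if any) should be attributed to the data you are feeding into the algorithm. (Are you sure it should not predict the same class for all the tested observations in your case?)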
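One quick way to inspect the data is to count how often each label occurs. The sketch below uses a made-up label array as a stand-in for data[10].values from the question:

```python
import numpy as np

# Hypothetical labels standing in for data[10].values
labels = np.array([1, 2, 2, 1, 2, 2, 2, 1])

# Distinct label values and how many samples carry each one
values, counts = np.unique(labels, return_counts=True)
for value, count in zip(values, counts):
    print(f"label {value}: {count} samples ({count / labels.size:.0%})")
```

If one label dominates heavily, or if the labels are not 0 and 1, that is worth knowing before blaming the model.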

Note that I changed one more line in your code

# changed from the original, which returned 1 and 2
y_predicted_cls = [1 if i > threshold else 0 for i in y_predicted]

for simplicity's sake, and I guess you could call it best practice.

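It turned out the cause was that I use sigmoid, which returns values between 0 and 1, so I changed the y values in the dataset to 0 and 1. Now it works fine. But the accuracy is still not very good.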
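A sketch of why labels of 1 and 2 can produce a single predicted class, assuming non-negative features (the toy data and the fit helper below are made up for illustration): the sigmoid output never exceeds 1, so y_predicted - y is never positive for such labels, every gradient step pushes the weights and bias upward, and all probabilities drift toward 1.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def fit(X, y, lr=0.1, num_iter=5000):
    """Plain gradient descent, same update rule as in the question."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(num_iter):
        p = sigmoid(np.dot(X, w) + b)
        w -= lr * np.dot(X.T, p - y) / n_samples
        b -= lr * np.sum(p - y) / n_samples
    return w, b

# Hypothetical toy data: one non-negative feature, two well-separated groups
X = np.array([[0.1], [0.2], [0.8], [0.9]])
y_12 = np.array([1, 1, 2, 2])  # labels as in the question
y_01 = y_12 - 1                # same labels remapped to 0 and 1

w12, b12 = fit(X, y_12)
w01, b01 = fit(X, y_01)

# Labels 1/2: every probability ends up above the threshold
pred_12 = np.where(sigmoid(np.dot(X, w12) + b12) > 0.5, 2, 1)
# Labels 0/1: the model can match the targets; map back to 1/2 afterwards
pred_01 = np.where(sigmoid(np.dot(X, w01) + b01) > 0.5, 1, 0)

print(pred_12)      # [2 2 2 2]
print(pred_01 + 1)  # predictions expressed in the original 1/2 labels
```

Remapping the labels before training and adding 1 back after prediction keeps the original class names without changing the model.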

What does y look like in your training data? For labels, 0 or 1 is more common than 1 or 2.

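The results of predict and predict_proba are inconsistent because the arguments passed to _sigmoid differ (_sigmoid(linear_model) in predict and _sigmoid(X) in predict_proba).

The data comes from the predict function, which has the line y_predicted_cls = [2 if i > threshold else 1 for i in y_predicted]. I thought the predicted class would change when I changed the threshold, but it does not.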
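One possible explanation for the threshold having no effect, sketched with made-up probabilities (the helper predict_from_proba and both arrays are hypothetical): if training has pushed every probability close to 1, any reasonable threshold sits below all of them, so the predicted class never changes.

```python
import numpy as np

def predict_from_proba(proba, threshold=0.5):
    """Turn probabilities into 0/1 classes using the given threshold."""
    return np.where(proba > threshold, 1, 0)

# Hypothetical probabilities: one saturated set, one well-spread set
saturated = np.array([0.9999, 1.0, 0.99999])
spread = np.array([0.2, 0.55, 0.9])

for t in (0.3, 0.5, 0.8):
    print(t, predict_from_proba(saturated, t), predict_from_proba(spread, t))
```

With the saturated values every threshold gives the same classes, while the spread values change as the threshold moves. Printing the output of a corrected predict_proba on the real data would show which situation applies.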