Python: how to find the logistic/sigmoid function parameters in logistic regression


I want to estimate the optimal parameters (mentioned at the end: the slope and the intercept) of the sigmoid/logistic function used in logistic regression on medical data. Here is what I have done in Python:

import numpy as np
from sklearn import preprocessing, svm, neighbors
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import preprocessing, svm, utils
from scipy.io import loadmat
import pandas as pd
I have an Apache.mat file that contains 4 columns: the APACHE score (0-72), the number of patients, the number of deaths, and the proportion (the ratio of deaths to patients).

Here I created the DataFrame to work with:
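For completeness, a minimal sketch of how such a DataFrame could be built from the .mat file. The variable name 'apache' inside the file and the numbers are assumptions for illustration (the sketch writes a tiny stand-in file first so it is self-contained); scipy.io.loadmat returns a dict of MATLAB variables:

```python
import numpy as np
import pandas as pd
from scipy.io import loadmat, savemat

# Create a tiny stand-in Apache.mat with the four columns described above;
# the MATLAB variable name 'apache' is a hypothetical choice.
scores = np.array([10.0, 20.0, 30.0])
patients = np.array([50.0, 40.0, 30.0])
deaths = np.array([5.0, 10.0, 15.0])
savemat('Apache.mat', {'apache': np.column_stack(
    [scores, patients, deaths, deaths / patients])})

# loadmat returns a dict mapping variable names to arrays
mat = loadmat('Apache.mat')
data = pd.DataFrame(mat['apache'],
                    columns=['ApacheII', 'NoPatients', 'NoDeaths', 'proportion'])
```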

x = np.array(data.drop(['NoPatients', 'NoDeaths', 'proportion'], axis=1))
I have dropped the unwanted columns, so only the APACHE II scores are left in 'x'.

#scaling the data (normalizing)
x = preprocessing.scale(x)

y = np.array(data['proportion'])
Now I encode 'y' with the LabelEncoder() function so that it is compatible with LogisticRegression().
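This encoding step is worth pausing on: LabelEncoder turns every distinct proportion into its own integer class, so a column of 25 distinct proportions becomes a 25-class problem. A small sketch with made-up values:

```python
import numpy as np
from sklearn import preprocessing

# Four made-up proportions; two of them coincide
y = np.array([0.10, 0.25, 0.10, 0.40])

lab_enc = preprocessing.LabelEncoder()
encoded = lab_enc.fit_transform(y)

print(encoded)           # [0 1 0 2]
print(lab_enc.classes_)  # [0.1  0.25 0.4 ] -- each distinct value is a class
```

This is why LogisticRegression later reports one coefficient and one intercept per class rather than a single slope/intercept pair.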

The results are as follows:

[[-0.49124107]
[-0.23528893]
[-0.19035795]
[-0.30312848]
[-0.25783808]
 [-0.37161079]
 [-0.12332468]
 [-0.16797195]
 [-0.05660718]
 [-0.21279785]
 [-0.22142453]
 [-0.10105617]
 [-0.14562868]
 [ 0.00991192]
 [-0.012247  ]
 [ 0.03206243]
 [ 0.07635461]
 [ 0.20951544]
 [ 0.12067417]
 [-0.03441851]
 [ 0.16504852]
 [ 0.09850035]
 [ 0.23179558]
 [ 0.05420914]
 [ 1.47513463]]
[-1.79691975 -2.35677113 -2.35090141 -2.3679202  -2.36017388 -2.38191049
 -2.34441678 -2.34843121 -2.34070389 -2.35368047 -1.57944984 -2.3428732
 -2.3462668  -2.33974088 -2.33975687 -2.34002906 -2.34151792 -2.35329447
 -2.34422478 -2.34007746 -2.34814388 -2.34271603 -2.35632459 -2.34062229
 -1.72511457]
I just want to find the parameters of the sigmoid function that is typically used in logistic regression. How can I find the sigmoid parameters (i.e., the intercept and the slope)?

Here is the sigmoid function (in case a reference is needed):

def sigmoid(x, x0, k):
    y = 1 / (1 + np.exp(-k*(x-x0)))
    return y

This is the normal behavior of logistic regression when solving a multinomial problem. See the documentation:

In the multiclass case, the training algorithm uses the one-vs-rest (OvR) scheme.

intercept_ is of shape (1,) when the given problem is binary.

Example:

>>> clf = LogisticRegression()
>>> clf.fit([[1,2], [1,3], [0, 1]], [0, 1, 0])
>>> clf.coef_
array([[ 0.02917282,  0.12584457]])
>>> clf.intercept_
array([-0.40218649])
>>> clf.fit([[1,2], [1,3], [0, 1]], [0, 1, 2])
>>> clf.coef_
array([[ 0.25096507, -0.24586515],
       [ 0.02917282,  0.12584457],
       [-0.41626058, -0.43503612]])
>>> clf.intercept_
array([-0.15108918, -0.40218649,  0.1536541 ])

In fact, there are several models inside, each solving a different binary problem. You can combine the i-th coef with the i-th intercept and you will get the model that solves the i-th binary problem, and so on to the end of the list.
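That combining can be sketched as follows (small made-up dataset; the liblinear solver is chosen here because it fits one binary model per class): the i-th row of coef_ together with the i-th entry of intercept_ reproduces the i-th column of decision_function.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = [[1, 2], [1, 3], [0, 1]]
y = [0, 1, 2]                      # three classes -> three OvR binary models

clf = LogisticRegression(solver='liblinear').fit(X, y)

i = 1                              # pick the i-th binary problem
w, b = clf.coef_[i], clf.intercept_[i]

# X @ w + b is exactly the i-th column of decision_function
manual = np.asarray(X) @ w + b
ok = np.allclose(manual, clf.decision_function(X)[:, i])
print(ok)  # True
```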

If proportion is a continuous variable, I think you should look into ridge regression instead of logistic regression for this problem.

Yes, you are correct @GergesDib. Thanks. But here I just wanted to find the parameters of the logistic function, even though it is not the best regression model. Thanks a lot for your help.

I think you have already found them; they are lr.coef_ and lr.intercept_. What is the problem?

I expected lr.coef_ and lr.intercept_ to hold a single value each, which is what I need. But I got many values. Can you help?

What are x.shape and y.shape?
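Since proportion is a continuous fraction rather than a class label, one alternative worth noting (not part of the original exchange) is to fit the sigmoid shown above directly to the (score, proportion) pairs with scipy.optimize.curve_fit, which yields the midpoint x0 and the slope k in one step. A sketch on synthetic data standing in for the Apache scores:

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, x0, k):
    return 1 / (1 + np.exp(-k * (x - x0)))

# Synthetic (score, proportion) pairs: true midpoint 30, true slope 0.15
x = np.linspace(0, 72, 25)
rng = np.random.default_rng(0)
y = sigmoid(x, 30.0, 0.15) + rng.normal(0, 0.01, x.size)

# p0 is a rough initial guess for (x0, k)
(x0, k), _ = curve_fit(sigmoid, x, y, p0=[35.0, 0.1])
print(x0, k)  # close to 30 and 0.15
```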