scikit-learn: LogisticRegression predict_proba(X) calculation


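I was wondering whether someone could take a quick look at the code snippets below and point out what I am misunderstanding about computing a sample's probability for each class in the model, and where the related bug in my code is. I tried to manually reproduce the results returned by the sklearn function lm.predict_proba(X). Unfortunately the results differ, so I have made a mistake somewhere.

I think the bug is in part "d" of the code walkthrough below. It may be in the math, but I cannot see why.

a) Creating and training the logistic regression model (works fine)

b) Saving the coefficients and bias (works fine)

c) Using lm.predict_proba(X) (works fine)

The result is: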

estimate probabilities for each class: 
a 0.595426
b 0.019244
c 0.001343
d 0.004033
e 0.017185
f 0.004193
g 0.160380
h 0.158245
i 0.003093
j 0.036860
dtype: float64 
all probabilities by lm.predict_proba(..) sum up to 1.0


d) Manually performing the calculation done by lm.predict_proba (no errors or warnings, but the results are not the same)

The result is:


formula: [0.9667598370531315, 0.48453459121301334, 0.06154496922245115, 0.16456194859398865, 0.45634781280053394, 0.16999340794727547, 0.8867996361191054, 0.8854473986336552, 0.13124464656251109, 0.642913996162282]

softmax: [ 0.15329642 0.09464644 0.0620015 0.0687293 0.0920159 0.06910361 0.14151607 0.14132483 0.06647715 0.11088877]

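Softmax was used in response to a comment.

Note:

I found a very similar question, but unfortunately I could not adapt it to my code so that the predictions match. I tried many different combinations for computing the variables "z_for_class_k" and "p_for_class_k", but did not succeed in reproducing the predicted values from "predict_proba(X)".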

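I think the problem is with

p_for_class_k = 1/(1 + math.exp(-z_for_class_k))

The expression 1/(1 + exp(-logit)) is a simplification that only applies to binary problems.

Before simplification, the actual equation looks like this:

p_for_classA =
exp(logit_classA) /
[1 + exp(logit_classA) + exp(logit_classB) ... + exp(logit_classC)]

In other words, when computing the probability for a particular class, the weights and biases of all the other classes must also enter the formula.

I have no data to test this, but hopefully this points you in the right direction.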
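To make the difference concrete, here is a minimal numpy sketch contrasting the per-class sigmoid with a normalisation over all classes. It uses the plain softmax, exp(z_k) / sum_j exp(z_j), which is the form sklearn documents for the multinomial case; the "1 +" in the denominator above corresponds to a parametrisation where one reference class has its logit fixed at 0. The arrays W, b and x are random, hypothetical stand-ins that only mimic the shapes of lm.coef_, lm.intercept_ and a single sample; they are not the question's data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins with the shapes of a 10-class, 784-feature model
W = rng.normal(scale=0.01, size=(10, 784))  # plays the role of lm.coef_
b = rng.normal(scale=0.01, size=10)         # plays the role of lm.intercept_
x = rng.random(size=(1, 784))               # a single sample

z = x @ W.T + b                             # one logit per class, shape (1, 10)

# Per-class sigmoid: each value is computed without looking at the other classes
p_sigmoid = 1.0 / (1.0 + np.exp(-z))

# Softmax: each class is normalised by the sum over all classes
z_shifted = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
p_softmax = np.exp(z_shifted) / np.exp(z_shifted).sum(axis=1, keepdims=True)

print("sum of sigmoid outputs:", p_sigmoid.sum())  # generally not 1
print("sum of softmax outputs:", p_softmax.sum())  # 1 up to rounding
```

The sigmoid outputs do not form a probability distribution over the ten classes, which is one quick way to see that they cannot be what predict_proba returns here.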

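Change

p_for_class_k = 1/ (1 + math.exp(-z_for_class_k))
manual_calculated_probabilities.append(p_for_class_k)

to

manual_calculated_probabilities.append(z_for_class_k)

In your notation, the input to softmax should be the "z"s, not the "p"s.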
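The numbers posted in the question appear to be consistent with this. The sketch below takes the "formula" values from the question (the sigmoid outputs), undoes the sigmoid to recover each z, and applies softmax to the z values instead of the p values. The helper softmax and the variable names are illustrative, not part of the question's code.

```python
import numpy as np

# "formula" values copied from the question: sigmoid(z) for each class
sigmoid_outputs = np.array([
    0.9667598370531315, 0.48453459121301334, 0.06154496922245115,
    0.16456194859398865, 0.45634781280053394, 0.16999340794727547,
    0.8867996361191054, 0.8854473986336552, 0.13124464656251109,
    0.642913996162282,
])

# predict_proba values copied from the question (classes a to j)
reported = np.array([
    0.595426, 0.019244, 0.001343, 0.004033, 0.017185,
    0.004193, 0.160380, 0.158245, 0.003093, 0.036860,
])

def softmax(v):
    """Softmax of a 1-D array, shifted by the max for numerical stability."""
    e = np.exp(v - v.max())
    return e / e.sum()

# Invert the sigmoid: z = log(p / (1 - p))
z = np.log(sigmoid_outputs / (1.0 - sigmoid_outputs))

from_p = softmax(sigmoid_outputs)  # what the question computed
from_z = softmax(z)                # what the answer suggests

print("softmax of p:", np.round(from_p, 4))
print("softmax of z:", np.round(from_z, 6))
print("matches predict_proba:", np.allclose(from_z, reported, atol=1e-4))
```

Applying softmax to the p values gives the flat-looking distribution shown under "softmax:" in the question, while applying it to the z values gives the predict_proba column.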

I was able to replicate the method lr.predict_proba by doing the following:

>>> sigmoid = lambda x: 1/(1+np.exp(-x))
>>> sigmoid(lr.intercept_+np.sum(lr.coef_*X.values, axis=1))

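This assumes lr is a fitted sklearn LogisticRegression object and X holds the feature matrix. Note that .values is a pandas attribute, so as written X would be a DataFrame; with a plain numpy array, use X directly. The sigmoid form applies to a binary problem, where it gives the probability of the positive class.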
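As a rough illustration of the binary case, the sketch below uses random, hypothetical arrays in place of lr.coef_, lr.intercept_ and the data (no model is actually fitted). It checks that the expression from the answer agrees with the usual matrix form, and builds the two-column layout that predict_proba uses, one column per class.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for a fitted binary model and its data
coef = rng.normal(size=(1, 4))   # same shape as lr.coef_ with two classes
intercept = rng.normal(size=1)   # same shape as lr.intercept_
X = rng.normal(size=(5, 4))      # 5 samples, 4 features, plain numpy array

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# The expression from the answer, with X used directly instead of X.values
p_pos_answer = sigmoid(intercept + np.sum(coef * X, axis=1))

# Equivalent matrix form: z = X w + b
p_pos_matrix = sigmoid(X @ coef.ravel() + intercept[0])

# Two columns per row: [P(class 0), P(class 1)]
proba = np.column_stack([1.0 - p_pos_matrix, p_pos_matrix])
print(proba)
```

With more than two classes, coef has one row per class and this broadcasted product no longer yields one probability per sample, so the shortcut does not carry over to the question's ten-class setup.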

Did you find a solution, @Ted Frank?

For a multi_class problem, if multi_class is set to be "multinomial" the softmax function is used to find the predicted probability of each class.
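The same documentation passage goes on to describe the one-vs-rest alternative: a logistic function per class, with the values then normalised across classes. The sketch below puts the two rules side by side on a made-up row of logits. The function names and the logits are hypothetical, and this is a reading of the documented behaviour rather than sklearn's actual code.

```python
import numpy as np

def proba_multinomial(z):
    """Softmax over the class logits (rows are samples)."""
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def proba_one_vs_rest(z):
    """Sigmoid per class, then renormalise each row to sum to 1."""
    p = 1.0 / (1.0 + np.exp(-z))
    return p / p.sum(axis=1, keepdims=True)

z = np.array([[2.0, 0.5, -1.0]])  # hypothetical logits for three classes

print("multinomial:", proba_multinomial(z))
print("one-vs-rest:", proba_one_vs_rest(z))
```

Both rows sum to 1 but the values differ, so a manual reproduction of predict_proba has to use the rule that matches how the model was configured.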

print("shape of X: ", X_select_image_data.shape)
print("shape of W: ", W.shape)
print("shape of b: ", b.shape)

shape of X:  (1, 784)
shape of W:  (10, 784)
shape of b:  (10,)


