Python: GLM Poisson regression probabilities


I am running a Poisson regression with the statsmodels GLM model. My dataset looks like this:

    Quantity  Month  cannibal_numbers  category_performance
0        0.0     11                 0                     7
1     3985.0      1                 1                     2
2     7690.0      2                 5                     4
3    10070.0      4                 3                    10

Quantity is the response variable, and the other three columns are the predictors.

Following the statsmodels documentation, I built the Poisson regression model like this:


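import statsmodels.api as sm
from patsy import dmatrices

expr = "Quantity ~ Month + cannibal_numbers + category_performance"
y, X = dmatrices(expr, series, return_type='dataframe')

# Poisson family; the default link for this family is the log link
poisson_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()

# In-sample predicted means, one per row of the dataset
poisson_predict = poisson_fit.predict()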

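I am stuck here. What I want is the probability of Quantity being 1, 2, 3, and so on. I don't know how to do this.

How can I do this in statsmodels? Thanks in advance for any guidance.

Update: thanks to Josef, things are clearer now. Following his suggestion, I adjusted my model: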

import numpy as np
import pandas as pd
from scipy import stats

poisson_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
series['poisson_predict'] = poisson_fit.predict()

# Counts 0..4 in columns, one row per observation
counts = np.arange(5)
predict_prob = stats.poisson.pmf(counts, np.asarray(series['poisson_predict'])[:, None])
results = pd.DataFrame(predict_prob)

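For each row of the dataset, this returns the probability of each value in counts, one column per count. The output looks like this: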

              0             1             2             3             4   \
0   9.267928e-08  1.500859e-06  1.215255e-05  6.559995e-05  2.655834e-04   
1   9.267928e-08  1.500859e-06  1.215255e-05  6.559995e-05  2.655834e-04   
2   9.267928e-08  1.500859e-06  1.215255e-05  6.559995e-05  2.655834e-04   
3   2.286170e-07  3.495832e-06  2.672777e-05  1.362334e-04  5.207935e-04 
...

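Shouldn't the fitted model give a single equation for this data (including mu), so that the probability of Quantity being 1 to 4 is predicted from that equation, and each quantity has only one probability?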

The Poisson model in statsmodels.discrete has a predict_prob method on the results instance that computes these probabilities.

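For the Poisson distribution we can use the scipy.stats distribution directly, because the parameterization is the same.

For example, using numpy broadcasting to get the probabilities of counts 0, ..., 4 in the columns, for all predicted cases in the rows: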

from scipy import stats
poisson_predict = poisson_fit.predict()
counts = np.arange(5)
predict_prob = stats.poisson.pmf(counts, np.asarray(poisson_predict)[:, None])
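To see the shapes involved without fitting anything, here is a small sketch with made-up means standing in for poisson_fit.predict(). The values in mu are hypothetical. It also shows that a row only sums to 1 once the probability of counts above the last column is added back, which matters when the predicted mean is large relative to the counts being evaluated.

```python
import numpy as np
from scipy import stats

# Hypothetical predicted means for three observations
# (stand-ins for poisson_fit.predict()).
mu = np.array([0.5, 2.0, 20.0])
counts = np.arange(5)                      # 0, 1, 2, 3, 4

# mu[:, None] has shape (3, 1) and counts has shape (5,),
# so broadcasting gives a (3, 5) array: rows are observations, columns are counts.
probs = stats.poisson.pmf(counts, mu[:, None])

# Probability of a count above 4, per row (survival function at 4).
tail = stats.poisson.sf(counts[-1], mu)

print(probs.shape)
print(probs.sum(axis=1) + tail)            # each entry is 1 up to rounding
```

With a mean of 20, almost all of the probability sits in the tail, so the entries for counts 0 to 4 are tiny. The same effect would explain very small numbers when the predicted means are in the thousands.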

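For some other GLM families and count distributions, such as the negative binomial, the parameterization used by the regression model differs from the one in scipy. In those cases we need to convert the parameters so that they match the scipy.stats.distributions parameterization.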
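As a sketch of such a conversion, assume the NB2 form of the negative binomial, where the variance is mu + alpha * mu**2. The values of mu and alpha below are hypothetical, not taken from any fitted model. Under that assumption, scipy's nbinom(n, p) matches with n = 1 / alpha and p = 1 / (1 + alpha * mu), and the moments can be checked directly.

```python
import numpy as np
from scipy import stats

mu = np.array([1.5, 4.0, 10.0])   # hypothetical predicted means
alpha = 0.5                        # hypothetical NB2 dispersion parameter

# Convert (mu, alpha) to scipy's (n, p) parameterization.
n = 1.0 / alpha
p = 1.0 / (1.0 + alpha * mu)

# Probabilities of counts 0..4, one row per observation: shape (3, 5).
counts = np.arange(5)
probs = stats.nbinom.pmf(counts, n, p[:, None])

# Moments of the converted distribution, to compare with mu and mu + alpha * mu**2.
nb_mean = stats.nbinom.mean(n, p)
nb_var = stats.nbinom.var(n, p)
print(nb_mean)
print(nb_var)
```

If a model uses a different variance function, the mapping to n and p changes accordingly, so it is worth checking the mean and variance as done here before trusting the probabilities.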

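Some of the newer count models, such as GeneralizedPoisson and the zero-inflated versions, have a "which" option in predict that can return the predicted probabilities directly.

E.g. for the zero-inflated models: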

which str, optional
    Define values that will be predicted.
    ‘mean’, ‘mean-main’, ‘linear’, ‘mean-nonzero’,
    ‘prob-zero’, ‘prob’, ‘prob-main’. Default is ‘mean’.

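Thank you very much for your answer, it is an invaluable help! I am still confused about why the result of poisson_predict is used as mu when generating the distribution?

The parameter of the Poisson distribution is mu, which is equal to the mean of the distribution. The predict method returns the predicted value of the mean, so it corresponds to the Poisson parameter mu (sometimes called lambda).

Thanks Josef, I have a lot to learn. So why is poisson_predict passed as mu in the example? It looks to me as if every value of the historical dataset is treated as its own mu?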
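A small sketch of why there is one mu per row. With the default log link of the Poisson family, the fitted model is a single equation, mu = exp(X @ beta), but it is evaluated at each row's predictor values, so every row gets its own mean and therefore its own set of probabilities. The coefficients and predictor values below are hypothetical; in a real fit the coefficients would come from poisson_fit.params.

```python
import numpy as np
from scipy import stats

# Hypothetical coefficients (intercept, slope), standing in for poisson_fit.params.
beta = np.array([0.2, 0.3])

# Design matrix: an intercept column and one made-up predictor.
X = np.column_stack([np.ones(3), np.array([1.0, 2.0, 4.0])])

# Log link: the same equation gives a different mean for each row.
mu = np.exp(X @ beta)

# One row of probabilities per observation, one column per count 0..4.
counts = np.arange(5)
probs = stats.poisson.pmf(counts, mu[:, None])

print(mu)
print(probs.round(4))
```

Rows with identical predictor values would get identical means and identical probabilities. A single probability per count, independent of the predictors, would only come out of an intercept-only model.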