Estimating the posterior predictive of a regression in Python (python, scipy, scikit-learn, statsmodels, pymc)


Suppose I have a set of random X, Y points:

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(50)
# One noise value per x (the original drew 200 values for 50 points)
noise = np.random.uniform(low=0.0, high=40.0, size=x.size)
y = x + noise
plt.scatter(x, y)

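Suppose I use linear regression to model y as a Gaussian distribution for each value of x. How can I estimate p(y | x) for each (possible) value of x?

Is there a straightforward way to do this with pymc or scikit-learn?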

If I understand your needs correctly, you can do this with the git version of pymc (PyMC3) and its glm submodule. For example:

import numpy as np
import pymc as pm
import matplotlib.pyplot as plt 
from pymc import glm 

## Make some data
x = np.array(range(0,50))
y = np.random.uniform(low=0.0, high=40.0, size=50)
y = 2*x+y
## plt.scatter(x,y)

data = dict(x=x, y=y)
with pm.Model() as model:
    # Specify the GLM and pass in the data. The resulting linear model, its
    # likelihood and all its parameters are automatically added to our model.
    pm.glm.glm('y ~ x', data)
    step = pm.NUTS() # Instantiate MCMC sampling algorithm
    trace = pm.sample(2000, step)


##fig = pm.traceplot(trace, lines={'alpha': 1, 'beta': 2, 'sigma': .5});## traces
fig = plt.figure()
ax = fig.add_subplot(111)
plt.scatter(x, y, label='data')
glm.plot_posterior_predictive(trace, samples=50, eval=x,
                              label='posterior predictive regression lines')

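This produces a scatter plot of the data overlaid with regression lines drawn from the posterior.

You may find these blog posts interesting; they are where I got these ideas.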

Edit: to get the y values for each x, try this, which I adapted from the glm source code:

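def lm(x, sample):
    # Linear model evaluated with one posterior draw of the parameters
    return sample['Intercept'] + sample['x'] * x

samples = 50  # choose the same number as in the plot call
trace_det = np.empty([samples, len(x)])  # one row per posterior draw
for i, rand_loc in enumerate(np.random.randint(0, len(trace), samples)):
    rand_sample = trace[rand_loc]  # parameter values at a random trace position
    trace_det[i] = lm(x, rand_sample)
y = trace_det.T  # shape (len(x), samples); note this overwrites the observed y
y[0]  # the 50 sampled regression values at x[0]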

Apologies if this is not the most elegant approach; hopefully you can follow the logic.
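One caveat about the edit above: each row of trace_det is a draw of the regression line, that is, of the mean of y given x, not of y itself. To approximate p(y | x) under the Gaussian model, one would also add noise using the standard deviation from the same posterior draw. The following is only a sketch with NumPy. The arrays intercept_draws, slope_draws and sigma_draws are hypothetical stand-ins filled with synthetic numbers; with a real trace they would come from the 'Intercept', 'x' and noise variables (the name of the noise variable depends on the PyMC version, so check the trace's variable names).

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(50)

# Hypothetical stand-ins for posterior draws (synthetic, NOT output of a real sampler)
n_draws = 2000
intercept_draws = rng.normal(20.0, 1.0, n_draws)
slope_draws = rng.normal(2.0, 0.05, n_draws)
sigma_draws = np.abs(rng.normal(11.5, 0.5, n_draws))

samples = 50
idx = rng.integers(0, n_draws, samples)  # random positions in the "trace"

# Mean of y at every x for each selected draw: shape (samples, len(x))
mu = intercept_draws[idx, None] + slope_draws[idx, None] * x[None, :]

# Add Gaussian observation noise, using the sigma of the same draw
y_pred = rng.normal(mu, sigma_draws[idx, None])

# y_samples[i] holds `samples` draws approximating p(y | x[i])
y_samples = y_pred.T
```

A histogram of y_samples[i] then approximates the predictive distribution at x[i]; increasing samples gives a smoother estimate.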

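Do you know how to do this by hand? Thanks! I am particularly interested in obtaining y[0], y[1], ..., y[50] (i.e. a vector of samples for each y[i]). Do you know how I could get that?

To be clear: for each value of x, you want 50 y values?

Yes, that is correct (i.e. what I am looking for should be the trace of a deterministic). From what I can see, obtaining the trace of a deterministic has already been implemented (see here: ) but I do not know how to use it, e.g. in this example.

I have made an edit; let me know if that is what you are after.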