Python 3.x: Poor GMM fit in sklearn


python-3.x scikit-learn


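I want to fit a 2-component mixture model with sklearn and then compute the posterior probabilities. With the code I have so far, the fit for one of the two distributions is perfect (overfitting?), while the fit for the other is very poor. I made a mock example that samples from 2 Gaussians: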

import numpy as np
from sklearn.mixture import GaussianMixture
import matplotlib.pyplot as plt 

def calc_pdf():
    """
    Fit a 2-component Gaussian mixture to the log of the positive samples,
    plot the result, and return the per-component densities.
    """
    d = np.random.normal(-0.1, 0.07, 5000)
    t = np.random.normal(0.2, 0.13, 10000)
    pool = np.concatenate([d, t]).reshape(-1,1)
    label = ['d']*d.shape[0] + ['t'] * t.shape[0]
    X = pool[pool>0].reshape(-1,1)
    X = np.log(X)
    clf = GaussianMixture(
                        n_components=2,
                        covariance_type='full',
                        tol = 1e-24,
                        max_iter = 1000
                        )
    logprob = clf.fit(X).score_samples(X)
    responsibilities = clf.predict_proba(X)
    pdf = np.exp(logprob)
    pdf_individual = responsibilities * pdf[:, np.newaxis]
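    # The log is only defined for positive samples; drop the rest to avoid NaNs.
    plot_gauss(np.log(d[d > 0]), np.log(t[t > 0]), pdf_individual, X)
    # Rows are samples and columns are components, so return one column each.
    return pdf_individual[:, 0], pdf_individual[:, 1]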

def plot_gauss(d, t, pdf_individual, x):
    fig, ax = plt.subplots(figsize=(12, 9), facecolor='white')
    ax.hist(d, 30, density=True, histtype='stepfilled', alpha=0.4)
    ax.hist(t, 30, density=True, histtype='stepfilled', alpha=0.4)
    ax.plot(x, pdf_individual, '.')
    ax.set_xlabel('$x$')
    ax.set_ylabel('$p(x)$')
    plt.show()


calc_pdf()

This produces the following plot:

Is there anything obvious that I am missing?

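Aren't you cutting off a large chunk of your left Gaussian?

len(pool) - len(pool[pool>0])

gives ~5200, i.e. about a third of the 15000 samples are discarded before the fit. If you change the means to larger positive numbers, the fit looks fine.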
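
To make the point above concrete, here is a minimal sketch (not the asker's original code) that counts how much of each component the `pool > 0` filter removes, and then repeats the filter, log, and fit steps after moving both means to larger positive values. The helper names `sample_pool` and `fit_log_means`, the fixed seed, and the shift of 1.0 are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility


def sample_pool(shift=0.0, n_d=5000, n_t=10000):
    """Draw the two Gaussians from the question, with both means moved by `shift`."""
    d = rng.normal(-0.1 + shift, 0.07, n_d)
    t = rng.normal(0.2 + shift, 0.13, n_t)
    return d, t


def fit_log_means(d, t):
    """Filter to positive values, take the log, fit a 2-component GMM, return sorted means."""
    pool = np.concatenate([d, t])
    X = np.log(pool[pool > 0]).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
    return np.sort(gmm.means_.ravel())


# Original means: most of the left Gaussian lies below zero and is discarded.
d, t = sample_pool()
frac_d, frac_t = np.mean(d <= 0), np.mean(t <= 0)
n_dropped = int(np.sum(d <= 0) + np.sum(t <= 0))
print(f"dropped {n_dropped} samples: {frac_d:.0%} of d, {frac_t:.0%} of t")

# Shifted means (hypothetical shift of 1.0): nothing is discarded by the filter.
d_s, t_s = sample_pool(shift=1.0)
frac_d_shifted, frac_t_shifted = np.mean(d_s <= 0), np.mean(t_s <= 0)
shifted_means = fit_log_means(d_s, t_s)
print("fraction dropped after shift:", frac_d_shifted, frac_t_shifted)
print("fitted means in log space:", shifted_means)
print("log of the shifted true means:", np.log([0.9, 1.2]))
```

With the original means, the filter removes nearly all of `d` but only a small part of `t`, so the mixture that is actually fitted is no longer the one that was sampled. After the shift, both components survive intact and the fitted means can be compared directly with the log of the true means.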