Python: How do I sample from the individual components of a GMM distribution?

python scikit-learn

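I clustered my data with the Gaussian mixture model (GMM) from sklearn and obtained 3 clusters. Each point in my data represents a molecular structure. I would like to know how to sample from each cluster. I tried the following: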

gmm = GMM(n_components=3).fit(Data)
gmm.sample(n_samples=20)

However, this samples from the whole mixture distribution, whereas I need to sample from each component separately.

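This is not trivial, because you need to compute the eigenvectors of every covariance matrix. Here is some example code from a problem I worked on: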

import numpy as np
from scipy.stats import multivariate_normal
import random
from operator import truediv
import itertools
from scipy import linalg
import matplotlib.pyplot as plt
import matplotlib as mpl
from sklearn import mixture

#import some data which can be used for gmm
mix = np.loadtxt("mixture.txt", usecols=(0,1), unpack=True)
#print(mix.shape)
color_iter = itertools.cycle(['navy', 'c', 'cornflowerblue', 'gold',
                              'darkorange'])

def plot_results(X, Y_, means, covariances, index, title):
    """Scatter the points by predicted label and draw one ellipse per Gaussian."""
    splot = plt.subplot(2, 1, 1 + index)
    for i, (mean, covar, color) in enumerate(zip(
            means, covariances, color_iter)):
        v, w = linalg.eigh(covar)
        v = 2. * np.sqrt(2.) * np.sqrt(v)
        u = w[0] / linalg.norm(w[0])
        # as the DP will not use every component it has access to
        # unless it needs it, we shouldn't plot the redundant
        # components.
        if not np.any(Y_ == i):
            continue
        plt.scatter(X[Y_ == i, 0], X[Y_ == i, 1], .8, color=color)

        # Plot an ellipse to show the Gaussian component
        angle = np.arctan(u[1] / u[0])
        angle = 180. * angle / np.pi  # convert to degrees
        ell = mpl.patches.Ellipse(mean, v[0], v[1], angle=180. + angle, color=color)
        ell.set_clip_box(splot.bbox)
        ell.set_alpha(0.5)
        splot.add_artist(ell)

    plt.xlim(-4., 3.)
    plt.ylim(-4., 2.)
    plt.title(title)

gmm = mixture.GaussianMixture(n_components=3, covariance_type='full').fit(mix.T)
print(gmm.predict(mix.T))
plot_results(mix.T, gmm.predict(mix.T), gmm.means_, gmm.covariances_, 0,
             'Gaussian Mixture')
plt.show()

For my problem, the resulting plot looks like this: [figure not included]
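For the sampling itself, a fitted GaussianMixture exposes the per-component parameters as means_ and covariances_, so one component can be sampled by drawing from the corresponding multivariate normal. The following is a minimal sketch, not the answerer's code: the helper name sample_component and the synthetic data are assumptions made for illustration, and it assumes covariance_type='full'. It also shows that GaussianMixture.sample returns the component label of every drawn point, which gives a second way to split samples by component.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def sample_component(gmm, component, n_samples, seed=None):
    """Draw n_samples points from one Gaussian component of a fitted GMM.

    Assumes covariance_type='full', so gmm.covariances_ has shape
    (n_components, n_features, n_features).
    """
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(
        gmm.means_[component], gmm.covariances_[component], size=n_samples
    )


# Synthetic stand-in for the real data: three 2-D blobs (hypothetical).
rng = np.random.default_rng(0)
centers = [(-5.0, 0.0), (0.0, 5.0), (5.0, 0.0)]
data = np.vstack([rng.normal(loc=c, scale=0.5, size=(200, 2)) for c in centers])

gmm = GaussianMixture(n_components=3, covariance_type='full',
                      random_state=0).fit(data)

# Option 1: 20 samples from each component, drawn directly.
per_component = {k: sample_component(gmm, k, 20, seed=k)
                 for k in range(gmm.n_components)}

# Option 2: sample the whole mixture and split by the returned labels.
X_new, labels_new = gmm.sample(n_samples=60)
by_label = {k: X_new[labels_new == k] for k in range(gmm.n_components)}
```

Option 1 gives exact control over how many points come from each component. Option 2 draws components in proportion to the fitted mixture weights, so the per-component counts vary.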

Edit: here is the answer to your comment. I would use pandas for this. Suppose X is your feature matrix and y holds your labels; then:

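import pandas as pd
# X is assumed to be a DataFrame and y a Series sharing the same index;
# pd.concat cannot take the raw NumPy array returned by predict()
y_pred = pd.Series(gmm.predict(X), index=X.index, name='y_pred')
df_all_info = pd.concat([X, y, y_pred], axis=1)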

In the resulting DataFrame you can inspect all the information you need, and even pick out the samples the algorithm misclassified:

df_wrong = df_all_info[df_all_info['name of y-column'] != df_all_info['name of y_pred column']]
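One caveat when comparing y with the predictions: the cluster ids a GMM assigns are arbitrary, so cluster 0 need not correspond to label 0. The sketch below builds the combined DataFrame on hypothetical data and aligns cluster ids with labels through a cross-tabulation before looking for disagreements. The column names f1, f2, y_true, y_pred and y_pred_aligned are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture

# Hypothetical data: three 2-D blobs with known labels.
rng = np.random.default_rng(0)
centers = [(-5.0, 0.0), (0.0, 5.0), (5.0, 0.0)]
X = pd.DataFrame(
    np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in centers]),
    columns=["f1", "f2"],
)
y = pd.Series(np.repeat([0, 1, 2], 50), name="y_true")

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
y_pred = pd.Series(gmm.predict(X), index=X.index, name="y_pred")
df_all_info = pd.concat([X, y, y_pred], axis=1)

# Map each cluster id to the true label it overlaps with most.
cluster_to_label = pd.crosstab(
    df_all_info["y_pred"], df_all_info["y_true"]
).idxmax(axis=1)
df_all_info["y_pred_aligned"] = df_all_info["y_pred"].map(cluster_to_label)

# Rows whose aligned cluster disagrees with the known label.
df_wrong = df_all_info[df_all_info["y_true"] != df_all_info["y_pred_aligned"]]
print(len(df_wrong), "rows disagree with their label")
```

The majority-vote mapping is a simple heuristic. If two clusters map to the same label, the clustering and the labels do not line up one to one and the comparison should be read with care.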

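That is not what I meant. I have a plot like yours. My data is already split into three clusters and I can see the clusters graphically; in my case I used spheres because it is a 3D plot. I want to know which points in the data correspond to each cluster. For example, if each row is a molecular structure, I want to know whether the structure in row 7 belongs to cluster 1, cluster 2 or cluster 3, and so on. Alternatively, I could link samples from each cluster to structure X. I am not sure I explained that well.

Check my edit; that should be the answer you originally wanted.

That really helps! Thanks!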
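For the row-to-cluster lookup asked about in the comments, predict returns one cluster id per input row, in the same order as the rows, and predict_proba gives the posterior probability of each component for that row. A minimal sketch on hypothetical 3-D data follows; the array name structures and the row index 7 are placeholders for the real dataset.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical 3-D data standing in for the molecular structures.
rng = np.random.default_rng(1)
centers = [(-5.0, 0.0, 0.0), (0.0, 5.0, 0.0), (5.0, 0.0, 5.0)]
structures = np.vstack(
    [rng.normal(loc=c, scale=0.5, size=(40, 3)) for c in centers]
)

gmm = GaussianMixture(n_components=3, random_state=0).fit(structures)
labels = gmm.predict(structures)        # one cluster id per row
proba = gmm.predict_proba(structures)   # posterior probability per component

# Which cluster does row 7 belong to, and how confident is the model?
row = 7
print(f"row {row} -> cluster {labels[row]}, probabilities {proba[row].round(3)}")

# Row indices of all members of each cluster.
members = {k: np.flatnonzero(labels == k) for k in range(gmm.n_components)}
```

The cluster ids are zero-based, so "cluster 1, 2, 3" in the question correspond to ids 0, 1, 2 here.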