Python: How to implement kernel density estimation for multivariate/3D data
My dataset is in the following format, and I'm trying to find the kernel density estimate with the optimal bandwidth.
data = np.array([[1, 4, 3], [2, .6, 1.2], [2, 1, 1.2],
[2, 0.5, 1.4], [5, .5, 0], [0, 0, 0],
[1, 4, 3], [5, .5, 0], [2, .5, 1.2]])
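But I don't know how to go about it. Also, how do I find the matrix Σ?
Update
I tried the KDE function from the scikit-learn toolkit to compute a univariate (1D) KDE.
Can anyone help me extend this to multivariate data, in this case 3D?
Interesting problem. You have a few options: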
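I'll show you what is, in my opinion, the simplest way (yes, that is somewhat opinion-based), which I think in your case is option 2.
Note that this method uses a rule of thumb, as described in the linked documentation, to determine the bandwidth. The exact rule used is Scott's rule. Your mention of the Σ matrix makes me think that rule-of-thumb bandwidth selection is fine for you, but you also talk about an optimal bandwidth, and the example you give uses cross-validation to determine the best bandwidth. So if this method doesn't suit your purposes, let me know in the comments.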
import numpy as np
from scipy import stats
data = np.array([[1, 4, 3], [2, .6, 1.2], [2, 1, 1.2],
[2, 0.5, 1.4], [5, .5, 0], [0, 0, 0],
[1, 4, 3], [5, .5, 0], [2, .5, 1.2]])
data = data.T #The KDE takes N vectors of length K for K data points
#rather than K vectors of length N
kde = stats.gaussian_kde(data)
# You now have your kde!! Interpreting it / visualising it can be difficult with 3D data
# You might like to try 2D data first - then you can plot the resulting estimated pdf
# as the height in the third dimension, making visualisation easier.
# Here is the basic way to evaluate the estimated pdf on a regular n-dimensional mesh
# Create a regular N-dimensional grid with (arbitrary) 20 points in each dimension
minima = data.T.min(axis=0)
maxima = data.T.max(axis=0)
space = [np.linspace(mini,maxi,20) for mini, maxi in zip(minima,maxima)]
grid = np.meshgrid(*space)
#Turn the grid into N-dimensional coordinates for each point
#Note - coords will get very large as N increases...
coords = np.vstack([g.ravel() for g in grid])  # pass a list: np.vstack does not accept a map object on Python 3
#Evaluate the KD estimated pdf at each coordinate
density = kde(coords)
#Do what you like with the density values here..
#plot them, output them, use them elsewhere...
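The question also asks about an optimal bandwidth chosen by cross-validation. A minimal sketch of that alternative, assuming scikit-learn is installed, could apply KernelDensity with GridSearchCV directly to the 3D data. The bandwidth grid, the leave-one-out splitting, and the names X, bandwidths, and kde_cv are illustrative choices, not part of the original answer. Note that KernelDensity expects samples as rows (so no transpose) and uses a single scalar bandwidth rather than a full Σ matrix.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.neighbors import KernelDensity

# Same nine 3D points as in the question: rows are samples, columns are features.
X = np.array([[1, 4, 3], [2, .6, 1.2], [2, 1, 1.2],
              [2, 0.5, 1.4], [5, .5, 0], [0, 0, 0],
              [1, 4, 3], [5, .5, 0], [2, .5, 1.2]])

# Candidate bandwidths (an arbitrary range chosen for illustration only).
bandwidths = np.logspace(-1, 1, 20)

# Each candidate is scored by the log-likelihood of the held-out sample.
# Leave-one-out is used here because the dataset is tiny.
search = GridSearchCV(
    KernelDensity(kernel="gaussian"),
    {"bandwidth": bandwidths},
    cv=LeaveOneOut(),
)
search.fit(X)

best_bandwidth = search.best_params_["bandwidth"]
kde_cv = search.best_estimator_  # refitted on all of X with the best bandwidth

# score_samples returns the log-density, so exponentiate to get the density.
log_density = kde_cv.score_samples(X)
density_cv = np.exp(log_density)
print("best bandwidth:", best_bandwidth)
```

With only nine points, two of which are duplicated, the selected bandwidth should be treated with caution; the sketch is meant to show the mechanics rather than a trustworthy estimate.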
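Warning: the gaussian_kde approach above may give terrible results, depending on your particular problem. Things to bear in mind are obviously: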
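I wonder if I can help, but I need to understand a little more. I can see that each data point has three values, but the way you've written it, those triplets are further grouped into threes. Is there a reason the input data is grouped twice? Also, please double-check what you mean by the Σ matrix. I'm assuming you mean the estimated data covariance, so that you could use a rule-of-thumb bandwidth of Σ^(-1/2)? If so, do you intend that as a starting point for bandwidth optimization, or instead of optimization?
Did my answer help? If not, feel free to add some comments, as I may be able to adapt it to your needs.
@JRichardSnape You're right, I grouped the data the wrong way. In my actual code it is just as you implemented it, but I messed it up when copying the code. And yes, Σ means the covariance matrix.
But I'm still not sure whether my answer below helps. Does it meet your needs, or is there something else to your question? If you want the covariance matrix output, I can add that.
Thank you very much for your help. It was very helpful. I have two more questions that I hope you can help me with. (1) How can I use my own bandwidth, as we do in sklearn.kde (grid search cross-validation)? (2) As you said, plot in 2D first, with the height in the third dimension. Can you tell me how to do that? Here is what I tried.
I'll take a look at these questions when I have time.
Sorry, I haven't had a chance to look at this. 1) You can set your own bandwidth: I haven't used this, and the examples seem to be for the 1D case. On visualisation, I'd suggest starting with 2D input data rather than 3D. The code you put at that link doesn't have the correct imports etc., so it won't run at all.
Hi, can you help me display the bandwidth matrix from your code? I tried kde.factor, but it gives me a float. For the multivariate (3D) case, shouldn't it show a 3x3 bandwidth matrix? Thanks.
You also need to look at kde.covariance. Multiply kde.factor by kde.covariance to get what I think you're hoping to see as the kernel covariance matrix or bandwidth (which I think you called Σ above). This is explained in detail at the bottom of the linked page.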
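To make the bandwidth discussion in the comments concrete, here is a small sketch of how the scalar factor and the 3x3 matrix relate, and of how a custom bandwidth can be passed in. It relies on my reading of the SciPy documentation, which describes kde.covariance as the data covariance already scaled by the bandwidth factor (by factor squared), so kde.covariance itself may be the 3x3 matrix being asked about. The names scott_factor, data_cov, kernel_cov, and kde_custom, and the value 0.5, are illustrative.

```python
import numpy as np
from scipy import stats

# Same data as above, transposed so that rows are dimensions: shape (3, 9).
data = np.array([[1, 4, 3], [2, .6, 1.2], [2, 1, 1.2],
                 [2, 0.5, 1.4], [5, .5, 0], [0, 0, 0],
                 [1, 4, 3], [5, .5, 0], [2, .5, 1.2]]).T
d, n = data.shape

kde = stats.gaussian_kde(data)  # default bw_method is Scott's rule

# Scott's rule gives a scalar factor n**(-1/(d+4)); this is what kde.factor holds.
scott_factor = n ** (-1.0 / (d + 4))

# Unscaled sample covariance of the data (3x3, rows are variables).
data_cov = np.cov(data)

# Kernel covariance used by the estimator: data covariance times factor squared.
kernel_cov = kde.covariance

print("factor:", kde.factor)
print("kernel covariance (3x3):\n", kernel_cov)

# A custom bandwidth: a scalar bw_method is used directly as the factor.
kde_custom = stats.gaussian_kde(data, bw_method=0.5)
print("custom kernel covariance:\n", kde_custom.covariance)
```

A scalar bw_method only rescales the data covariance, so the kernel keeps the shape of the data covariance; it does not allow an arbitrary 3x3 bandwidth matrix to be supplied.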