Fitting a histogram with Python


I have a histogram:

H=hist(my_data,bins=my_bin,histtype='step',color='r')

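I can see that the shape is almost Gaussian, but I would like to fit this histogram with a Gaussian function and print the mean and sigma values I get. Can you help me?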

Here is an example that works on Python 2.6 and Python 3.2:

from scipy.stats import norm
import matplotlib.pyplot as plt

# read data from a text file. One number per line
arch = "test/Log(2)_ACRatio.txt"
datos = []
with open(arch, 'r') as f:
    for item in f:
        item = item.strip()
        if item != '':
            try:
                datos.append(float(item))
            except ValueError:
                pass  # skip lines that are not numbers

# best fit of data
(mu, sigma) = norm.fit(datos)

# the histogram of the data
# (density=True replaces normed=1, which recent Matplotlib releases no longer accept)
n, bins, patches = plt.hist(datos, 60, density=True, facecolor='green', alpha=0.75)

# add a 'best fit' line
# (norm.pdf replaces mlab.normpdf, which recent Matplotlib releases no longer provide)
y = norm.pdf(bins, mu, sigma)
l = plt.plot(bins, y, 'r--', linewidth=2)

#plot
plt.xlabel('Smarts')
plt.ylabel('Probability')
plt.title(r'$\mathrm{Histogram\ of\ IQ:}\ \mu=%.3f,\ \sigma=%.3f$' %(mu, sigma))
plt.grid(True)

plt.show()
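To make clear what norm.fit returns here, the following is a minimal sketch. It assumes synthetic samples (the names rng and samples are placeholders, not part of the answer above) in place of datos. For a normal distribution, the maximum-likelihood fit should coincide with the sample mean and the population (ddof=0) standard deviation, so no curve is actually being fitted to the bin heights.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical stand-in for `datos`: synthetic normal samples.
rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=0.5, size=10_000)

# Maximum-likelihood fit of a normal distribution to the raw samples.
mu, sigma = norm.fit(samples)

# Compare with the plain sample statistics (np.std uses ddof=0 by default).
print("norm.fit :", mu, sigma)
print("mean/std :", np.mean(samples), np.std(samples))
```

If the two printed lines agree, the histogram binning plays no role in the estimate; only the raw samples do.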

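Here is an example that uses scipy.optimize to fit a non-linear function such as a Gaussian, even when the data is in a histogram whose range is too narrow for a simple mean estimate to work. A constant offset would also cause simple normal statistics to fail (for plain Gaussian data, just remove p[3] and c[3]).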

Output:

A exp[-0.5((x-mu)/sigma)^2] + k 
Parent Coefficients:
1.000, 0.200, 0.300, 0.625
Fit Coefficients:
0.961231625289 0.197254597618 0.293989275502 0.65370344131
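A minimal sketch of this kind of fit follows. It assumes scipy.optimize.curve_fit, which may not be the routine the answer used, and it fits noisy samples of the curve rather than a real histogram. The function name gauss_offset, the x window, the noise level, and the starting guess p0 are all illustrative assumptions; only the model A exp[-0.5((x-mu)/sigma)^2] + k and the parent coefficients come from the output above.

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss_offset(x, A, mu, sigma, k):
    """Gaussian with amplitude A, centre mu, width sigma, plus a constant offset k."""
    return A * np.exp(-0.5 * ((x - mu) / sigma) ** 2) + k

# Parent coefficients taken from the output listed above.
parent = (1.0, 0.2, 0.3, 0.625)

# Assumed test data: a fairly narrow x window with a little Gaussian noise.
rng = np.random.default_rng(42)
x = np.linspace(-0.6, 0.6, 200)
y = gauss_offset(x, *parent) + rng.normal(scale=0.01, size=x.size)

# Rough starting guess derived from the data itself.
p0 = (y.max() - y.min(), x[np.argmax(y)], 0.5 * (x.max() - x.min()), y.min())

fit, cov = curve_fit(gauss_offset, x, y, p0=p0)

print("A exp[-0.5((x-mu)/sigma)^2] + k")
print("Parent Coefficients:", parent)
print("Fit Coefficients:   ", fit)
```

For pure Gaussian data without an offset, the same sketch applies with the k parameter dropped from gauss_offset and from p0.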

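Here is another solution that uses only the matplotlib.pyplot and numpy packages. It works only for Gaussian fitting. It is based on the empirical mean and variance of the data, an approach that has already been mentioned in this thread. Here is the corresponding code: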

# Python version : 2.7.9
from __future__ import division
import numpy as np
from matplotlib import pyplot as plt

# For the explanation, I simulate the data :
N=1000
data = np.random.randn(N)
# But in reality, you would read data from file, for example with :
#data = np.loadtxt("data.txt")

# Empirical average and variance are computed
avg = np.mean(data)
var = np.var(data)
# From that, we know the shape of the fitted Gaussian.
pdf_x = np.linspace(np.min(data),np.max(data),100)
pdf_y = 1.0/np.sqrt(2*np.pi*var)*np.exp(-0.5*(pdf_x-avg)**2/var)

# Then we plot :
plt.figure()
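# density=True replaces normed=True; labels are attached to each artist
# so the legend does not depend on the drawing order
plt.hist(data, 30, density=True, label="Data")
plt.plot(pdf_x, pdf_y, 'k--', label="Fit")
plt.legend(loc="best")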
plt.show()

The output is a histogram of the data with the fitted Gaussian drawn over it as a dashed line.

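Starting with Python 3.8, the standard library provides the NormalDist object as part of the statistics module. A NormalDist object can be built from a set of data with the NormalDist.from_samples method, and it provides access to its mean (NormalDist.mean) and standard deviation (NormalDist.stdev).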

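I was a bit puzzled that norm.fit apparently only works with the expanded list of sampled values. I tried giving it two lists of numbers, or a list of tuples, but it seems to flatten everything and treat the input as individual samples. Since I already have a histogram based on millions of samples, I didn't want to expand it if I didn't have to. Thankfully, the normal distribution is trivial to calculate, so: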

# histogram is [(val,count)]
from math import sqrt

def normfit(hist):
    n,s,ss = univar(hist)
    mu = s/n
    var = ss/n-mu*mu
    return (mu, sqrt(var))

def univar(hist):
    n = 0
    s = 0
    ss = 0
    for v,c in hist:
        n += c
        s += c*v
        ss += c*v*v
    return n, s, ss

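I'm sure this must be provided by a library, but since I couldn't find it anywhere, I'm posting it here. Feel free to point out the correct way to do it and downvote me :-)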
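One possible library route, offered as a sketch and not as the canonical answer: numpy.average accepts a weights argument, so the counts can be used directly without expanding the histogram. The helper name normfit_np and the tiny example histogram are hypothetical.

```python
import numpy as np

def normfit_np(values, counts):
    """Weighted mean and (population) standard deviation of binned data."""
    values = np.asarray(values, dtype=float)
    counts = np.asarray(counts, dtype=float)
    mu = np.average(values, weights=counts)
    # Two-pass form: average squared deviation from the weighted mean.
    var = np.average((values - mu) ** 2, weights=counts)
    return mu, np.sqrt(var)

# Hypothetical histogram in the same [(val, count)] layout as above.
hist = [(1.0, 2), (2.0, 4), (3.0, 2)]
values, counts = zip(*hist)
mu, sigma = normfit_np(values, counts)
print(mu, sigma)

# Cross-check against the expanded sample list.
expanded = np.repeat(values, counts)
print(expanded.mean(), expanded.std())
```

With the output of numpy.histogram, the bin centres 0.5 * (edges[:-1] + edges[1:]) could serve as values. The result is then only an approximation, since every sample in a bin is treated as sitting at the bin centre. The two-pass variance used here also avoids the cancellation that ss/n - mu*mu can suffer from when the mean is large compared with the spread.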

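"Fit this histogram with a Gaussian function"? Usually we just compute the mean and standard deviation of the histogram directly. What do you mean by "fit this histogram with a Gaussian function"?

How do you compute the mean and standard deviation "directly"? And what if the histogram is not really Gaussian and I want to fit it with a lognormal distribution?

There are equations for the mean and standard deviation of any set of data points, regardless of their distribution. Any curve (such as the line y=mx+b) can be fitted to any data set. You need to read up on the basic statistical functions (mean, median, mode, variance, and so on) and on least-squares approximation. Understand curve fitting for basic (linear and quadratic) functions first, before experimenting with more complex curves.

If you have the data, you don't really need curve fitting. Just find the mean and standard deviation and plug them into the formula for the normal (also known as Gaussian) distribution.

The mean of a histogram is sum(value*frequency for value, frequency in h) / sum(frequency for _, frequency in h). The standard deviation is similarly simple, but a bit long for a comment. Can you update the question to explain in more detail what you are trying to do?

I would like to do this for my data set without the scaling, and so get the sigma of my data, not the sigma of a scaled version.

@user2820579 What do you mean by "fit the height"?

This post answers the OP's question perfectly. If it doesn't fit your specific problem, ask a new question, but don't downvote a valid answer.

Sorry, I misunderstood (mu, sigma) = norm.fit(datos). Is this a Gaussian fit?

Because of deprecation issues, it is better to use "scipy.stats.norm.pdf" instead of "mlab.normpdf".

I wonder why I get very different fits when using your function and the one suggested by joaquin? See my related question for details. ...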

# NormalDist example (Python 3.8+): build the distribution from a list of samples
from statistics import NormalDist

data = [0.7237248252340628, 0.6402731706462489, -1.0616113628912391,
        -1.7796451823371144, -0.1475852030122049, 0.5617952240065559,
        -0.6371760932160501, -0.7257277223562687, 1.699633029946764,
        0.2155375969350495, -0.33371076371293323, 0.1905125348631894,
        -0.8175477853425216, -1.7549449090704003, -0.512427115804309,
        0.9720486316086447, 0.6248742504909869, 0.7450655841312533,
        -0.1451632129830228, -1.0252663611514108]
norm = NormalDist.from_samples(data)
# NormalDist(mu=-0.12836704320073597, sigma=0.9240861018557649)
norm.mean
# -0.12836704320073597
norm.stdev
# 0.9240861018557649
