Fitting a curve to a histogram in Python

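I created a histogram from my pandas DataFrame and would like to fit a probability distribution to it. I tried it myself, but the curve does not match the data well. My code so far: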

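import matplotlib.pyplot as plt
import pandas as pd
from scipy import stats

# df_distr is the existing DataFrame with a 'Strecke' column
h = sorted(df_distr['Strecke'])
m = df_distr['Strecke'].mean()
std = df_distr['Strecke'].std()
distr = df_distr['Strecke']

fig = plt.figure(figsize=(16, 9))

# the histogram of the data
binwidth = range(-1, 500)
# density=True replaces the normed=1 argument, which newer matplotlib versions no longer accept
n, bins, patches = plt.hist(h, bins=binwidth, density=True, facecolor='green', alpha=0.75, histtype='step')
df = pd.DataFrame({'Strecke': bins[:-1] + 1, 'Propability': n})

# add a 'best fit' line
# stats.norm.pdf replaces mlab.normpdf, which has been removed from matplotlib
y = stats.norm.pdf(bins, m, std)
l = plt.plot(bins, y, 'r--', linewidth=1)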

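Is it possible to fit the curve better? Are there other distributions that might suit the data, such as the half-normal, log-normal, or Weibull distribution?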

Update: I was finally able to find the best distribution for the dataset. I implemented the following code:

# continues from the code above (uses plt, pd, stats and h)
import numpy as np
# assumption: gev is not defined in the original post; genextreme is SciPy's GEV distribution
from scipy.stats import genextreme as gev

# the histogram of the data
binwidth = range(-1, 500)
# density=True replaces the normed=1 argument, which newer matplotlib versions no longer accept
n, bins, patches = plt.hist(h, bins=binwidth, density=True, facecolor='cyan', alpha=0.5, label="Histogram")
xt = plt.xticks()[0]
xmin, xmax = 0, max(xt)
# 500 points, the same count as the 500 histogram bins
lnspc = np.linspace(xmin, xmax, 500)

# each fit returns the shape parameters (if any) followed by loc and scale
m, s = stats.norm.fit(h)
pdf_g = stats.norm.pdf(lnspc, m, s)
#plt.plot(lnspc, pdf_g, label="Normal")

ag, bg, cg = stats.gamma.fit(h)
pdf_gamma = stats.gamma.pdf(lnspc, ag, bg, cg)
#plt.plot(lnspc, pdf_gamma, label="Gamma")

ab, bb, cb, db = stats.beta.fit(h)
pdf_beta = stats.beta.pdf(lnspc, ab, bb, cb, db)
#plt.plot(lnspc, pdf_beta, label="Beta")

gevfit = gev.fit(h)
pdf_gev = gev.pdf(lnspc, *gevfit)
plt.plot(lnspc, pdf_gev, label="GEV")

logfit = stats.lognorm.fit(h)
pdf_lognorm = stats.lognorm.pdf(lnspc, *logfit)
plt.plot(lnspc, pdf_lognorm, label="LogNormal")

weibfit = stats.weibull_min.fit(h)
pdf_weib = stats.weibull_min.pdf(lnspc, *weibfit)
#plt.plot(lnspc, pdf_weib, label="Weibull")

exponweibfit = stats.exponweib.fit(h)
pdf_exponweib = stats.exponweib.pdf(lnspc, *exponweibfit)
plt.plot(lnspc, pdf_exponweib, label="Exponential Weibull")

paretofit = stats.pareto.fit(h)
pdf_pareto = stats.pareto.pdf(lnspc, *paretofit)
plt.plot(lnspc, pdf_pareto, label="Pareto")

plt.legend()


df = pd.DataFrame({'Strecke': bins[:-1]+1, 'Propability': n})
#R²
slope, intercept, r_value_norm, p_value, std_err = stats.linregress(df['Propability'],pdf_g)
#print ("R-squared Normal Distribution:", r_value_norm**2)

slope, intercept, r_value_gamma, p_value, std_err = stats.linregress(df['Propability'],pdf_gamma)
#print ("R-squared Gamma Distribution:", r_value_gamma**2)

slope, intercept, r_value_beta, p_value, std_err = stats.linregress(df['Propability'],pdf_beta)
#print ("R-squared Beta Distribution:", r_value_beta**2)

slope, intercept, r_value_gev, p_value, std_err = stats.linregress(df['Propability'],pdf_gev)
#print ("R-squared GEV Distribution:", r_value_gev**2)

slope, intercept, r_value_lognorm, p_value, std_err = stats.linregress(df['Propability'],pdf_lognorm)
#print ("R-squared LogNormal Distribution:", r_value_lognorm**2)

slope, intercept, r_value_weibull, p_value, std_err = stats.linregress(df['Propability'],pdf_weib)
#print ("R-squared Weibull Distribution:", r_value_weibull**2)

slope, intercept, r_value_exponweibull, p_value, std_err = stats.linregress(df['Propability'],pdf_exponweib)

slope, intercept, r_value_pareto, p_value, std_err = stats.linregress(df['Propability'],pdf_pareto)
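
The R² values above compare the histogram heights with PDF values sampled on lnspc, which runs from 0 to the largest x tick rather than over the bin positions. A minimal sketch of the same kind of comparison evaluated at the bin centers, which also reports the residual sum of squares, might look like the following. The synthetic log-normal sample and the helper name `score_fit` are assumptions made for illustration and are not part of the original post; R² is computed here as 1 - RSS/TSS, which is not identical to the squared correlation returned by `linregress`.

```python
import numpy as np
from scipy import stats


def score_fit(dist, data, bin_edges):
    """Fit `dist` to `data` and compare its PDF with the density histogram.

    Hypothetical helper: returns (rss, r_squared), both evaluated at the
    bin centers of the histogram.
    """
    heights, edges = np.histogram(data, bins=bin_edges, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    params = dist.fit(data)            # shape parameters (if any), loc, scale
    pdf = dist.pdf(centers, *params)
    rss = float(np.sum((heights - pdf) ** 2))             # residual sum of squares
    tss = float(np.sum((heights - heights.mean()) ** 2))  # total sum of squares
    return rss, 1.0 - rss / tss


# Synthetic stand-in for df_distr['Strecke'] (assumption, not the real data)
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=3.0, sigma=0.6, size=5000)
edges = np.arange(0, 501)  # unit-width bins from 0 to 500

scores = {
    "norm": score_fit(stats.norm, sample, edges),
    "lognorm": score_fit(stats.lognorm, sample, edges),
}
for name, (rss, r2) in scores.items():
    print(f"{name}: RSS={rss:.6f}, R^2={r2:.3f}")
```

A lower RSS and a higher R² indicate that the fitted PDF follows the histogram more closely; other `scipy.stats` distributions can be scored by passing them to the same helper.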

For example, I get these plots:

Thanks for your help.

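You are fitting the data to a Gaussian curve. Judging from the plot, this is not a normal distribution. Try plotting the logarithm of the data first and see whether that fits.

It looks more like an exponential distribution to me.

@JamesPhillips Here is the link to my raw data:

@JamesPhillips Thanks for your help! The exponentiated Weibull distribution is a good fit for this dataset.

I was finally able to fit different distributions and analyze them. After comparing R-squared and the residual sum of squares, the log-normal distribution is the best fit. I will update the first post to show how I implemented them.

@JamesPhillips You are right. I will check it tomorrow.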
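
The suggestion in the comments to look at the logarithm of the data can be sketched numerically as follows. This is only an illustration under stated assumptions: the sample is synthetic log-normal data rather than the real 'Strecke' column, and sample skewness is used as a rough numeric stand-in for inspecting a plot of the log data.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for df_distr['Strecke'] (assumption, not the real data)
rng = np.random.default_rng(1)
sample = rng.lognormal(mean=3.0, sigma=0.6, size=5000)

# Only strictly positive values have a logarithm
log_sample = np.log(sample[sample > 0])

# Skewness near 0 is consistent with a symmetric shape such as a normal curve
skew_raw = stats.skew(sample)
skew_log = stats.skew(log_sample)
print(f"skewness raw: {skew_raw:.2f}, log: {skew_log:.2f}")

# Fit a normal distribution to the log data; mu and sigma then describe
# a log-normal distribution for the original data
mu, sigma = stats.norm.fit(log_sample)
print(f"mu={mu:.2f}, sigma={sigma:.2f}")
```

If the log-transformed data is much less skewed than the raw data, a log-normal model is a plausible candidate; the check does not by itself show that it is the best one.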