Python curve fitting of exponential/power/logarithmic curves: improving the results


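I am trying to fit this data, which gradually approaches zero (but never reaches it).

I believe the best curve is an inverse logistic function, but I am open to suggestions. The key is the expected decaying "S-curve" shape.

Here is the code I have so far, with the resulting plot below. The fit is pretty ugly.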

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# DATA

x = pd.Series([1,1,264,882,913,1095,1156,1217,1234,1261,1278,1460,1490,1490,1521,1578,1612,1612,1668,1702,1704,1735,1793,2024,2039,2313,2313,2558,2558,2617,2617,2708,2739,2770,2770,2831,2861,2892,2892,2892,2892,2892,2923,2923,2951,2951,2982,2982,3012,3012,3012,3012,3012,3012,3012,3073,3073,3073,3104,3104,3104,3104,3135,3135,3135,3135,3165,3165,3165,3165,3165,3196,3196,3196,3226,3226,3257,3316,3347,3347,3347,3347,3377,3377,3438,3469,3469]).values
y = pd.Series([1000,600,558.659217877095,400,300,100,7.75,6,8.54,6.66666666666667,7.14,1.1001100110011,1.12,0.89,1,2,0.666666666666667,0.77,1.12612612612613,0.7,0.664010624169987,0.65,0.51,0.445037828215398,0.27,0.1,0.26,0.1,0.1,0.13,0.16,0.1,0.13,0.1,0.12,0.1,0.13,0.14,0.14,0.17,0.11,0.15,0.09,0.1,0.26,0.16,0.09,0.09,0.05,0.09,0.09,0.1,0.1,0.11,0.11,0.09,0.09,0.11,0.08,0.09,0.09,0.1,0.06,0.07,0.07,0.09,0.05,0.05,0.06,0.07,0.08,0.08,0.07,0.1,0.08,0.08,0.05,0.06,0.04,0.04,0.05,0.05,0.04,0.06,0.05,0.05,0.06]).values

# Inverse Logistic Function 
# https://en.wikipedia.org/wiki/Logistic_function
def func(x, L ,x0, k, b):
    y = 1/(L / (1 + np.exp(-k*(x-x0)))+b)
    return y

# FIT DATA

p0 = [max(y), np.median(x), 1, min(y)]  # a mandatory initial guess
popt, pcov = curve_fit(func, x, y,p0, method='dogbox',maxfev=10000)

# PERFORMANCE

modelPredictions = func(x, *popt)
absError = modelPredictions - y
SE = np.square(absError) # squared errors
MSE = np.mean(SE) # mean squared errors
RMSE = np.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (np.var(absError) / np.var(y))

print('Parameters:', popt)
print('RMSE:', RMSE)
print('R-squared:', Rsquared)

#PLOT

plt.figure()
plt.plot(x, y, 'ko', label="Original Noised Data")
plt.plot(x, func(x, *popt), 'r-', label="Fitted Curve")
plt.legend()
plt.yscale('log')
#plt.xscale('log')
plt.show()
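
Below is the result of running this code, along with what I am trying to achieve.

[Figure: the data on a logarithmic y-axis, with the curve fitted by my code in red and the desired curve in blue]

How can I better optimize the curve fit so that, instead of the red line my code produces, I get something closer to the blue line?

Thanks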

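From the plot of your data and the expected fit, I would guess that you do not really want to model your data y as a logistic-like step function, but rather log(y) as a logistic-like step function.

So I think you probably want to use a logistic step function, perhaps with an added linear component, to model the log of this data. I would do this with lmfit, since it has built-in models, gives better reporting of results, and allows you to greatly simplify your fitting code (disclaimer: I am the lead author).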

The resulting fit report contains the fit statistics and the following best-fit values:

[[Model]]
    (Model(step, form='logistic') + Model(linear))
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 73
    # data points      = 87
    # variables        = 5
    chi-square         = 9.38961801
    reduced chi-square = 0.11450754
    Akaike info crit   = -183.688405
    Bayesian info crit = -171.358865
[[Variables]]
    amplitude: -4.89008796 +/- 0.29600969 (6.05%) (init = -5)
    center:     1180.65823 +/- 15.2836422 (1.29%) (init = 1000)
    sigma:      94.0317580 +/- 18.5328976 (19.71%) (init = 100)
    slope:     -0.00147861 +/- 8.1151e-05 (5.49%) (init = 0)
    intercept:  6.95177838 +/- 0.17170849 (2.47%) (init = 0)
[[Correlations]] (unreported correlations are < 0.100)
    C(amplitude, slope)     = -0.798
    C(amplitude, sigma)     = -0.649
    C(amplitude, intercept) = -0.605
    C(center, intercept)    = -0.574
    C(sigma, slope)         =  0.542
    C(sigma, intercept)     =  0.348
    C(center, sigma)        = -0.335
    C(amplitude, center)    =  0.282
It also produces a plot of the data together with the fitted curve.


If you prefer, you can of course reproduce all of this with scipy.optimize.curve_fit, but I will leave that as an exercise.
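As a starting point for that exercise, here is a minimal sketch using only NumPy and SciPy. It assumes that lmfit's logistic step has the form amplitude * (1 - 1/(1 + exp((x - center)/sigma))), which equals amplitude * expit((x - center)/sigma), and it reuses the `init` values from the report as the starting guess. The function name `log_model` is my own.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import expit  # numerically stable 1 / (1 + exp(-z))

# Data from the question
x = np.array([1,1,264,882,913,1095,1156,1217,1234,1261,1278,1460,1490,1490,1521,1578,1612,1612,1668,1702,1704,1735,1793,2024,2039,2313,2313,2558,2558,2617,2617,2708,2739,2770,2770,2831,2861,2892,2892,2892,2892,2892,2923,2923,2951,2951,2982,2982,3012,3012,3012,3012,3012,3012,3012,3073,3073,3073,3104,3104,3104,3104,3135,3135,3135,3135,3165,3165,3165,3165,3165,3196,3196,3196,3226,3226,3257,3316,3347,3347,3347,3347,3377,3377,3438,3469,3469], dtype=float)
y = np.array([1000,600,558.659217877095,400,300,100,7.75,6,8.54,6.66666666666667,7.14,1.1001100110011,1.12,0.89,1,2,0.666666666666667,0.77,1.12612612612613,0.7,0.664010624169987,0.65,0.51,0.445037828215398,0.27,0.1,0.26,0.1,0.1,0.13,0.16,0.1,0.13,0.1,0.12,0.1,0.13,0.14,0.14,0.17,0.11,0.15,0.09,0.1,0.26,0.16,0.09,0.09,0.05,0.09,0.09,0.1,0.1,0.11,0.11,0.09,0.09,0.11,0.08,0.09,0.09,0.1,0.06,0.07,0.07,0.09,0.05,0.05,0.06,0.07,0.08,0.08,0.07,0.1,0.08,0.08,0.05,0.06,0.04,0.04,0.05,0.05,0.04,0.06,0.05,0.05,0.06], dtype=float)


def log_model(x, amplitude, center, sigma, slope, intercept):
    """Logistic step plus a straight line, used as a model for log(y)."""
    step = amplitude * expit((x - center) / sigma)
    return step + slope * x + intercept


# Starting guess: the "init" values listed in the fit report.
p0 = [-5.0, 1000.0, 100.0, 0.0, 0.0]

# Fit the natural log of the data with the default Levenberg-Marquardt method.
popt, pcov = curve_fit(log_model, x, np.log(y), p0=p0)
perr = np.sqrt(np.diag(pcov))  # one-sigma uncertainties of the parameters

names = ['amplitude', 'center', 'sigma', 'slope', 'intercept']
for name, value, err in zip(names, popt, perr):
    print(f'{name:>9s} = {value:.6g} +/- {err:.3g}')

# Sum of squared residuals in log space, comparable to the report's chi-square.
residuals = np.log(y) - log_model(x, *popt)
print('chi-square:', np.sum(residuals**2))

# Back-transform to the original scale of y, e.g. for plotting.
y_fit = np.exp(log_model(x, *popt))
```

Using `expit` instead of writing out the exponential avoids overflow warnings if the optimizer tries a very small `sigma` along the way.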

In your case I would fit a hyperbolic tangent¹ to the base-10 logarithm of your data.

Let us use

    log10(y) = y₀ - a tanh(λ(x - x₀))

as your function.
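As standard background rather than part of the original answer: the tanh form is a logistic step in another parametrization, so this choice stays within the S-curve family the question asks about. Writing σ(z) for the logistic function:

```latex
\sigma(z) = \frac{1}{1 + e^{-z}} = \frac{1}{2}\left(1 + \tanh\frac{z}{2}\right)
\quad\Longrightarrow\quad
y_0 - a\,\tanh\bigl(\lambda (x - x_0)\bigr)
  = (y_0 + a) - 2a\,\sigma\bigl(2\lambda (x - x_0)\bigr)
```

In this form y₀ + a is the plateau at small x and y₀ - a is the plateau at large x, with the midpoint of the transition at x₀.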

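With x going from approximately 0 to 3500 and log10(y) going from 3 to -1, and under the assumption that tanh(2) = -tanh(-2) ≈ 1, we have

    y₀ + a = 3,  y₀ - a = -1  ⇒  y₀ = 1,  a = 2

    λ = (2 - (-2)) / (3500 - 0);  x₀ = (3500 - 0) / 2

(this rough estimate is needed to provide curve_fit with an initial guess, otherwise the program gets lost)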

Omitting the boilerplate, I ended up with

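# x, y, np, plt and curve_fit come from the question's code
X = np.linspace(0, 3500, 701)
plt.scatter(x, np.log10(y), label='data')
# curve drawn from the rough hand estimate: y0=1, a=2, lambda=4/3500, x0=1750
plt.plot(X, 1-2*np.tanh(4/3500*(X-1750)), label='hand fit')
# refine the hand estimate with curve_fit; only the parameters are kept
(y0, a, l, x0), *_ = curve_fit(
    lambda x, y0, a, l, x0: y0 - a*np.tanh(l*(x-x0)),
    x, np.log10(y),
    p0=[1, 2, 4/3500, 3500/2])
plt.plot(X, y0-a*np.tanh(l*(X-x0)), label='curve_fit fit')
plt.legend()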



Note 1:

import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import warnings


xData = numpy.array([1,1,264,882,913,1095,1156,1217,1234,1261,1278,1460,1490,1490,1521,1578,1612,1612,1668,1702,1704,1735,1793,2024,2039,2313,2313,2558,2558,2617,2617,2708,2739,2770,2770,2831,2861,2892,2892,2892,2892,2892,2923,2923,2951,2951,2982,2982,3012,3012,3012,3012,3012,3012,3012,3073,3073,3073,3104,3104,3104,3104,3135,3135,3135,3135,3165,3165,3165,3165,3165,3196,3196,3196,3226,3226,3257,3316,3347,3347,3347,3347,3377,3377,3438,3469,3469], dtype=float)
yData = numpy.array([1000,600,558.659217877095,400,300,100,7.75,6,8.54,6.66666666666667,7.14,1.1001100110011,1.12,0.89,1,2,0.666666666666667,0.77,1.12612612612613,0.7,0.664010624169987,0.65,0.51,0.445037828215398,0.27,0.1,0.26,0.1,0.1,0.13,0.16,0.1,0.13,0.1,0.12,0.1,0.13,0.14,0.14,0.17,0.11,0.15,0.09,0.1,0.26,0.16,0.09,0.09,0.05,0.09,0.09,0.1,0.1,0.11,0.11,0.09,0.09,0.11,0.08,0.09,0.09,0.1,0.06,0.07,0.07,0.09,0.05,0.05,0.06,0.07,0.08,0.08,0.07,0.1,0.08,0.08,0.05,0.06,0.04,0.04,0.05,0.05,0.04,0.06,0.05,0.05,0.06], dtype=float)

# fit the natural log of the data
yData = numpy.log(yData)

warnings.filterwarnings("ignore") # do not print "invalid value" warnings during fit
def func(x, a, b, c, d): # Four-Parameter Logistic from zunzun.com
    return d + (a - d) / (1.0 + numpy.power(x / c, b))


# these are the same as the scipy defaults
initialParameters = numpy.array([1.0, 1.0, 1.0, 1.0])

# curve fit the test data
fittedParameters, pcov = curve_fit(func, xData, yData, initialParameters)

modelPredictions = func(xData, *fittedParameters) 

print('Parameters:', fittedParameters)

print()


##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData,  'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Natural Log of Y Data') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot

graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)