How can I fit multimodal log-normal distributions in Matlab?

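I need to fit multimodal distributions that represent particle-size measurements. Example data for an upstream and a downstream measurement are listed in the edit at the end of this question.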

Now I want to fit these curves. With some help, I got fairly good results using a unimodal distribution function:

fun = @(p,x)(p(1)./x .* 1./(p(3)*sqrt(2*pi)).*exp(-(log(x)-p(2)).^2./(2*p(3)^2)));
and by scaling the resulting parameters as follows:

% Locate the maximum of each measurement (height yM, position xM)
[yM_in, pp_in] = max(DPF_in_mean);
xM_in = x_data(pp_in);
[yM_out, pp_out] = max(DPF_out_mean);
xM_out = x_data(pp_out);

% Normalize x and y so that the main peak sits near (1, 1)
xR_in = x_data / xM_in;
yR_in = DPF_in_mean / yM_in;
xR_out = x_data / xM_out;
yR_out = DPF_out_mean / yM_out;

% Fit the unimodal model to the normalized data
opts = optimoptions('lsqcurvefit','TolX',1e-4,'TolFun',1e-8);
p0 = [1,1,1];
p_in = lsqcurvefit(fun,p0,xR_in,yR_in,[],[],opts);
p_out = lsqcurvefit(fun,p0,xR_out,yR_out,[],[],opts);

% Map the fitted parameters back to the original units
p_in_scaled = [ yM_in * p_in(1) * xM_in, p_in(2) + log(xM_in), p_in(3) ];
p_out_scaled = [ yM_out * p_out(1) * xM_out, p_out(2) + log(xM_out), p_out(3) ];
However, when I plot the resulting fits, it is clear that a unimodal distribution is not sufficient to fit the measurements.

According to the Wikipedia article on the topic, it seems I can mix in a second distribution like this:

fun = @(p,x)(p(7)*(p(1)./x .* 1./(p(3)*sqrt(2*pi)).*exp(-(log(x)-p(2)).^2./(2*p(3)^2))) + (1 - p(7))*(p(4)./x .* 1./(p(5)*sqrt(2*pi)).*exp(-(log(x)-p(6)).^2./(2*p(5)^2))));
However, I do not know how to incorporate the additional parameters into the scaling

p_in_scaled = [ yM_in * p_in(1) * xM_in, p_in(2) + log(xM_in), p_in(3) ];
because I do not really understand what happens in this scaling step.

How can I use a multimodal distribution to fit my measurements?

Edit

The data used are as follows:

x_data = [4.87000000000000e-09 5.62000000000000e-09 6.49000000000000e-09 7.50000000000000e-09 8.66000000000000e-09 ...
          1.00000000000000e-08 1.15500000000000e-08 1.33400000000000e-08 1.54000000000000e-08 1.77800000000000e-08 ...
          2.05400000000000e-08 2.37100000000000e-08 2.73800000000000e-08 3.16200000000000e-08 3.65200000000000e-08 ...
          4.21700000000000e-08 4.87000000000000e-08 5.62300000000000e-08 6.49400000000000e-08 7.49900000000000e-08 ...
          8.66000000000000e-08 1.00000000000000e-07 1.15480000000000e-07 1.33350000000000e-07 1.53990000000000e-07 ...
          1.77830000000000e-07 2.05350000000000e-07 2.37140000000000e-07 2.73840000000000e-07 3.16230000000000e-07 ...
          3.65170000000000e-07 4.21700000000000e-07 4.86970000000000e-07 5.62340000000000e-07 6.49380000000000e-07 ...
          7.49890000000000e-07 8.65960000000000e-07 1.00000000000000e-06];

DPF_in_mean = [188318640795.745 360952462222.222 750859638450.704 2226776878843.93 4845941940346.82 7979258430057.80 ...
               11010887350289.0 13462058712138.7 15090350247398.8 15991756383815.0 16680978441618.5 17862081914450.9 ...
               20071390890173.4 23460963364161.9 27630428508670.5 31777265780346.8 35520451433526.0 38587652184971.1 ...
               40516972485549.1 41326812092485.6 41127130682080.9 40038712485549.1 37976259664739.9 34725415132948.0 ...
               30177578265896.0 24546703179190.8 18400851109826.6 12500471611560.7 7540309609248.56 3912091102658.96 ...
               1632974141040.46 458500289086.705 126012891030.303 0 0 0 7276263267.44526 11203995842.0392];

DPF_out_mean = [444898373533.333 1032357396444.44 1675044380444.44 2316141430222.22 2852971589555.56 3151959865111.11 ...
                3134892475777.78 2828026308000.00 2325761940666.67 1745907627777.78 1192912799111.11 742253282222.222 ...
                430349362888.889 255820144555.556 188235813444.444 181970493622.222 204829338533.333 233009821977.778 ...
                243007623333.333 230736732777.778 202426609488.889 169758857200.000 140604138622.222 116482776222.222 ...
                95076737155.5556 74172071777.7778 53672033733.3333 35251323911.1111 20813708255.5556 11102006362.8889 ...
                5497173092.96089 2625918349.76536 1471042995.80373 1012939492.96541 751738952.194595 589422111.731818 ...
                479373451.936508 378359645.767442]; 

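Here is an approach that may be useful. Because the large peak in the data masks the smaller one, subtracting the large peak from the data isolates the small peak for analysis. If you know the form of the large peak, you can fit it to the data, which leaves only one of the two peaks in each data set to analyze. Once you have found the form of the second peak, you can start over by fitting the sum of the two peaks, using the fitted values from the previous analyses as the initial parameter values for the final analysis.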

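I ran an equation search on both data sets to find a suitable peak equation for the main peak in each, and here are my results. No data transformation or preprocessing was done; I used the raw data as posted.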

For DPF_in_mean, my main peak was:

For DPF_out_mean, my main peak was:

import numpy

def Peak_LogNormalA_model(x_in): # from zunzun.com using DPF_out_mean
    # coefficients
    a = 3.1863877879345913E+12
    b = -1.8334716040160675E+01
    c = 4.4913908739937525E-01

    # log-normal peak: height a, log-center b, log-width c
    return a * numpy.exp(-0.5 * numpy.power((numpy.log(x_in)-b) / c, 2.0))
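
The subtraction step can be sketched as follows. This is only an illustration, assuming the DPF_out_mean coefficients reported above and the raw data from the question; it uses the standard `math` module instead of numpy so that it runs without extra packages, and the names `peak_lognormal_a`, `residual`, and `i_peak` are hypothetical helpers rather than part of the original analysis.

```python
import math

# Raw data copied from the question.
x_data = [
    4.87e-09, 5.62e-09, 6.49e-09, 7.50e-09, 8.66e-09,
    1.00e-08, 1.155e-08, 1.334e-08, 1.540e-08, 1.778e-08,
    2.054e-08, 2.371e-08, 2.738e-08, 3.162e-08, 3.652e-08,
    4.217e-08, 4.870e-08, 5.623e-08, 6.494e-08, 7.499e-08,
    8.660e-08, 1.00e-07, 1.1548e-07, 1.3335e-07, 1.5399e-07,
    1.7783e-07, 2.0535e-07, 2.3714e-07, 2.7384e-07, 3.1623e-07,
    3.6517e-07, 4.2170e-07, 4.8697e-07, 5.6234e-07, 6.4938e-07,
    7.4989e-07, 8.6596e-07, 1.00e-06,
]
dpf_out_mean = [
    444898373533.333, 1032357396444.44, 1675044380444.44,
    2316141430222.22, 2852971589555.56, 3151959865111.11,
    3134892475777.78, 2828026308000.00, 2325761940666.67,
    1745907627777.78, 1192912799111.11, 742253282222.222,
    430349362888.889, 255820144555.556, 188235813444.444,
    181970493622.222, 204829338533.333, 233009821977.778,
    243007623333.333, 230736732777.778, 202426609488.889,
    169758857200.000, 140604138622.222, 116482776222.222,
    95076737155.5556, 74172071777.7778, 53672033733.3333,
    35251323911.1111, 20813708255.5556, 11102006362.8889,
    5497173092.96089, 2625918349.76536, 1471042995.80373,
    1012939492.96541, 751738952.194595, 589422111.731818,
    479373451.936508, 378359645.767442,
]


def peak_lognormal_a(x, a, b, c):
    """Log-normal peak in the same form as Peak_LogNormalA_model."""
    return a * math.exp(-0.5 * ((math.log(x) - b) / c) ** 2)


# Main-peak coefficients for DPF_out_mean, as reported in this answer.
A, B, C = 3.1863877879345913e12, -1.8334716040160675e01, 4.4913908739937525e-01

# Subtract the main-peak prediction; what remains is dominated by the second peak.
residual = [y - peak_lognormal_a(x, A, B, C) for x, y in zip(x_data, dpf_out_mean)]

# Position and height of the largest residual: rough starting values for the second peak.
i_peak = max(range(len(residual)), key=residual.__getitem__)
second_peak_x = x_data[i_peak]
second_peak_height = residual[i_peak]
print(second_peak_x, second_peak_height)
```

The position and height of the largest residual can then serve as starting guesses when fitting a second peak equation to `residual`, before refitting the sum of both peaks to the original data.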

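Comments:

Hi James, not really. I need this to fit measurements that serve as input to a model built as part of my PhD thesis.

I must admit that we did not use any Matlab or Python when I was at school, so I was quite surprised by your question. Although if you regard a PhD as a very extensive piece of schoolwork, then it may well be schoolwork... understood.

Could you please post example data or a link to example data?

Yes! What is the recommended way to provide data files on Stack Overflow?

I added the data directly to the question. I think this is the best way to handle the data, because otherwise there is no guarantee that the data will still be available later, right?

The variable x_data holds the x values for both cases, DPF_in_mean and DPF_out_mean. As shown in the first figure, these are simply two independent measurements. Here, DPF_in_mean corresponds to the upstream measurement and DPF_out_mean to the downstream measurement. I then tried to fit both curves with fun.

Hi James, sorry for my late reply. I understand that these are the best fits for a unimodal distribution, but how do I use a multimodal distribution (i.e. one with two or more peaks)?

The idea is to subtract the predictions of the main-peak equation from the observed data, leaving only the secondary peak for analysis. After analyzing the data for the second peak and finding a suitable equation for it, start over by fitting "y = [main peak equation] + [secondary peak equation]", using the parameter values from the two separate regressions as the initial parameter values for the final regression.

Ah, now I understand! I will try this today.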
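
For the final combined fit, the scaling question from the post can be checked numerically. The sketch below assumes the seven-parameter `fun` from the question, with parameter order amplitude, mu, sigma for the first peak, then amplitude, sigma, mu for the second peak, then the weight. Substituting x/xM and y/yM into each log-normal term shows that every amplitude picks up a factor yM*xM and every mu is shifted by log(xM), while the sigmas and the weight stay unchanged. The helper names and the numbers in `p_fit` are hypothetical values chosen only to exercise the identity, not fitted results.

```python
import math


def lognormal_term(x, amp, mu, sigma):
    """One log-normal term, written like each summand of the question's fun."""
    return (amp / x) / (sigma * math.sqrt(2 * math.pi)) * math.exp(
        -(math.log(x) - mu) ** 2 / (2 * sigma ** 2)
    )


def two_peak(x, p):
    """Two-peak mixture with the question's order: p(1)..p(7)."""
    a1, mu1, s1, a2, s2, mu2, w = p
    return w * lognormal_term(x, a1, mu1, s1) + (1 - w) * lognormal_term(x, a2, mu2, s2)


def rescale(p, x_m, y_m):
    """Map parameters fitted on (x / x_m, y / y_m) back to the original units."""
    a1, mu1, s1, a2, s2, mu2, w = p
    shift = math.log(x_m)
    return [y_m * a1 * x_m, mu1 + shift, s1,
            y_m * a2 * x_m, s2, mu2 + shift, w]


# Hypothetical normalized-fit result and normalization constants (illustration only).
p_fit = [1.0, 0.0, 0.5, 0.3, 0.4, 1.5, 0.7]
x_m, y_m = 1.0e-08, 3.0e12

p_scaled = rescale(p_fit, x_m, y_m)

# The rescaled model in original units matches y_m times the normalized model.
for x in (5.0e-09, 1.0e-08, 6.5e-08, 3.0e-07):
    lhs = two_peak(x, p_scaled)
    rhs = y_m * two_peak(x / x_m, p_fit)
    assert math.isclose(lhs, rhs, rel_tol=1e-9)
```

Note that in this parameterization only the products w*A1 and (1 - w)*A2 affect the curve, so a least-squares fit cannot determine the weight and both amplitudes separately; fixing one of them (or dropping the weight) may make the fit better behaved.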