Python baseline correction library (Python, NumPy, SciPy, Signal Processing)


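I'm currently working with some Raman spectroscopy data, and I'm trying to correct it for the skew caused by fluorescence. Take a look at the graph below:

I get pretty close to where I want to be quickly. As you can see, I'm fitting a polynomial to all of my data, whereas I should really only be fitting a polynomial at the local minima.

Ideally, I would like a polynomial fit that, when subtracted from my original data, would produce something like this:

Are there any built-in libs that already do this?

If not, is there a simple algorithm you can recommend?

I found an answer to my question and just want to share it with everyone who stumbles upon this.

There is an algorithm called "Asymmetric Least Squares Smoothing", proposed by P. Eilers and H. Boelens in 2005. The paper is free, and you can find it on Google.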
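For reference, here is a sketch of the objective behind the code below, written in the code's own notation. y is the signal, z is the baseline and w holds the weights. D is the L × (L−2) second-difference matrix, so (Dᵀz)_j = z_j − 2z_{j+1} + z_{j+2}. Each iteration solves the linear system for the current weights and then reassigns the weights asymmetrically:

```latex
S(z) = \sum_i w_i (y_i - z_i)^2 + \lambda \sum_j \left(z_j - 2 z_{j+1} + z_{j+2}\right)^2
     = (y - z)^\top W (y - z) + \lambda\, z^\top D D^\top z, \qquad W = \operatorname{diag}(w)

\frac{\partial S}{\partial z} = 0 \;\Longrightarrow\; (W + \lambda D D^\top)\, z = W y

w_i = \begin{cases} p, & y_i > z_i \\ 1 - p, & y_i < z_i \end{cases}
```

With a small p, points above the current baseline (the peaks) barely influence the next fit, so z settles under the peaks.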

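import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def baseline_als(y, lam, p, niter=10):
    L = len(y)
    # second-order difference matrix, shape (L, L-2), built densely then made sparse
    D = sparse.csc_matrix(np.diff(np.eye(L), 2))
    w = np.ones(L)
    for i in range(niter):  # xrange in Python 2
        W = sparse.spdiags(w, 0, L, L)
        Z = W + lam * D.dot(D.transpose())
        z = spsolve(Z, w * y)
        # asymmetric weights: p above the current baseline, 1-p below it
        w = p * (y > z) + (1 - p) * (y < z)
    return z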
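
A minimal usage sketch on a hypothetical, noise-free test spectrum: one Gaussian peak on a sloping, fluorescence-like background. The parameter values lam=1e7 and p=0.01 are assumptions chosen for this toy signal. To keep the block self-contained, it repeats the ALS iteration with the difference matrix built sparsely:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def baseline_als(y, lam, p, niter=10):
    # Same ALS iteration as above; D is built directly as a sparse (L, L-2) matrix.
    L = len(y)
    D = sparse.diags([1, -2, 1], [0, -1, -2], shape=(L, L - 2))
    w = np.ones(L)
    for _ in range(niter):
        W = sparse.spdiags(w, 0, L, L)
        Z = W + lam * D.dot(D.transpose())
        z = spsolve(Z, w * y)
        w = p * (y > z) + (1 - p) * (y < z)
    return z


# Hypothetical spectrum: Gaussian "Raman peak" plus a linear background.
x = np.arange(1000, dtype=float)
background = 0.05 * x + 10
y = 100 * np.exp(-((x - 500) / 10) ** 2) + background

z = baseline_als(y, lam=1e7, p=0.01)
corrected = y - z  # subtract the estimated baseline, as asked in the question
```

The corrected signal keeps the peak while the sloping background away from the peak is pulled to roughly zero.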


I know this is an old question, but I stumbled upon it a few months ago and implemented the equivalent answer using scipy.sparse routines.

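import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

# Baseline removal

def baseline_als(y, lam, p, niter=10):

    s = len(y)
    # assemble the second-order difference matrix from its three diagonals
    D0 = sparse.eye(s)
    d1 = [np.ones(s - 1) * -2]
    D1 = sparse.diags(d1, [-1])
    d2 = [np.ones(s - 2) * 1]
    D2 = sparse.diags(d2, [-2])

    # keep only the s-2 complete [1, -2, 1] columns; the two truncated trailing
    # columns would also penalise the last baseline values towards zero
    D = (D0 + D2 + D1).tocsc()[:, :s - 2]
    w = np.ones(s)
    for i in range(niter):
        W = sparse.diags([w], [0])
        Z = W + lam * D.dot(D.transpose())
        z = spsolve(Z, w * y)
        w = p * (y > z) + (1 - p) * (y < z)

    return z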
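
Placing 1, -2, 1 on diagonals 0, -1 and -2 of a square s × s matrix leaves the last two columns as truncated stencils ([1, -2] and [1]) instead of full second differences. The small sketch below uses a hypothetical size s = 6 and compares D Dᵀ for the square and the (s, s−2) constructions. The extra block corresponds to the penalty λ(z_{s−2} − 2z_{s−1})² + λz_{s−1}², which pulls the last baseline values toward zero:

```python
import numpy as np
from scipy import sparse

s = 6
# Square construction: bands 0, -1, -2 on an s x s matrix (last two columns truncated).
D_square = sparse.diags([1, -2, 1], [0, -1, -2], shape=(s, s)).toarray()
# Rectangular construction: only the s - 2 complete [1, -2, 1] columns.
D_rect = sparse.diags([1, -2, 1], [0, -1, -2], shape=(s, s - 2)).toarray()

# Difference between the two penalty matrices D D^T.
extra = D_square @ D_square.T - D_rect @ D_rect.T
print(extra[-2:, -2:])  # the only nonzero block, in the bottom-right corner
```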



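Cheers,
Pedro.

The following code works on Python 3.6. It is adapted from the accepted answer to avoid the dense matrix diff computation (which can easily cause memory issues), and it uses range (instead of xrange).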
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def baseline_als(y, lam, p, niter=10):
    L = len(y)
    D = sparse.diags([1, -2, 1], [0, -1, -2], shape=(L, L - 2))
    w = np.ones(L)
    for i in range(niter):
        W = sparse.spdiags(w, 0, L, L)
        Z = W + lam * D.dot(D.transpose())
        z = spsolve(Z, w * y)
        w = p * (y > z) + (1 - p) * (y < z)
    return z
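
The sketch below shows what the sparse construction saves. With a hypothetical small size L = 8, it checks that sparse.diags([1, -2, 1], [0, -1, -2], shape=(L, L-2)) gives exactly the same matrix as np.diff(np.eye(L), 2) from the accepted answer, while storing only the three nonzero bands:

```python
import numpy as np
from scipy import sparse

L = 8  # small size so the dense version is cheap to build
D_dense = np.diff(np.eye(L), 2)  # shape (L, L - 2), as in the accepted answer
D_sparse = sparse.diags([1, -2, 1], [0, -1, -2], shape=(L, L - 2))

# Same second-difference operator, column j holds [1, -2, 1] in rows j..j+2 ...
same = np.array_equal(D_dense, D_sparse.toarray())
# ... but the dense copy stores L*(L-2) values, the sparse one only 3*(L-2).
print(same, D_dense.size, D_sparse.nnz)
```

For a spectrum with thousands of points, the dense intermediate grows as L², while the banded form grows linearly.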
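Recently, I needed to use this method. The code from the answers works well, but it clearly uses more memory than necessary. So here is my version, optimized for memory usage: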
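import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def baseline_als_optimized(y, lam, p, niter=10):
    L = len(y)
    D = sparse.diags([1, -2, 1], [0, -1, -2], shape=(L, L - 2))
    D = lam * D.dot(D.transpose())  # Precompute this term since it does not depend on `w`
    w = np.ones(L)
    W = sparse.spdiags(w, 0, L, L)
    for i in range(niter):
        W.setdiag(w)  # Do not create a new matrix, just update diagonal values
        Z = W + D
        z = spsolve(Z, w * y)
        w = p * (y > z) + (1 - p) * (y < z)
    return z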

According to my benchmark below, it is also about 1.5 times faster:
%%timeit -n 1000 -r 10 y = randn(1000)
baseline_als(y, 10000, 0.05) # function from @jpantina's answer
# 20.5 ms ± 382 µs per loop (mean ± std. dev. of 10 runs, 1000 loops each)

%%timeit -n 1000 -r 10 y = randn(1000)
baseline_als_optimized(y, 10000, 0.05)
# 13.3 ms ± 874 µs per loop (mean ± std. dev. of 10 runs, 1000 loops each)
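
baseline_als_optimized only moves λDDᵀ out of the loop and reuses W, so it should return the same baseline as the plain version. Here is a quick sanity check of that. It assumes a seeded NumPy generator in place of the unseeded randn(1000) from the benchmark:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def baseline_als(y, lam, p, niter=10):
    # Plain version: rebuilds W and lam * D D^T on every iteration.
    L = len(y)
    D = sparse.diags([1, -2, 1], [0, -1, -2], shape=(L, L - 2))
    w = np.ones(L)
    for _ in range(niter):
        W = sparse.spdiags(w, 0, L, L)
        Z = W + lam * D.dot(D.transpose())
        z = spsolve(Z, w * y)
        w = p * (y > z) + (1 - p) * (y < z)
    return z


def baseline_als_optimized(y, lam, p, niter=10):
    # Same iteration; the penalty is precomputed and W is updated in place.
    L = len(y)
    D = sparse.diags([1, -2, 1], [0, -1, -2], shape=(L, L - 2))
    D = lam * D.dot(D.transpose())
    w = np.ones(L)
    W = sparse.spdiags(w, 0, L, L)
    for _ in range(niter):
        W.setdiag(w)
        Z = W + D
        z = spsolve(Z, w * y)
        w = p * (y > z) + (1 - p) * (y < z)
    return z


rng = np.random.default_rng(0)  # seeded stand-in for randn(1000)
y = rng.standard_normal(1000)
z_plain = baseline_als(y, 10000, 0.05)
z_opt = baseline_als_optimized(y, 10000, 0.05)
print(np.allclose(z_plain, z_opt))
```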

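Note 1: The original paper says:
To emphasize the basic simplicity of the algorithm, the number of iterations has been fixed to 10. In practical applications one should check whether the weights show any change; if not, convergence has been attained.
So the more correct way to stop the iteration is to check that ||w_new - w|| == 0.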
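
Here is a sketch of that stopping rule. It uses a hypothetical function, baseline_als_converged, which iterates until the weight vector no longer changes and returns the iteration count. The max_iter cap is an assumption added only as a safety net, and the test spectrum is also hypothetical:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def baseline_als_converged(y, lam, p, max_iter=50):
    # Hypothetical variant: stop once ||w_new - w|| == 0; max_iter is only a cap.
    y = np.asarray(y, dtype=float)
    L = len(y)
    D = sparse.diags([1, -2, 1], [0, -1, -2], shape=(L, L - 2))
    P = lam * D.dot(D.transpose())  # smoothness penalty, independent of w
    w = np.ones(L)
    for n_iter in range(1, max_iter + 1):
        W = sparse.diags(w, 0, shape=(L, L))
        z = spsolve((W + P).tocsc(), w * y)
        w_new = p * (y > z) + (1 - p) * (y < z)
        if np.array_equal(w_new, w):  # weights unchanged: converged
            break
        w = w_new
    return z, n_iter


x = np.arange(1000, dtype=float)
y = 100 * np.exp(-((x - 500) / 10) ** 2) + 0.05 * x + 10  # hypothetical spectrum
z, n_iter = baseline_als_converged(y, lam=1e7, p=0.01)
print(n_iter)
```

Because the weights only take the values p, 1 − p (or 0 on exact ties), an exact equality test is meaningful here, unlike for continuous weights.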

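Note 2: Another useful quote (from @glycoaddict's comment) gives an idea of how to choose the parameter values:
There are two parameters: p for asymmetry and λ for smoothness. Both have to be tuned to the data at hand. We found that generally 0.001 ≤ p ≤ 0.1 is a good choice (for a signal with positive peaks) and 10^2 ≤ λ ≤ 10^9, but exceptions may occur. In any case one should vary λ on a grid that is approximately linear for log λ. Often visual inspection is sufficient to get good parameter values.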
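
A sketch of that tuning advice: λ is spaced evenly in log λ over 10^2 to 10^9, p is fixed inside the suggested range, and one baseline is stored per λ for visual comparison. The test spectrum, p = 0.01 and the one-point-per-decade grid are assumptions:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def baseline_als(y, lam, p, niter=10):
    # ALS iteration with a sparse (L, L-2) second-difference matrix.
    L = len(y)
    D = sparse.diags([1, -2, 1], [0, -1, -2], shape=(L, L - 2))
    w = np.ones(L)
    for _ in range(niter):
        W = sparse.spdiags(w, 0, L, L)
        Z = W + lam * D.dot(D.transpose())
        z = spsolve(Z, w * y)
        w = p * (y > z) + (1 - p) * (y < z)
    return z


# Hypothetical spectrum: one Gaussian peak on a sloped line.
x = np.arange(1000, dtype=float)
y = 100 * np.exp(-((x - 500) / 10) ** 2) + 0.05 * x + 10

p = 0.01                     # inside the suggested 0.001 <= p <= 0.1
lams = np.logspace(2, 9, 8)  # 1e2, 1e3, ..., 1e9: linear in log(lambda)
baselines = {lam: baseline_als(y, lam, p) for lam in lams}
# Plot y and each baselines[lam] (e.g. with matplotlib) and pick lambda by eye.
```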
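There is a Python library available for baseline correction/removal. It provides the ModPoly, IModPoly and Zhang fit algorithms. These return baseline-corrected values when you pass the original values as a Python list or pandas Series and specify the polynomial degree.
Install the library with: pip install BaselineRemoval
Below is an example: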
from BaselineRemoval import BaselineRemoval

input_array=[10,20,1.5,5,2,9,99,25,47]
polynomial_degree=2 #only needed for Modpoly and IModPoly algorithm

baseObj=BaselineRemoval(input_array)
Modpoly_output=baseObj.ModPoly(polynomial_degree)
Imodpoly_output=baseObj.IModPoly(polynomial_degree)
Zhangfit_output=baseObj.ZhangFit()

print('Original input:',input_array)
print('Modpoly base corrected values:',Modpoly_output)
print('IModPoly base corrected values:',Imodpoly_output)
print('ZhangFit base corrected values:',Zhangfit_output)

Original input: [10, 20, 1.5, 5, 2, 9, 99, 25, 47]
Modpoly base corrected values: [-1.98455800e-04  1.61793368e+01  1.08455179e+00  5.21544654e+00
  7.20210508e-02  2.15427531e+00  8.44622093e+01 -4.17691125e-03
  8.75511661e+00]
IModPoly base corrected values: [-0.84912125 15.13786196 -0.11351367  3.89675187 -1.33134142  0.70220645
 82.99739548 -1.44577432  7.37269705]
ZhangFit base corrected values: [ 8.49924691e+00  1.84994576e+01 -3.31739230e-04  3.49854060e+00
  4.97412948e-01  7.49628529e+00  9.74951576e+01  2.34940300e+01
  4.54929023e+01]

I used the version of the algorithm referenced in the earlier comment, which is an improvement on p…

from scipy import sparse
from scipy.sparse import linalg
import numpy as np
from numpy.linalg import norm


def baseline_arPLS(y, ratio=1e-6, lam=100, niter=10, full_output=False):
    L = len(y)

    diag = np.ones(L - 2)
    D = sparse.spdiags([diag, -2*diag, diag], [0, -1, -2], L, L - 2)

    H = lam * D.dot(D.T)  # The transposes are flipped w.r.t the Algorithm on pg. 252

    w = np.ones(L)
    W = sparse.spdiags(w, 0, L, L)

    crit = 1
    count = 0

    while crit > ratio:
        z = linalg.spsolve(W + H, W * y)
        d = y - z
        dn = d[d < 0]

        m = np.mean(dn)
        s = np.std(dn)

        w_new = 1 / (1 + np.exp(2 * (d - (2*s - m))/s))

        crit = norm(w_new - w) / norm(w)

        w = w_new
        W.setdiag(w)  # Do not create a new matrix, just update diagonal values

        count += 1

        if count > niter:
            print('Maximum number of iterations exceeded')
            break

    if full_output:
        info = {'num_iter': count, 'stop_criterion': crit}
        return z, d, info
    else:
        return z

def spectra_model(x):
    coeff = np.array([100, 200, 100])
    mean = np.array([300, 750, 800])

    stdv = np.array([15, 30, 15])

    terms = []
    for ind in range(len(coeff)):
        term = coeff[ind] * np.exp(-((x - mean[ind]) / stdv[ind])**2)
        terms.append(term)

    spectra = sum(terms)

    return spectra

x_vals = np.arange(1, 1001)
spectra_sim = spectra_model(x_vals)

from scipy.interpolate import CubicSpline
x_poly = np.array([0, 250, 700, 1000])
y_poly = np.array([200, 180, 230, 200])

poly = CubicSpline(x_poly, y_poly)
baseline = poly(x_vals)

noise = np.random.randn(len(x_vals)) * 0.1
spectra_base = spectra_sim + baseline + noise

_, spectra_arPLS, info = baseline_arPLS(spectra_base, lam=1e4, niter=10,
                                        full_output=True)
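
The asymmetry in baseline_arPLS comes from its logistic weighting line. The sketch below pulls that line out into a hypothetical helper, arpls_weights, and evaluates it on a few hand-picked residuals d = y − z. The mean m and standard deviation s are taken from the negative residuals only, so points far above 2s − m (peaks) get weights near 0, and points below the baseline get weights near 1:

```python
import numpy as np


def arpls_weights(d):
    # Hypothetical helper: same formula as w_new in baseline_arPLS above.
    dn = d[d < 0]                   # residuals below the current baseline
    m, s = np.mean(dn), np.std(dn)
    return 1 / (1 + np.exp(2 * (d - (2 * s - m)) / s))


d = np.array([-2.0, -1.0, 0.0, 1.0, 5.0, 50.0])  # hand-picked residuals
w = arpls_weights(d)
print(w)  # decreasing: ~1 below the baseline, ~0 for the large positive residual
```

Unlike the two-valued weights in the ALS answers, these weights change smoothly, which is why baseline_arPLS stops on a relative change in w rather than on exact equality.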