Python: vectorizing curve fitting over the rows of a pandas DataFrame


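I want to speed up a process on a DataFrame in which each row is a set of points (the red points in the image), and I fit each row to a polynomial (the blue points in the image):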

My DataFrame looks like this:

0   21.357071   21.357071   NaN     29.240519   20.909416   23.884323   NaN     NaN     21.533360   19.145000   NaN
1   29.373487   29.373487   NaN     32.593994   26.423960   29.623251   NaN     NaN     30.685534   29.297455   20.411913
2   19.116655   19.116655   NaN     27.120478   18.723265   19.857676   NaN     NaN     20.249647   18.867172   NaN

I have done this with the following code:

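import numpy as np
import numpy.polynomial.polynomial as poly

for index, row in df.iterrows():
    dataR = row[:].dropna()  # keep only the points that exist in this row

    x = np.array(dataR.index).astype(float)  # x = column index
    y = dataR.values
    y = np.vstack(y).astype(float).T[0]  # y = value (np.float was removed from NumPy, use float)

    coefs = poly.polyfit(x, y, deg=4)  # degree-4 least-squares fit
    ffit = poly.polyval(np.arange(0, maxColumns, 1), coefs)  # evaluate at every column
    df.loc[index, 0:maxColumns] = ffit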

But my DataFrame is very large, so this is slow. I would like to know whether this code can be vectorized.

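Since it looks like you process each row independently, and fit the curve regardless of what the other rows look like, I think you can run the rows in parallel with joblib: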

import numpy as np
import numpy.polynomial.polynomial as poly
from joblib import Parallel, delayed

def fit_curve(row):
    dataR = row[:].dropna()  # keep only the points that exist in this row
    x = np.array(dataR.index).astype(float)
    y = dataR.values
    y = np.vstack(y).astype(float).T[0]  # np.float was removed from NumPy, use float
    coefs = poly.polyfit(x, y, deg=4)
    ffit = poly.polyval(np.arange(0, maxColumns, 1), coefs)
    return ffit

# One task per row, spread over N worker processes
fitted_curves = Parallel(n_jobs=N)(delayed(fit_curve)(row) for index, row in df.iterrows())
df.loc[:,:] = fitted_curves

where N is the number of workers, i.e. the number of cores you want to use for this operation.

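This is not a vectorized operation, but it is now much faster, roughly 10x. I don't think NumPy's curve fitting itself can easily be vectorized this way, so to truly vectorize it you would probably have to reimplement it yourself.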
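One partial workaround that stays within NumPy is to batch rows that share the same NaN pattern, because `poly.polyfit` accepts a 2-D `y` and fits every column of it against the same `x` in a single call. The following is only a sketch under some assumptions that are not in the question: the helper name `fit_rows_grouped` is hypothetical, the column labels are taken to be the numeric positions 0, 1, 2, ... (so evaluating at the labels matches `np.arange(0, maxColumns, 1)`), and every row is assumed to have at least `deg + 1` non-NaN values. How much it helps depends on how few distinct NaN patterns the data contains.

```python
import numpy as np
import pandas as pd
import numpy.polynomial.polynomial as poly


def fit_rows_grouped(df, deg=4):
    """Replace each row by its polynomial fit, batching rows with equal NaN patterns.

    Hypothetical helper: assumes numeric column labels and at least
    deg + 1 valid values in every row.
    """
    values = df.to_numpy(dtype=float)
    x_all = np.asarray(df.columns, dtype=float)  # x = column label, as in the question
    valid = ~np.isnan(values)
    out = np.empty_like(values)

    # Rows with an identical NaN pattern use the same x points, so a single
    # polyfit call with a 2-D y can fit all of them at once.
    patterns, inverse = np.unique(valid, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # shape of `inverse` differs slightly across NumPy versions
    for k, pattern in enumerate(patterns):
        rows = np.flatnonzero(inverse == k)
        x = x_all[pattern]
        y = values[rows][:, pattern].T          # shape (n_points, n_rows)
        coefs = poly.polyfit(x, y, deg)         # shape (deg + 1, n_rows)
        out[rows] = poly.polyval(x_all, coefs)  # shape (n_rows, n_columns)

    return pd.DataFrame(out, index=df.index, columns=df.columns)
```

With the sample data from the question, rows 0 and 2 have the same NaN positions and would be fitted in one call, while row 1 would be fitted in another. If nearly every row has its own pattern, the loop degenerates to one call per row and the joblib approach is likely the better choice.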