Python: How to use numpy.linalg.lstsq to fit two datasets with the same slope but different intercepts?

I am trying to do a weighted least squares fit and came across numpy.linalg.lstsq. The following works for a single dataset:

import numpy as np

# Generate some synthetic data from the model.
N = 50
x = np.sort(10 * np.random.rand(N))
yerr = 0.1 + 0.5 * np.random.rand(N)
y = 10.0 * x + 15
y += yerr * np.random.randn(N)
# Do the fitting: W = sqrt(diag(1/yerr**2)) scales each row by 1/yerr.
err = 1/yerr**2
W = np.sqrt(np.diag(err))
x = x.flatten()
y = y.flatten()
A = np.vstack([x, np.ones(len(x))]).T
xw = np.dot(W,A)
yw = np.dot(W,y)
m, b = np.linalg.lstsq(xw, yw, rcond=None)[0]

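This gives me the best-fit slope and intercept. Now, suppose I have two datasets with the same slope but different intercepts. How do I do a joint fit that returns the best-fit slope plus the two intercepts? I still need weighted least squares. For the unweighted case, I found that the following works: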

# x1, y1 and x2, y2 are the two datasets (1-D arrays).
A = np.stack([np.concatenate((x1, x2)),
              np.concatenate([np.ones(len(x1)), np.zeros(len(x2))]),
              np.concatenate([np.zeros(len(x1)), np.ones(len(x2))])]).T
(m, b1, b2), _, _, _ = np.linalg.lstsq(A, np.concatenate((y1, y2)), rcond=None)

First, I rewrote your first approach, because in my opinion it can be written more clearly, like this:

weights = 1 / yerr
m, b = np.linalg.lstsq(np.c_[weights * x, weights], weights * y, rcond=None)[0]
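The compact form works because multiplying by a diagonal matrix only scales each row, so multiplying `x`, the column of ones, and `y` element-wise by `1 / yerr` builds the same system as `W @ A` and `W @ y`. The sketch below checks this numerically on synthetic data; the seed and the variable names `m_diag`, `m_rows`, and `same_result` are chosen here for illustration and are not part of the original thread.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
N = 50
x = np.sort(10 * rng.random(N))
yerr = 0.1 + 0.5 * rng.random(N)
y = 10.0 * x + 15 + yerr * rng.standard_normal(N)

# Version from the question: explicit diagonal weight matrix.
W = np.diag(1 / yerr)
A = np.vstack([x, np.ones(N)]).T
m_diag, b_diag = np.linalg.lstsq(W @ A, W @ y, rcond=None)[0]

# Compact version: scale each row by its weight directly.
weights = 1 / yerr
m_rows, b_rows = np.linalg.lstsq(
    np.c_[weights * x, weights], weights * y, rcond=None
)[0]

# Both formulations should agree up to floating-point rounding.
same_result = np.allclose([m_diag, b_diag], [m_rows, b_rows])
print(same_result, m_rows, b_rows)
```

The row-scaling form also avoids allocating an N×N matrix, which starts to matter for large datasets.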

To fit 2 datasets, you can stack the 2 arrays, but set some elements of the matrix to 0:

np.random.seed(12)
N = 3
x = np.sort(10 * np.random.rand(N))
yerr = 0.1 + 0.5 * np.random.rand(N)
y = 10.0 * x + 15
y += yerr * np.random.randn(N)

M = 2
x1 = np.sort(10 * np.random.rand(M))
yerr1 = 0.1 * 0.5 * np.random.rand(M)
y1 = 10.0 * x1 + 25
y1 += yerr1 * np.random.randn(M)
#do the fitting
weights = 1 / yerr
weights1 = 1 / yerr1
first_column = np.r_[weights * x, weights1 * x1]
second_column = np.r_[weights, [0] * x1.size]
third_column = np.r_[[0] * x.size, weights1]
a = np.c_[first_column, second_column, third_column]
print(a)
# [[  4.20211437   2.72576342   0.        ]
#  [ 24.54293941   9.32075195   0.        ]
#  [ 13.22997409   1.78771428   0.        ]
#  [126.37829241   0.          26.03711851]
#  [686.96961895   0.         124.44253391]]
c = np.r_[weights * y, weights1 * y1]
print(c)
# [  83.66073785  383.70595203  159.12058215 1914.59065915 9981.85549321]
m, b1, b2 = np.linalg.lstsq(a, c, rcond=None)[0]
print(m, b1, b2)
# 10.012202998026055 14.841412336510793 24.941219918240172

Edit

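If you want different slopes and one intercept, you can do it along the same lines. It is probably best to first grasp the general idea of the "one slope, 2 intercepts" case. Look at the array a: you construct it, as well as c, from the weights, so from here on it is an unweighted problem. You try to find vector = [slope, intercept1, intercept2] such that a @ vector = c holds as closely as possible, by minimizing the sum of squared differences. By putting zeros in a, we make it separable: the upper part of the matrix involves only slope and intercept1, and the lower part only slope and intercept2. The 2-slope case is analogous, with vector = [slope1, slope2, intercept].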
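A minimal sketch of the two-slopes, one-intercept case described above, assuming the same row-scaling by `1 / yerr` as in the answer. The synthetic data (slopes 10 and 7, shared intercept 15), the seed, and the column variable names are assumptions made for this example and do not appear in the original thread.

```python
import numpy as np

rng = np.random.default_rng(12)  # arbitrary seed for reproducibility

# Hypothetical data: two datasets with different slopes, same intercept.
x = np.sort(10 * rng.random(30))
yerr = 0.1 + 0.5 * rng.random(30)
y = 10.0 * x + 15 + yerr * rng.standard_normal(30)

x1 = np.sort(10 * rng.random(20))
yerr1 = 0.1 + 0.5 * rng.random(20)
y1 = 7.0 * x1 + 15 + yerr1 * rng.standard_normal(20)

weights = 1 / yerr
weights1 = 1 / yerr1

# Each slope only sees its own dataset; the intercept column is shared.
slope1_column = np.r_[weights * x, np.zeros(x1.size)]
slope2_column = np.r_[np.zeros(x.size), weights1 * x1]
intercept_column = np.r_[weights, weights1]

a = np.c_[slope1_column, slope2_column, intercept_column]
c = np.r_[weights * y, weights1 * y1]

m1, m2, b = np.linalg.lstsq(a, c, rcond=None)[0]
print(m1, m2, b)
```

The only change from the one-slope case is which columns receive the zeros: here the zeros go into the slope columns, and the weights fill the whole intercept column.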

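But this will give different intercepts and slopes with and without the zeros, right? As far as I understand, he wants to keep one of them fixed.

@Joe I'm not sure I fully understand your comment, but it gives one slope and two intercepts. When we generated y and y1, we used 10, 15 and 25 as the slope and the 2 intercepts. The last code listing shows that we recover these values.

OK, I see it now. I will delete my answer.

A follow-up question: if I have two datasets with different slopes but the same intercept, how would the code be modified?

Thank you very much. This seems to work well. One last question: is there a way to get the standard uncertainties associated with the slope and intercepts? The output seems to give only the sum of the residuals.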
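The thread does not answer the last question. One common approach, sketched here as an assumption rather than as the answerer's method, uses the standard linear least squares result: if the rows of `a` are already scaled by `1 / yerr` and `yerr` holds true 1-sigma Gaussian errors, the parameter covariance matrix is `inv(a.T @ a)`, and the standard uncertainties are the square roots of its diagonal. The synthetic data, the seed, and the names `cov`, `std_err`, and `chi2_red` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(12)  # arbitrary seed for reproducibility

# Hypothetical data: shared slope 10, intercepts 15 and 25.
x = np.sort(10 * rng.random(30))
yerr = 0.1 + 0.5 * rng.random(30)
y = 10.0 * x + 15 + yerr * rng.standard_normal(30)

x1 = np.sort(10 * rng.random(20))
yerr1 = 0.1 + 0.5 * rng.random(20)
y1 = 10.0 * x1 + 25 + yerr1 * rng.standard_normal(20)

weights = 1 / yerr
weights1 = 1 / yerr1
a = np.c_[
    np.r_[weights * x, weights1 * x1],
    np.r_[weights, np.zeros(x1.size)],
    np.r_[np.zeros(x.size), weights1],
]
c = np.r_[weights * y, weights1 * y1]

params = np.linalg.lstsq(a, c, rcond=None)[0]

# Covariance of [slope, intercept1, intercept2], assuming yerr are
# correct absolute 1-sigma errors.
cov = np.linalg.inv(a.T @ a)
std_err = np.sqrt(np.diag(cov))

# If yerr is only known up to a common scale factor, a usual choice is
# to rescale by the reduced chi-square of the weighted residuals.
dof = c.size - params.size
chi2_red = np.sum((a @ params - c) ** 2) / dof
std_err_scaled = std_err * np.sqrt(chi2_red)

print(params, std_err, std_err_scaled)
```

Which of the two estimates is appropriate depends on whether the error bars are trusted in absolute terms. The off-diagonal entries of `cov` give the correlations between the slope and the intercepts.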