Python 3.x: linear regression model fits poorly


python-3.x pandas machine-learning


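I am trying to fit a model to a dataset that has a single feature in column 1 and a vector of ones added as column 0. No matter what I do, the fitted curve matches the data very poorly.

Here is the code: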

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

col = ['id','ri','na','mg','al','si','k','ca','ba','fe','glass_type']
data = pd.read_csv('glass.data', names=col, index_col='id')

x = np.array(data)[:, 0]
x = x.reshape(np.size(x), 1)
y = np.array(data)[:, 3]
y = y.reshape(np.size(y), 1)


# initialising
m = np.size(x)

# appending ones vector in x
one = np.ones([m, 1], dtype=float)
x1 = np.append(one, x, axis=1)

# weight matrix
theta = np.zeros([2, 1])

i_list = []
j_l = []
error = np.zeros([m, 1])


# gradient descent
for i in range(3500):
    h = x1.dot(theta)
    error = h - y
    theta = theta - (0.0001/m) * np.sum(x1.T.dot(error)) + (1.5/m) * np.sum(np.sum((theta[:, 1:2])**2))
    i_list.append(i)
    j = (1/(2*m)) * np.sum((h-y)**2)
    j_l.append(j)


# plotting
plt.subplot(1, 2, 1)
plt.plot(x, y, '.r')
plt.plot(x, x1.dot(theta), '-b')

plt.subplot(1,2, 2)
plt.plot(i_list, j_l, '-g')

plt.show()

Please suggest ways to improve it. Thanks :)

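First of all, regularization is supposed to shrink theta on every update. That is the main idea of regularization, but in your update you add the term instead.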
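It may also be worth checking the shapes in the update line from the question. The snippet below is a small diagnostic sketch on made-up toy data (the five-point `x1` and `y` are assumptions for illustration, not the glass dataset). It suggests that `np.sum(x1.T.dot(error))` collapses the two-element gradient into one scalar, and that `theta[:, 1:2]` is an empty slice of a 2x1 array, so the penalty term evaluates to zero.

```python
import numpy as np

# Toy data (illustrative only): 5 samples, a ones column plus one feature.
m = 5
x1 = np.hstack([np.ones((m, 1)), np.arange(m, dtype=float).reshape(m, 1)])
y = np.arange(m, dtype=float).reshape(m, 1)
theta = np.zeros((2, 1))
error = x1.dot(theta) - y

grad = x1.T.dot(error)               # shape (2, 1): one entry per parameter
collapsed = np.sum(grad)             # scalar: the per-parameter information is lost
reg_slice = theta[:, 1:2]            # shape (2, 0): theta has only one column
reg_term = np.sum(np.sum(reg_slice ** 2))   # sum over an empty array

# The same update expression as in the question, applied once.
theta_new = theta - (0.0001 / m) * collapsed + (1.5 / m) * reg_term

print(grad.shape, np.ndim(collapsed), reg_slice.shape, reg_term)
print(theta_new.ravel())             # both entries receive the identical update
```

Because the same scalar is subtracted from both entries, the intercept and the slope stay equal to each other for the whole run, which by itself would explain a poor fit.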

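Also, keep in mind that a large regularization parameter gives you high bias. In that case, try different levels of regularization (0.03, 0.3, 3, 30, 300). What I mean is, try putting a different lambda there.

For example: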

theta = theta - (0.0001/m) * np.sum(x1.T.dot(error)) + (0.15/m) * np.sum(np.sum((theta[:, 1:2])**2))
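For comparison, here is a minimal sketch of a vectorized update with an L2 penalty, assuming the goal is ridge-style regularization that leaves the intercept unpenalized. The function name `fit_ridge_gd`, the parameters `alpha` and `lam`, and the synthetic data are all assumptions made for illustration; they do not come from the original post. The loop at the end tries the regularization levels suggested above.

```python
import numpy as np

def fit_ridge_gd(x, y, alpha=0.1, lam=0.0, n_iter=5000):
    """Batch gradient descent for one-feature linear regression with an L2 penalty."""
    m = x.shape[0]
    x1 = np.hstack([np.ones((m, 1)), x])     # column 0: ones, column 1: feature
    theta = np.zeros((2, 1))
    costs = []
    for _ in range(n_iter):
        error = x1.dot(theta) - y            # shape (m, 1)
        grad = x1.T.dot(error) / m           # shape (2, 1), kept per parameter
        penalty = (lam / m) * theta          # gradient of the L2 penalty
        penalty[0, 0] = 0.0                  # leave the intercept unpenalized
        theta = theta - alpha * (grad + penalty)
        # Unregularized squared-error cost, measured before this step's update.
        costs.append(float(np.sum(error ** 2) / (2 * m)))
    return theta, costs

# Synthetic data (assumption): y = 2 + 3x plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(200, 1))
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.1, size=(200, 1))

theta, costs = fit_ridge_gd(x, y, alpha=0.1, lam=0.0)
print("no regularization:", theta.ravel())

# Larger lambda shrinks the slope toward zero, i.e. more bias.
slopes = []
for lam in (0.03, 0.3, 3, 30, 300):
    theta_reg, _ = fit_ridge_gd(x, y, alpha=0.1, lam=lam)
    slopes.append(float(theta_reg[1, 0]))
    print("lambda =", lam, "slope =", slopes[-1])
```

The learning rate here is far larger than the 0.0001 in the question because the synthetic feature is already on a small scale; on the real data a suitable value would likely depend on how the feature is scaled.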

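This data does not look like something a line can fit. There seems to be little correlation, and the points cluster around a spot in the center. Try changing your input features or the regression model.

@primusa I fitted the data with scikit-learn, and it fit very well.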
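One way to weigh these two comments is to measure the correlation directly and compute a closed-form least-squares fit as a reference for what gradient descent should converge to. The sketch below uses `np.linalg.lstsq` as a stand-in for the scikit-learn fit mentioned in the comment (both solve ordinary least squares), and runs on deliberately noisy synthetic data that is an assumption for illustration, not the glass dataset.

```python
import numpy as np

# Synthetic, deliberately noisy data (assumption).
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=200)
y = 0.5 * x + rng.normal(0.0, 1.0, size=200)

# Pearson correlation between feature and target.
r = np.corrcoef(x, y)[0, 1]

# Closed-form ordinary least squares with an intercept column.
A = np.column_stack([np.ones_like(x), x])
coef, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
intercept, slope = coef

# Coefficient of determination of the reference fit.
pred = A.dot(coef)
ss_res = np.sum((y - pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print("correlation:", r)
print("intercept, slope:", intercept, slope)
print("R^2:", r2)
```

If gradient descent ends up far from the closed-form coefficients, the problem is in the optimization. If it matches them and R^2 is still low, a straight line on this feature simply explains little of the variance.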