Python: how do I loop over the rows of a DataFrame, run a calculation on each row, and then move on to the other rows?
Here is my table:
   School                        Year_16  ATAR_16  Year_17  ATAR_17  Year_18  ATAR_18  Year_19  ATAR_19  Year_20  ATAR_20
0  Perth Modern School           2016     95.55    2017     95.90    2018     97.00    2019     96.75    2020     97.55
1  Presbyterian Ladies' College  2016     92.90    2017     89.60    2018     86.90    2019     90.75    2020     89.20
2  Penrhos College               2016     92.65    2017     91.20    2018     88.15    2019     88.30    2020     90.65
3  Christ Church Grammar School  2016     92.50    2017     92.45    2018     91.60    2019     92.50    2020     92.50
4  Santa Maria College           2016     91.85    2017     89.90    2018     90.10    2019     87.45    2020     89.35
Here is my code:
for i in df.index:
    X_train = df['Year_16'][i], df['Year_17'][i], df['Year_18'][i], df['Year_19'][i]
    X_train = list(X_train)
    X_train = [[j] for j in X_train]
    X_train = np.array(X_train)
    X_test = [df['Year_20'][i]]
    y_train = (df['ATAR_16'][i], df['ATAR_17'][i], df['ATAR_18'][i], df['ATAR_19'][i])
    y_train = list(y_train)
    y_train = [[j] for j in y_train]
    y_train = np.array(y_train)
    y_test = (df['ATAR_20'][i])
    from sklearn.linear_model import LinearRegression
    regressor = LinearRegression()
    reg = regressor.fit(X_train, y_train)
    print(regressor.intercept_)
    print(regressor.coef_)
    X_2020 = np.array(X_test)
    X_2020 = X_2020.reshape(-1, 1)
    y_pred = reg.predict(X_2020)
    print(y_pred)
    y_pred = y_pred[0]
    predict = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
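I want a table with the actual and predicted values for every school, but this code gives me the result for only one school.
How can I fix this?

Your loop reassigns predict on every iteration, so only the last school's row survives. Make predict a list, append one single-row DataFrame per school, and concatenate them at the end: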
predict = []
for i in df.index:
    ...
    predict.append(pd.DataFrame(...))
predict = pd.concat(predict)
Here is what your code would look like:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

predict = []  # collects one single-row DataFrame per school
for i in df.index:
    # Features: the years 2016-2019 as a (4, 1) column vector
    X_train = df['Year_16'][i], df['Year_17'][i], df['Year_18'][i], df['Year_19'][i]
    X_train = list(X_train)
    X_train = [[j] for j in X_train]
    X_train = np.array(X_train)
    X_test = [df['Year_20'][i]]
    # Targets: the ATAR scores for the same four years
    y_train = (df['ATAR_16'][i], df['ATAR_17'][i], df['ATAR_18'][i], df['ATAR_19'][i])
    y_train = list(y_train)
    y_train = [[j] for j in y_train]
    y_train = np.array(y_train)
    y_test = (df['ATAR_20'][i])
    # Fit a separate linear regression for this school
    regressor = LinearRegression()
    reg = regressor.fit(X_train, y_train)
    print(regressor.intercept_)
    print(regressor.coef_)
    # Predict the 2020 score and compare it with the actual value
    X_2020 = np.array(X_test)
    X_2020 = X_2020.reshape(-1, 1)
    y_pred = reg.predict(X_2020)
    print(y_pred)
    y_pred = y_pred[0]
    # Append instead of overwriting, using the school name as the row label
    predict.append(pd.DataFrame({'Actual': y_test, 'Predicted': y_pred}, index=[df['School'][i]]))
Is this what you are looking for?
>>> pd.concat(predict)
                              Actual  Predicted
Perth Modern School            97.55     97.475
Presbyterian Ladies' College   89.20     87.750
Penrhos College                90.65     86.050
Christ Church Grammar School   92.50     92.050
Santa Maria College            89.35     86.575
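
If the explicit index loop feels heavy, the same per-school fit can also be sketched with a row-wise apply. This is only an alternative sketch, not the code from the answer above: it swaps scikit-learn's LinearRegression for np.polyfit with degree 1 (an ordinary least-squares line, so the predictions should agree), and the names year_cols, atar_cols and predict_2020 are made up for illustration. The data is copied from the table in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the table from the question (values copied from the post).
schools = ["Perth Modern School", "Presbyterian Ladies' College",
           "Penrhos College", "Christ Church Grammar School",
           "Santa Maria College"]
atar = {
    16: [95.55, 92.90, 92.65, 92.50, 91.85],
    17: [95.90, 89.60, 91.20, 92.45, 89.90],
    18: [97.00, 86.90, 88.15, 91.60, 90.10],
    19: [96.75, 90.75, 88.30, 92.50, 87.45],
    20: [97.55, 89.20, 90.65, 92.50, 89.35],
}
df = pd.DataFrame({"School": schools})
for yy, values in atar.items():
    df[f"Year_{yy}"] = 2000 + yy
    df[f"ATAR_{yy}"] = values

# Hypothetical helper names: the training columns for 2016-2019.
year_cols = ["Year_16", "Year_17", "Year_18", "Year_19"]
atar_cols = ["ATAR_16", "ATAR_17", "ATAR_18", "ATAR_19"]


def predict_2020(row):
    """Fit a least-squares line through one school's four points
    and evaluate it at that school's Year_20 value."""
    x = row[year_cols].to_numpy(dtype=float)
    y = row[atar_cols].to_numpy(dtype=float)
    x_mean = x.mean()
    # Centre the years before fitting to keep the fit well conditioned.
    slope, intercept = np.polyfit(x - x_mean, y, deg=1)
    return slope * (float(row["Year_20"]) - x_mean) + intercept


# One row per school: actual 2020 score next to the fitted prediction.
predict = pd.DataFrame(
    {
        "Actual": df["ATAR_20"].to_numpy(),
        "Predicted": df.apply(predict_2020, axis=1).to_numpy(),
    },
    index=df["School"].tolist(),
)
print(predict.round(3))
```

Because apply returns one value per row, nothing gets overwritten between schools, which was the cause of the original problem. With only four training points per school the fitted line is very rough, so treat the predictions as illustrative.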