Python: How can I use scikit-learn with leave-one-out to predict multiple Y columns?

I have a sample data frame like the one below. The y columns all contain 0/1 binary outcomes, and X is the set of columns from x_1 through x_13:
x_1 x_2 ... x_13 y_1 y_2 y_3 ... y_48
1 0.1 0.2 .... 0.1 0 1 0 .... 0
2 0.5 0.2 .... 0.2 1 0 1 .... 1
...
100 0.1 0.0 .... 0.5 0 1 0 .... 0
I'm new to machine learning methods. I plan to compute F1 scores with the leave-one-out method. Without leave-one-out, we can use the following code:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

accs = []
for i in range(48):
    Y = df['y_{}'.format(i + 1)]
    model = RandomForestClassifier()
    model.fit(X, Y)
    predicts = model.predict(X)  # in-sample predictions
    accs.append(f1_score(Y, predicts))
print(accs)
This prints [1, 1, 1, ..., 1]. How can I incorporate a leave-one-out method so that we print the average F1 score, e.g. 0.45? Sample dataset:
import pandas as pd
import numpy as np

np.random.seed(111)
df = pd.concat([
    pd.DataFrame(np.random.uniform(0, 1, (100, 10)),
                 columns=["x_" + str(i) for i in np.arange(1, 11)]),
    pd.DataFrame(np.random.binomial(1, 0.5, (100, 5)),
                 columns=["y_" + str(i) for i in np.arange(1, 6)])
], axis=1)
X = df.filter(like="x_")
You can then use cross_val_predict and KFold to get the out-of-fold prediction for each observation. Set the number of splits equal to the number of observations:
from sklearn.model_selection import cross_val_predict, KFold
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

accs = []
result = []
loocv = KFold(n_splits=len(X))  # one fold per observation = leave-one-out
for i in range(5):
    Y = df['y_{}'.format(i + 1)]
    model = RandomForestClassifier()
    # out-of-fold (leave-one-out) predictions
    fold_pred = cross_val_predict(model, X, Y, cv=loocv)
    result.append(f1_score(Y, fold_pred))
    # in-sample fit and predictions, kept for comparison
    model.fit(X, Y)
    predicts = model.predict(X)
    accs.append(f1_score(Y, predicts))
print(result)
[0.5, 0.5871559633027522, 0.5585585585585585, 0.5585585585585585, 0.5871559633027522]
Could you briefly clarify what X is? Is it all the variables prefixed with x_? Yes, that's right. X is all the columns starting at x_1 and ending at x_13.
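To get the single averaged F1 score the question asks for, a minimal self-contained sketch building on the answer above; it uses scikit-learn's LeaveOneOut splitter, which is equivalent to KFold(n_splits=len(X)), and a reduced n_estimators only to keep the example fast:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Rebuild the sample dataset from the question
np.random.seed(111)
df = pd.concat([
    pd.DataFrame(np.random.uniform(0, 1, (100, 10)),
                 columns=["x_" + str(i) for i in np.arange(1, 11)]),
    pd.DataFrame(np.random.binomial(1, 0.5, (100, 5)),
                 columns=["y_" + str(i) for i in np.arange(1, 6)])
], axis=1)
X = df.filter(like="x_")

loo = LeaveOneOut()  # same splits as KFold(n_splits=len(X))
scores = []
for col in df.filter(like="y_").columns:
    Y = df[col]
    model = RandomForestClassifier(n_estimators=10, random_state=0)
    # each observation is predicted by a model trained on the other 99 rows
    pred = cross_val_predict(model, X, Y, cv=loo)
    scores.append(f1_score(Y, pred))

print(np.mean(scores))  # one averaged leave-one-out F1 score
```

Because each target column gets one out-of-fold F1 score, averaging `scores` collapses the per-column list into the single number the question asks for.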