Adding covariates to a classification task in scikit-learn

scikit-learn

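In my project, I want to build a classifier that predicts subject class (patients vs. healthy controls) from a feature set of voxel values taken from structural MRI (sMRI) data. I am using sklearn.linear_model.LogisticRegression as the classifier. Because age and gender affect voxel intensities in sMRI data, I want to include them as covariates in my classification task. How can I do this in scikit-learn? Do I simply add them to my feature set? If so, how do I handle covariates on different scales (age is continuous, gender is categorical)?

Here is a simple dummy example: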

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(42)

# dummy feature set (columns represent voxels)
X = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])

# dummy labels (1 = patients, 0= healthy controls)
y = np.array([1,0,1,0])

# dummy covariates (age and gender) - These should be included in my classification task
age = np.array([18,25,31,55])
gender = np.array([1,1,0,0])

# z-standardize features
scaler = StandardScaler()
X = scaler.fit_transform(X)

# classification task
lr = LogisticRegression(random_state=rng)
lr.fit(X, y)
predictions = lr.predict(X)

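This question may be related to another post.

For my neuroimaging prediction models, I usually build two models: one containing only the data of interest, and another that also includes age and the other covariates. If performance does not change significantly between the two, then age etc. does not affect the predictive power of the data.

Of course, you should use a cross-validation scheme for this kind of problem.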

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(42)

# dummy feature set (columns represent voxels)
X = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])

# dummy labels (1 = patients, 0= healthy controls)
y = np.array([1,0,1,0])

# dummy covariates (age and gender) - These should be included in my classification task
age = np.array([18,25,31,55])
gender = np.array([1,1,0,0])

Xfull = np.concatenate([X,age.reshape(-1,1),gender.reshape(-1,1)], axis = 1)

# z-standardize features
scaler = StandardScaler()
X = scaler.fit_transform(X)

# z-standardize features with covariates 
scaler2 = StandardScaler()
Xfull = scaler2.fit_transform(Xfull)


# classification task - model 1
lr1 = LogisticRegression(random_state=rng)
lr1.fit(X, y)
print("Score using only voxel data: {}".format(lr1.score(X,y)))

# classification task - model 2
lr2 = LogisticRegression(random_state=rng)
lr2.fit(Xfull, y)
print("Score using voxel data & covariates: {}".format(lr2.score(Xfull,y)))
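The snippet above scores each model on the same data it was trained on, whereas the answer recommends cross-validation. A minimal sketch of that comparison follows. The synthetic data, the 5-fold stratified split, and the use of a pipeline (so the scaler is refit on each training fold) are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(42)

# Hypothetical stand-in data: 40 subjects, 3 "voxel" features
n_subjects = 40
X = rng.normal(size=(n_subjects, 3))
y = np.tile([0, 1], n_subjects // 2)          # balanced dummy labels
age = rng.uniform(18, 60, size=n_subjects)    # continuous covariate
gender = rng.randint(0, 2, size=n_subjects)   # binary covariate

# Voxel features plus covariates as extra columns
Xfull = np.column_stack([X, age, gender])

# Scaling lives inside the pipeline, so it is fit on training folds only
model = make_pipeline(StandardScaler(), LogisticRegression())

# Fixed random_state so both models are evaluated on identical splits
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores_voxels = cross_val_score(model, X, y, cv=cv)
scores_full = cross_val_score(model, Xfull, y, cv=cv)

print("Voxels only:         {:.2f} +/- {:.2f}".format(scores_voxels.mean(), scores_voxels.std()))
print("Voxels + covariates: {:.2f} +/- {:.2f}".format(scores_full.mean(), scores_full.std()))
```

With random features like these, both scores should hover around chance; on real data, the gap between the two is what indicates whether the covariates add predictive information.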

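Thank you for your answer. So this means I simply add them to my feature set, and I don't have to treat age and gender differently from the voxel values, even though gender would be the only categorical feature in my dataset? I thought I would have to treat gender differently when standardizing the data.

In any case, the same standardization should be applied to all features. Before doing that, you may want to transform the categorical variables first.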
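One possible way to treat the categorical covariate separately is scikit-learn's ColumnTransformer, sketched below on the dummy data from the question. Note that this is a different choice from scaling every column the same way: here the continuous columns (voxels and age) are standardized while gender is one-hot encoded and left unscaled. The column indices and step names are assumptions chosen for this example.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Dummy data from the question
X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=float)
y = np.array([1, 0, 1, 0])
age = np.array([18, 25, 31, 55])
gender = np.array([1, 1, 0, 0])

# Columns 0-2: voxels, column 3: age, column 4: gender
Xfull = np.column_stack([X, age, gender])

preprocess = ColumnTransformer(
    [
        # z-standardize the continuous columns
        ("continuous", StandardScaler(), [0, 1, 2, 3]),
        # encode gender; drop="first" keeps a single indicator column
        ("categorical", OneHotEncoder(drop="first"), [4]),
    ],
    sparse_threshold=0,  # always return a dense array
)

model = make_pipeline(preprocess, LogisticRegression())
model.fit(Xfull, y)

# Inspect the transformed design matrix produced by the fitted preprocessor
Xt = model.named_steps["columntransformer"].transform(Xfull)
print(Xt.shape)
print(model.predict(Xfull))
```

For a binary variable already coded as 0/1, the encoder returns the same indicator column, so the step matters mostly when a categorical covariate has more than two levels.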