Differences between the results of two logistic regression implementations for multiclass classification (Python 3.x)


python-3.x scikit-learn


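I have a data set of gene expression measurements taken from 3 identical bacterial samples at 12 different time points over 24 hours. I am trying to use logistic regression in Python to find the most significant genes for each time point, based on their expression values.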
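A minimal sketch of the data layout the code below appears to assume: one row per gene, two leading non-expression columns (skipped by `iloc[:, 2:]`), then 36 sample columns ordered replicate by replicate, each replicate covering all 12 time points. The column names, the gene count and the random values are hypothetical placeholders, and `TIME_POINTS` is inferred from the label list used in the second snippet.

```python
import numpy as np
import pandas as pd

# Assumed time points (hours), inferred from the class labels in the second snippet
TIME_POINTS = [2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 15, 24]
N_REPLICATES = 3
N_GENES = 5  # hypothetical; the real data set has many more genes

rng = np.random.default_rng(0)

# Hypothetical sample columns in replicate-major order,
# so sample j belongs to time point TIME_POINTS[j % 12]
sample_cols = [f"rep{r + 1}_{t}h" for r in range(N_REPLICATES) for t in TIME_POINTS]

genes_data = pd.DataFrame(rng.normal(size=(N_GENES, len(sample_cols))), columns=sample_cols)
# Two leading metadata columns (placeholder names and values)
genes_data.insert(0, "Name", [f"gene{g}" for g in range(N_GENES)])
genes_data.insert(1, "Description", ["placeholder"] * N_GENES)

# Transpose so rows are the 36 samples and columns are genes (the features)
X = genes_data.iloc[:, 2:].T
print(X.shape)
```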

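First, I tried to apply the one-vs-rest technique manually: in the output array I assign 1 to the samples from the time point I am interested in and 0 to all the other time points.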
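Assuming the 36 samples are ordered replicate by replicate, each covering all 12 time points, the manual one-vs-rest target is the same thing as comparing a multiclass label vector against one time point. The sketch below (the helper name `one_vs_rest_labels` is hypothetical) checks that the list comprehension from the first snippet marks exactly the three replicates of time point `i`.

```python
import numpy as np

TIME_POINTS = [2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 15, 24]  # assumed, in hours
labels = np.array(TIME_POINTS * 3)  # multiclass label (time point) of each of the 36 samples


def one_vs_rest_labels(i):
    """Binary target as built in the first snippet: 1 for time point i, 0 otherwise."""
    return [(1 if j % 12 == i else 0) for j in range(36)]


for i, t in enumerate(TIME_POINTS):
    y_manual = np.array(one_vs_rest_labels(i))
    # The same target derived from the multiclass labels
    y_from_labels = (labels == t).astype(int)
    assert np.array_equal(y_manual, y_from_labels)
    assert y_manual.sum() == 3  # one positive sample per replicate

# Positive samples for the first time point: indices 0, 12 and 24
print(np.flatnonzero(one_vs_rest_labels(0)))
```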

Here is the code I used:

import numpy as np
from sklearn.linear_model import LogisticRegression
from numpy.random import seed
seed(1)

# Rows = samples, columns = genes (skipping the first two columns of genes_data)
X = genes_data.iloc[:, 2:].T
log = LogisticRegression(penalty='l1', solver='liblinear', C=0.24)
for i in range(12):
    # Manual one-vs-rest: 1 for the samples at time point i, 0 for the rest
    y = [(1 if j % 12 == i else 0) for j in range(36)]
    model = log.fit(X, y)

    # Indices of the 10 largest coefficients (ascending order)
    top_10_idx = np.argsort(model.coef_[0])[-10:]
    top_10_values = [model.coef_[0][k] for k in top_10_idx]
    top_10_genes = [genes_list["Name"][k] for k in top_10_idx]

    print("The top 10 significant genes in {}h are:".format(TIME_POINTS[i]))
    print(top_10_genes)
    # Genes whose coefficient was not zeroed out by the L1 penalty
    print("The number of nonzero genes is {}\n".format(np.count_nonzero(model.coef_[0])))
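The ranking and counting steps can be checked on a toy coefficient vector. In this sketch the function name `summarize_coefficients` and the toy values are hypothetical; it shows that `np.argsort(coef)[-k:]` returns the indices of the k largest coefficients in ascending order (so strongly negative coefficients are never picked), and that `np.count_nonzero` counts every gene kept by the L1 penalty, whereas a filter on `x < 0` counts only the negative ones.

```python
import numpy as np


def summarize_coefficients(coef, k=3):
    """Return (indices of the k largest coefficients, nonzero count, negative count)."""
    coef = np.asarray(coef)
    top_k_idx = np.argsort(coef)[-k:]  # ascending order: the largest coefficient is last
    n_nonzero = np.count_nonzero(coef)  # genes not zeroed out by the L1 penalty
    n_negative = int(np.sum(coef < 0))  # what filter(lambda x: x < 0, ...) would count
    return top_k_idx, n_nonzero, n_negative


toy_coef = [0.0, -0.5, 0.3, 0.0, 1.2, -0.1, 0.0, 0.7]
idx, n_nonzero, n_negative = summarize_coefficients(toy_coef)
print(idx.tolist(), n_nonzero, n_negative)  # [2, 7, 4] 5 2
```

If genes with large negative coefficients should also count as significant, `np.argsort(np.abs(coef))[-k:]` ranks by magnitude instead.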

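Then I tried letting scikit-learn do the one-vs-rest split itself, with multi_class="ovr" and the time point of each sample as its class label:

X = genes_data.iloc[:, 2:].T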

# Class label of each sample: its time point in hours (3 replicates x 12 time points)
y = [2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 15, 24, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12,
     15, 24, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 15, 24]

log = LogisticRegression(penalty='l1', solver='liblinear', multi_class="ovr", C=0.24)
model = log.fit(X, y)

# coef_ has one row per class, in the sorted order of model.classes_
for j in range(12):
    top_10_idx = np.argsort(model.coef_[j])[-10:]
    top_10_values = [model.coef_[j][k] for k in top_10_idx]
    top_10_genes = [genes_list["Name"][k] for k in top_10_idx]

    print("The top 10 significant genes in {}h are:".format(TIME_POINTS[j]))
    print(top_10_genes)
    # Genes whose coefficient was not zeroed out by the L1 penalty
    print("The number of nonzero genes is {}\n".format(np.count_nonzero(model.coef_[j])))
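One way to compare the manual loop and a built-in one-vs-rest fit on equal terms is to pin `random_state` (the liblinear solver uses it to shuffle the data) and fit both on the same matrix. This sketch swaps in `sklearn.multiclass.OneVsRestClassifier` instead of `multi_class="ovr"`, since that parameter is deprecated in recent scikit-learn releases (1.5 and later). The random data, `random_state=0`, the gene count and the helper `make_estimator` are assumptions for illustration. Because the wrapper fits one clone of the same estimator per class on the same binary targets, the two coefficient matrices are expected to coincide under these assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

TIME_POINTS = [2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 15, 24]  # assumed, in hours
rng = np.random.default_rng(0)
X = rng.normal(size=(36, 50))  # hypothetical: 36 samples x 50 genes
y = np.array(TIME_POINTS * 3)  # replicate-major ordering, as in the question


def make_estimator():
    # Settings from the question plus a fixed random_state (an assumption)
    return LogisticRegression(penalty="l1", solver="liblinear", C=0.24, random_state=0)


# Manual one-vs-rest: one binary fit per time point, stacked into a (12, 50) matrix
manual_coef = np.vstack(
    [make_estimator().fit(X, (y == t).astype(int)).coef_ for t in TIME_POINTS]
)

# Built-in wrapper: one fitted clone per class, in sorted class order
ovr = OneVsRestClassifier(make_estimator()).fit(X, y)
ovr_coef = np.vstack([est.coef_ for est in ovr.estimators_])

print(manual_coef.shape, np.allclose(manual_coef, ovr_coef))
```

In both the wrapper and `LogisticRegression` itself, the coefficient rows follow the sorted order of `classes_`, which for the labels in the question matches `TIME_POINTS`.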