Python 3.x: Differing results between logistic regression implementations for multiclass classification

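I have a data set of gene expression measurements taken from 3 identical bacterial samples at 12 different time points over 24 hours. I am trying to use logistic regression in Python to find the most important genes at each time point, based on their expression values.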

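First, I tried applying the one-vs-rest technique manually: in the output array I assign 1 to the time point I am interested in and 0 to all the other time points.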
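
The labelling scheme can be sketched in plain Python as follows. This is only an illustration: the values of TIME_POINTS are assumed to match the class labels used later in this question, and the helper name one_vs_rest_labels is made up for the example.

```python
# Assumed time points (hours) for one replicate; the three replicates
# are stacked one after another, giving 36 samples in total.
TIME_POINTS = [2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 15, 24]
N_REPLICATES = 3

# Multiclass label (the time point) for each of the 36 samples.
labels = TIME_POINTS * N_REPLICATES


def one_vs_rest_labels(labels, target):
    """Return 1 for samples measured at `target`, 0 for all others."""
    return [1 if label == target else 0 for label in labels]


# Sample j belongs to time point TIME_POINTS[j % 12], so labelling by
# value and labelling by index give the same binary targets.
for i, t in enumerate(TIME_POINTS):
    by_value = one_vs_rest_labels(labels, t)
    by_index = [1 if j % 12 == i else 0 for j in range(36)]
    assert by_value == by_index

print(one_vs_rest_labels(labels, 2))
```

Each binary target contains exactly three 1s, one per replicate, so every one-vs-rest problem here has 3 positive and 33 negative samples.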

Here is the code I used:

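import numpy as np
from numpy.random import seed
from sklearn.linear_model import LogisticRegression

seed(1)

# genes_data, genes_list and TIME_POINTS are defined elsewhere (not shown).
# After transposing, the rows of X are the 36 samples and the columns are genes.
X = genes_data.iloc[:, 2:].T
log = LogisticRegression(penalty='l1', solver='liblinear', C=0.24)
for i in range(12):
    # Manual one-vs-rest labels: 1 for time point i, 0 for every other time point.
    y = [(1 if j % 12 == i else 0) for j in range(36)]
    model = log.fit(X, y)

    # Indices of the 10 largest coefficients, in ascending order.
    top_10_idx = np.argsort(model.coef_[0])[-10:]
    top_10_values = [model.coef_[0][k] for k in top_10_idx]
    top_10_genes = [genes_list["Name"][k] for k in top_10_idx]

    print("The top 10 significant genes in {}h are:".format(TIME_POINTS[i]))
    print(top_10_genes)
    # Count the coefficients that the L1 penalty did not shrink to zero.
    print("The number of nonzero genes is {}\n".format(np.count_nonzero(model.coef_[0])))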

Then I fitted a single model with the built-in one-vs-rest option (multi_class="ovr"), using the time points as class labels. The results differ from those of the manual approach:

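# Uses the same imports and data (genes_data, genes_list, TIME_POINTS) as above.
X = genes_data.iloc[:, 2:].T

# Class label (time point in hours) for each of the 36 samples.
y = [2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 15, 24, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12,
     15, 24, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 15, 24]

log = LogisticRegression(penalty='l1', solver='liblinear', multi_class="ovr", C=0.24)
model = log.fit(X, y)

# model.coef_ has one row per class, ordered as in model.classes_.
for j in range(12):
    # Indices of the 10 largest coefficients for class j, in ascending order.
    top_10_idx = np.argsort(model.coef_[j])[-10:]
    top_10_values = [model.coef_[j][k] for k in top_10_idx]
    top_10_genes = [genes_list["Name"][k] for k in top_10_idx]

    print("The top 10 significant genes in {}h are:".format(TIME_POINTS[j]))
    print(top_10_genes)
    # Count the coefficients that the L1 penalty did not shrink to zero.
    print("The number of nonzero genes is {}\n".format(np.count_nonzero(model.coef_[j])))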