Python: collating model coefficients from sklearn across multiple test-train splits

python pandas dataframe

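I would like to use Python to combine the model/feature coefficients from multiple (random) test-train splits into a single dataframe.

Currently, my approach is to generate the model coefficients for each test-train split, one split at a time, and then combine them at the end of the code.

While this works, it is overly verbose and does not scale to a large number of test-train splits.

Could someone simplify my approach with a simple for loop? My inelegant, overly verbose code is below: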

import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


####Instantiate logistic regression objects
log = LogisticRegression(class_weight='balanced', random_state = 1)

#### import some data 
iris = datasets.load_iris()

X = pd.DataFrame(iris.data[:100, :], columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"])
y = iris.target[:100,]

#####test_train split #1
train_x, test_x, train_y, test_y = train_test_split(X,y, stratify=y, test_size=0.3, random_state=11)
log.fit(train_x, train_y) #fit final model 

pred_y = log.predict(test_x) #store final model predictions 
probs_y = log.predict_proba(test_x) #final model class probabilities

coeff_final1 = pd.concat([pd.DataFrame(X.columns),pd.DataFrame(np.transpose(log.coef_))], axis = 1)
coeff_final1.columns=("features", "coefficients_1")

######test_train split #2
train_x, test_x, train_y, test_y = train_test_split(X,y, stratify=y, test_size=0.3, random_state=444)
log.fit(train_x, train_y) #fit final model 

pred_y = log.predict(test_x) #store final model predictions 
probs_y = log.predict_proba(test_x) #final model class probabilities

coeff_final2 = pd.concat([pd.DataFrame(X.columns),pd.DataFrame(np.transpose(log.coef_))], axis = 1)
coeff_final2.columns=("features", "coefficients_2")

#####test_train split #3
train_x, test_x, train_y, test_y = train_test_split(X,y, stratify=y, test_size=0.3, random_state=21)
log.fit(train_x, train_y) #fit final model 

pred_y = log.predict(test_x) #store final model predictions 
probs_y = log.predict_proba(test_x) #final model class probabilities

coeff_final3 = pd.concat([pd.DataFrame(X.columns),pd.DataFrame(np.transpose(log.coef_))], axis = 1)
coeff_final3.columns=("features", "coefficients_3")

#####test_train split #4
train_x, test_x, train_y, test_y = train_test_split(X,y, stratify=y, test_size=0.3, random_state=109)
log.fit(train_x, train_y) #fit final model 

pred_y = log.predict(test_x) #store final model predictions 
probs_y = log.predict_proba(test_x) #final model class probabilities

coeff_final4 = pd.concat([pd.DataFrame(X.columns),pd.DataFrame(np.transpose(log.coef_))], axis = 1)
coeff_final4.columns=("features", "coefficients_4")

#####test_train split #5
train_x, test_x, train_y, test_y = train_test_split(X,y, stratify=y, test_size=0.3, random_state=1900)
log.fit(train_x, train_y) #fit final model 

pred_y = log.predict(test_x) #store final model predictions 
probs_y = log.predict_proba(test_x) #final model class probabilities

coeff_final5 = pd.concat([pd.DataFrame(X.columns),pd.DataFrame(np.transpose(log.coef_))], axis = 1)
coeff_final5.columns=("features", "coefficients_5")

#######Append features/coefficients & odds ratios across 5 test-train splits

#append all coefficients into a single dataframe
coeff_table = pd.concat([coeff_final1, coeff_final2["coefficients_2"], coeff_final3["coefficients_3"],coeff_final4["coefficients_4"], coeff_final5["coefficients_5"] ], axis = 1)

#append mean and std error for each coefficient
#(restricted to the numeric coefficient columns; the string "features" column cannot be averaged)
coeff_cols = ["coefficients_1", "coefficients_2", "coefficients_3", "coefficients_4", "coefficients_5"]
coeff_table["mean_coeff"] = coeff_table[coeff_cols].mean(axis = 1)

coeff_table["se_coeff"] = coeff_table[coeff_cols].sem(axis = 1)

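The final table has one row per feature, with the columns features, coefficients_1 through coefficients_5, mean_coeff and se_coeff.

Can anyone show me how to generate this table without having to write out all of the lines of code above from test-train split #2 through test-train split #5?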

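As you mentioned, you can do this with a for loop:

# Start by creating the first column, which holds the feature names
coeff_table = pd.DataFrame(X.columns, columns=["features"])
# Iterate over the random states while keeping track of `i`
for i, state in enumerate([11, 444, 21, 109, 1900]):
    train_x, test_x, train_y, test_y = train_test_split(
        X, y, stratify=y, test_size=0.3, random_state=state)
    log.fit(train_x, train_y)  # fit final model
    coeff_table[f"coefficients_{i+1}"] = np.transpose(log.coef_)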

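Note that the predict and predict_proba calls are dropped from this loop, because their results were being thrown away (they are overwritten on every split in your code). You could add them back using similar logic inside the loop to create new columns in a table.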
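The loop above relies on X, y and log from the question, and it stops before the mean and standard-error columns. A self-contained sketch of the whole workflow might look like the following. The helper name collect_coefficients and its signature are made up for illustration, coef_ is flattened with ravel() rather than transposed, and the per-split test accuracy is only one possible way of keeping something from the predictions instead of discarding them.

```python
import pandas as pd
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def collect_coefficients(X, y, random_states, test_size=0.3):
    """Fit one model per test-train split and tabulate its coefficients.

    Hypothetical helper: the name and signature are illustrative only.
    """
    model = LogisticRegression(class_weight="balanced", random_state=1)
    coeff_table = pd.DataFrame({"features": X.columns})
    accuracy = {}

    # start=1 so the counter matches the column suffixes directly
    for i, state in enumerate(random_states, start=1):
        train_x, test_x, train_y, test_y = train_test_split(
            X, y, stratify=y, test_size=test_size, random_state=state)
        model.fit(train_x, train_y)
        # For a binary problem coef_ has shape (1, n_features); flatten it
        coeff_table[f"coefficients_{i}"] = model.coef_.ravel()
        # Keep one number per split rather than throwing predictions away
        accuracy[f"split_{i}"] = model.score(test_x, test_y)

    # Row-wise summaries over the coefficient columns only
    coeff_cols = [c for c in coeff_table.columns
                  if c.startswith("coefficients_")]
    coeff_table["mean_coeff"] = coeff_table[coeff_cols].mean(axis=1)
    coeff_table["se_coeff"] = coeff_table[coeff_cols].sem(axis=1)
    return coeff_table, pd.Series(accuracy, name="test_accuracy")


# Same data as in the question: the first two iris classes only
iris = datasets.load_iris()
X = pd.DataFrame(
    iris.data[:100, :],
    columns=["sepal_length", "sepal_width", "petal_length", "petal_width"])
y = iris.target[:100]

coeff_table, accuracy = collect_coefficients(X, y, [11, 444, 21, 109, 1900])
print(coeff_table)
print(accuracy)
```

Scaling to more splits then only means passing a longer list of random states, for example list(range(100)).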

There are a lot of steps here. Have you tried anything? What tripped you up? In general, you can use a for loop and insert the results into the dataframe inside the loop. If the parameters (such as test_size or random_state) differ for each element, you can set up a dictionary or list and access its elements as you iterate. Is there a particular problem you need solved?

The test size is the same throughout, but the random state will vary. I'm having trouble adding the coefficients for each iteration.

Your code took me a little while to follow completely, but it works great! I would appreciate it if you could add an explanation of the enumerate part of the for loop. Thanks also for pointing out and removing the extraneous predict/predict_proba code.

Check out this, which sums it up very nicely and provides examples :)
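Since the comments ask about enumerate and about varying parameters per split, here is a small sketch of both ideas. It uses only built-in Python; the split_params values are invented for illustration and do not come from the question.

```python
random_states = [11, 444, 21, 109, 1900]

# enumerate pairs each item with a running counter, starting at 0 by default,
# which is why the answer writes i+1 when building the column name
pairs = list(enumerate(random_states))
print(pairs)

# Passing start=1 makes the counter match the column suffixes directly
column_names = [f"coefficients_{i}"
                for i, _ in enumerate(random_states, start=1)]
print(column_names)

# If more than one setting varies per split, a list of dicts works the same
# way (these particular settings are made up for illustration)
split_params = [
    {"test_size": 0.3, "random_state": 11},
    {"test_size": 0.2, "random_state": 444},
]
labels = [
    f"split_{i}: test_size={p['test_size']}, random_state={p['random_state']}"
    for i, p in enumerate(split_params, start=1)
]
print(labels)
```

Inside a real loop, each dict could be unpacked into the call, as in train_test_split(X, y, stratify=y, **params).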