Python: How can we subsample a DataFrame within each value of a column?
Tags: python, pandas, scikit-learn, cross-validation


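I have a DataFrame with a column that gives the cluster of each row, and I want to perform a k-fold split that keeps the same train/test fraction within each cluster.

I know I can do it myself with the following code: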

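nb_fold = 10
for i in range(nb_fold):
    # Sample the same fraction of rows from every cluster
    X_train = X.groupby('Cluster').apply(lambda x: x.sample(frac=1/nb_fold))
    # Drop the group level added by groupby so the index matches X again
    X_train.index = X_train.index.droplevel(0)

    Y_train = Y.loc[X_train.index]

    # Every row that was not sampled goes to the evaluation set
    X_eval, Y_eval = X.drop(X_train.index), Y.drop(Y_train.index)
But I would like to know whether there is a scikit-learn wrapper for this, because this approach draws with replacement (the same row can be sampled in several iterations), and I would prefer a draw without replacement.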

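It looks like you need StratifiedKFold. It is normally used in classification tasks to keep the class distribution the same across folds, but you can stratify on the cluster labels to achieve the desired effect: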

from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=10)
for train_ind, eval_ind in skf.split(X, X['Cluster']):
    X_train, Y_train = X.iloc[train_ind, :], Y.iloc[train_ind]
    X_eval, Y_eval = X.iloc[eval_ind, :], Y.iloc[eval_ind]
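
To check that stratifying on the cluster column behaves as described, here is a minimal sketch on synthetic data. The DataFrame contents, the cluster sizes (50, 30, 20), and the use of shuffle=True with random_state=0 are assumptions made for illustration and do not come from the question.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Hypothetical data: 100 rows in 3 clusters of unequal size (50, 30, 20).
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "feature": rng.normal(size=100),
    "Cluster": np.repeat([0, 1, 2], [50, 30, 20]),
})
Y = pd.Series(rng.normal(size=100), index=X.index)

# Stratify on the cluster labels, not on the target Y.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

eval_counts = []  # per-fold number of evaluation rows in each cluster
seen = []         # positions of all rows used for evaluation so far
for train_ind, eval_ind in skf.split(X, X["Cluster"]):
    X_train, Y_train = X.iloc[train_ind, :], Y.iloc[train_ind]
    X_eval, Y_eval = X.iloc[eval_ind, :], Y.iloc[eval_ind]
    eval_counts.append(X_eval["Cluster"].value_counts().sort_index().tolist())
    seen.extend(eval_ind.tolist())

# Each fold holds out one tenth of every cluster.
print(eval_counts[0])
# Each row lands in exactly one evaluation fold (no replacement).
print(sorted(seen) == list(range(len(X))))
```

Because the folds partition the rows, every row is evaluated exactly once, which is the draw without replacement asked for in the question. When a cluster size is not divisible by n_splits, the per-fold counts differ by at most one row.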