Python: Writing a results table with scikit-learn and pandas


python pandas machine-learning scikit-learn


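I want to benchmark classifiers in scikit-learn on several datasets. For each dataset, this involves running a grid search over four classifier settings and producing a table that records accuracy, recall, precision, and F1 score on an out-of-sample test set. (Ideally this table would be a DataFrame.)

Since the process takes a while and each dataset is independent, I would like to know how to produce these results so that, if the process is interrupted, the results from the datasets already processed have still been written to a file.

What is the standard way to provide these "live updates" within the scikit-learn and pandas framework?

Here is some code showing the kind of results produced for each dataset: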

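Instead of building one results table across the whole experiment loop, you can create a DataFrame on each iteration and save it to disk as a CSV file.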

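import pandas as pd

for ind in range(len(self.dataset_names)):
    # Run the experiment for this dataset here. It is expected to define
    # best_seq, X_train, X_test, y_test, seq_tic and seq_toc.

    # One-row table holding this dataset's results.
    df = pd.DataFrame({
        'best_score': [best_seq.score(X_test, y_test)],
        # Assuming seq_tic is the start time and seq_toc the end time,
        # end minus start gives a positive duration.
        'duration': [seq_toc - seq_tic],
        'train_samples': [X_train.shape[0]],
        'test_samples': [X_test.shape[0]],
        'train_time_points': [X_train.shape[1]]
    })

    # Write immediately so the results survive a later interruption.
    df.to_csv('%s_results.csv' % self.dataset_names[ind], index=False)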
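If a single table that grows as the benchmark runs is preferred over one file per dataset, one option is to append a row to a shared CSV file after each dataset finishes. The following is only a sketch under assumptions: the helper names score_row and append_row are hypothetical, the metrics are macro-averaged (the question does not say which averaging is wanted), and the predictions are expected to come from the fitted grid search.

```python
import os

import pandas as pd
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def score_row(dataset_name, y_true, y_pred):
    """Build a one-row DataFrame of test-set metrics for one dataset."""
    # Macro averaging is an assumption; swap in another average if needed.
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0
    )
    return pd.DataFrame({
        "dataset": [dataset_name],
        "accuracy": [accuracy_score(y_true, y_pred)],
        "precision": [precision],
        "recall": [recall],
        "f1": [f1],
    })


def append_row(row, path):
    """Append the row to the CSV at `path`, writing the header only once."""
    write_header = not os.path.exists(path)
    row.to_csv(path, mode="a", header=write_header, index=False)
```

Inside the loop, a call such as append_row(score_row(name, y_test, best_seq.predict(X_test)), 'benchmark_results.csv') would add one line per dataset, and pd.read_csv on that file returns everything recorded so far as a DataFrame, even if the run stopped early.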

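Alternatively, once the results for a dataset have been computed, you could simply pickle them to a file. Before processing each dataset, first check whether the corresponding pickle file already exists: if it does, load it; if it does not, compute the values and pickle them.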
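The check-then-load-or-compute pattern described above might look roughly like this. It is a minimal sketch using only the standard library; the helper name load_or_compute and the idea of passing the experiment as a callable are assumptions made for illustration, not something from the original answer.

```python
import os
import pickle


def load_or_compute(cache_path, compute):
    """Return the result cached at `cache_path`, computing and pickling it if absent."""
    if os.path.exists(cache_path):
        # A previous run already finished this dataset: reuse its result.
        with open(cache_path, "rb") as f:
            return pickle.load(f)

    result = compute()

    # Write to a temporary file first, then rename, so an interruption
    # during the write cannot leave a half-written cache file behind.
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(result, f)
    os.replace(tmp_path, cache_path)
    return result
```

In the benchmark loop, each dataset would call something like load_or_compute('%s_results.pkl' % name, run_experiment), where run_experiment is a hypothetical zero-argument function returning that dataset's results. Restarting the script then skips every dataset whose pickle file already exists. If the results are DataFrames, pandas' own DataFrame.to_pickle and pd.read_pickle could be used in place of the pickle module.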
