Python: How to split a dataset into train, test, and validation sets and store them in pickle
Tags: python, jupyter-notebook, jupyter

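My dataset currently consists of 161 folders, each containing 500 images (.img), for a total of 80,500 samples. Is there code I can adapt? I am stuck on splitting the data into train/validation/test sets and saving them.

The code below shows how I load my dataset of 161 folders: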

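import glob

import cv2
import numpy as np

# Every entry directly under the dataset directory is treated as a folder of images.
folders = glob.glob('C:/Users/Pc/Desktop/datasets/*')

# Collect the path of every .jpg file in every folder.
imagenames_list = []
for folder in folders:
    for f in glob.glob(folder + '/*.jpg'):
        imagenames_list.append(f)

# Read each image as a single-channel grayscale array.
read_images = []
for image in imagenames_list:
    read_images.append(cv2.imread(image, cv2.IMREAD_GRAYSCALE))

images = np.array(read_images)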
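
The loading loop above keeps only pixel data, while a train/validation/test split also needs one label per image. The following is a minimal sketch of one way to get them, assuming each folder corresponds to one class (the question does not state this). The helper name `collect_paths_and_labels` and the demo folder names are hypothetical, and the demo runs on a throwaway directory rather than the real dataset path.

```python
import tempfile
from pathlib import Path


def collect_paths_and_labels(root):
    """Return (paths, labels, class_names) for root/<class_folder>/*.jpg.

    Assumption: every sub-folder is one class. A label is the index of the
    folder in the sorted list of folder names.
    """
    root = Path(root)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    paths, labels = [], []
    for label, name in enumerate(class_names):
        for image_path in sorted((root / name).glob("*.jpg")):
            paths.append(str(image_path))
            labels.append(label)
    return paths, labels, class_names


# Demo on a temporary tree: two hypothetical classes with three empty files each.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("class_a", "class_b"):
        folder = Path(tmp) / name
        folder.mkdir()
        for i in range(3):
            (folder / f"{i}.jpg").touch()
    demo_paths, demo_labels, demo_classes = collect_paths_and_labels(tmp)

print(demo_classes, demo_labels)
```

The returned paths can be read with `cv2.imread` as in the question. Note that `cv2.imread` returns `None` for a file it cannot read, so such entries would need to be dropped from both the image list and the label list to keep them aligned, and `np.array(read_images)` only yields a regular array when all images share the same size.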
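
The code below shows how I split the data into 60% train / 20% test / 20% validation. Is this the right way to do the train/test/validation split, and how can I connect it to my dataset? How do I store the splits in a pickle file?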

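import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same number of samples as the dataset.
X, y = np.random.random((80500, 10)), np.random.random((80500,))

p = 0.2              # fraction of the full dataset for validation and for test
new_p = p / (1 - p)  # 20% of the full set is 25% of the remaining 80%

# Hold out 20% for validation, then 25% of the remainder (20% overall) for test.
X, X_val, y, y_val = train_test_split(X, y, test_size=p)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=new_p)

print([i.shape for i in [X_train, X_test, X_val]])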
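
The split above runs on random placeholder arrays. To connect it to the real data, the same two-step split can be applied to the image array and a matching label array. The sketch below is one possible way to do this, not a definitive recipe: the stand-in `images` and `labels` arrays, their sizes, and the use of `stratify` (which keeps class proportions equal across splits and only makes sense for class labels) are assumptions that are not in the original question.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data (hypothetical sizes): 10 classes x 50 tiny grayscale images.
# In the real case these would be the loaded image array and its labels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(500, 8, 8), dtype=np.uint8)
labels = np.repeat(np.arange(10), 50)

val_fraction = 0.2
test_fraction = 0.2

# First cut: hold out the validation set from the full data.
X_rest, X_val, y_rest, y_val = train_test_split(
    images, labels, test_size=val_fraction, stratify=labels, random_state=0
)

# Second cut: the test fraction is expressed relative to what is left,
# i.e. 0.2 / 0.8 = 0.25 of the remainder.
X_train, X_test, y_train, y_test = train_test_split(
    X_rest,
    y_rest,
    test_size=test_fraction / (1 - val_fraction),
    stratify=y_rest,
    random_state=0,
)

print(X_train.shape, X_val.shape, X_test.shape)
```

Passing the images and labels together in one call keeps each image paired with its label in every split. Fixing `random_state` makes the split reproducible between notebook runs. With the 80,500 samples from the question, the same fractions would give roughly 48,300 / 16,100 / 16,100 samples.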

You can store them in a pickle file like this:

import pickle

dataset_dict = {"X_train": X_train, "X_test": X_test, "X_val": X_val, "y_train": y_train, "y_test": y_test, "y_val": y_val}

with open('dataset_dict.pickle', 'wb') as file:
    pickle.dump(dataset_dict, file)

Then load them back like this:

with open('dataset_dict.pickle', 'rb') as file:
    dataset_dict = pickle.load(file)
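
To check that nothing is lost on the way to disk, the save and load steps can be combined into a round trip. The following is a small sketch under stated assumptions: the tiny arrays stand in for the real splits, the file is written to a temporary directory instead of the working directory, and `protocol=pickle.HIGHEST_PROTOCOL` is an optional choice that is not in the original answer.

```python
import os
import pickle
import tempfile

import numpy as np

# Tiny stand-in splits (hypothetical shapes) using the same keys as the answer.
rng = np.random.default_rng(0)
dataset_dict = {
    "X_train": rng.random((6, 4)), "y_train": np.arange(6),
    "X_test": rng.random((2, 4)), "y_test": np.arange(2),
    "X_val": rng.random((2, 4)), "y_val": np.arange(2),
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset_dict.pickle")

    # Save all splits in one file.
    with open(path, "wb") as file:
        pickle.dump(dataset_dict, file, protocol=pickle.HIGHEST_PROTOCOL)

    # Load them back while the temporary file still exists.
    with open(path, "rb") as file:
        restored = pickle.load(file)

# Every restored array should be identical to the one that was saved.
round_trip_ok = all(
    np.array_equal(dataset_dict[key], restored[key]) for key in dataset_dict
)
for name, array in restored.items():
    print(name, array.shape)
print("round trip ok:", round_trip_ok)
```

Only unpickle files from a trusted source, since `pickle.load` can execute arbitrary code contained in the file.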