Python: Given a list of image filenames for each set, how do I split a large dataset into train/valid/test directories?


python machine-learning deep-learning pytorch


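I am trying to split a large dataset into train/valid/test sets for image classification.

The dataset is structured as follows, with all images under a single folder: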

Structure:
----------
pec/
    images/
        <class_name>/
            <image_id>.jpg
    meta/
        classes.txt
        labels.txt
        test.json
        test.txt
        train.json
        train.txt

All images can be found in the "images" folder and are organized per class. All
image ids are unique and correspond to the foodspotting.com review ids.

The test/train splitting used in the experiment of our paper can be found in
the "meta" directory.



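I want to divide the image dataset into train/valid/test sets using the filename lists given in train.txt and test.txt, which the authors used.

Running a nested loop over the filename lists to move the images one at a time and create the folders takes a very long time, because there are about 101,000 images in total.

I have the filename lists for the train/valid and test sets, but how do I put the images into folders so that they can be fed to an image classifier in the PyTorch ImageFolder format? By that I mean the train/valid/test sets are three separate folders, and each of them has one subfolder per class.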
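One way to sketch this with only the standard library is shown here. It assumes each line of a list file has the form `<class_name>/<image_id>` without an extension (as the layout in the README suggests) and that the source and destination are on the same filesystem. The function name `build_split` and the destination path in the usage note are hypothetical.

```python
import os
from pathlib import Path


def build_split(list_file, images_dir, dest_dir):
    """Move every image named in list_file into dest_dir/<class_name>/.

    Assumes each non-empty line looks like '<class_name>/<image_id>' and that
    the image lives at images_dir/<class_name>/<image_id>.jpg.
    Returns the number of files moved.
    """
    images_dir, dest_dir = Path(images_dir), Path(dest_dir)
    lines = Path(list_file).read_text().splitlines()
    entries = [ln.strip() for ln in lines if ln.strip()]

    # Create each class folder once, instead of checking on every image.
    for class_name in {e.split("/")[0] for e in entries}:
        (dest_dir / class_name).mkdir(parents=True, exist_ok=True)

    moved = 0
    for entry in entries:
        src = images_dir / f"{entry}.jpg"
        dst = dest_dir / f"{entry}.jpg"
        # A rename on the same filesystem; raises OSError across filesystems.
        os.replace(src, dst)
        moved += 1
    return moved
```

A call such as `build_split("pec/meta/train.txt", "pec/images", "data/train")` would then populate one split folder; repeating it with the valid and test lists gives the three-folder layout described above.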

If anyone knows how to do this, please let me know. I would really appreciate the help, thanks.

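It turns out I was approaching the solution the wrong way: the image data does not need to be copied around. All that has to change is each image's path, which the os module can rewrite into the required layout.

Here is the code that does this. It assumes the list of filenames is in valid.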

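# for the valid set
import os
from tqdm import tqdm

# valid is assumed to be a NumPy array of 'class_name/image_id' strings
v = valid.reshape(15150,)

or_fpath = '/content/food-101/images/'  # path of original folder
cp_fpath = '/content/food101/valid/'    # path of destination folder

for y in tqdm(v):
    foldername = y.split('/')[0]
    img = y.split('/')[1] + '.jpg'

    ip_path = os.path.join(or_fpath, foldername)
    op_path = os.path.join(cp_fpath, foldername)

    # create the class subfolder (and any missing parent folders) if needed
    os.makedirs(op_path, exist_ok=True)

    # os.rename only changes the path when both folders are on the same filesystem
    os.rename(os.path.join(ip_path, img), os.path.join(op_path, img))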
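After a move like this, a quick sanity check is to count the images in each class folder of the destination. The following is a minimal sketch; the helper name `count_per_class` is hypothetical.

```python
from pathlib import Path


def count_per_class(split_dir, suffix=".jpg"):
    """Return {class_name: number of files ending in suffix} for one split folder."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for p in class_dir.iterdir() if p.suffix.lower() == suffix
            )
    return counts
```

For the valid set above, `sum(count_per_class('/content/food101/valid/').values())` should match the number of entries in the list (15150 here) if every file was moved.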

Thanks.

Note: if you have a better answer, please share it.
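If the original images folder should stay intact, one possible alternative to moving files is to create symbolic links inside the split folders. This is a sketch under assumptions that are not part of the answer: the filesystem supports symlinks (on Windows this may require extra privileges), and `link_split` is a hypothetical helper name. Whether a particular data loader follows symlinks should be checked for that library.

```python
import os


def link_split(entries, images_dir, dest_dir, ext=".jpg"):
    """Create dest_dir/<class>/<id><ext> symlinks that point at the originals.

    entries is an iterable of '<class_name>/<image_id>' strings.
    Returns the number of links created; existing paths are left alone.
    """
    created = 0
    for entry in entries:
        class_name, image_id = entry.split("/")
        # Use an absolute target so the link resolves from any working directory.
        src = os.path.abspath(os.path.join(images_dir, class_name, image_id + ext))
        class_dir = os.path.join(dest_dir, class_name)
        os.makedirs(class_dir, exist_ok=True)
        dst = os.path.join(class_dir, image_id + ext)
        if not os.path.lexists(dst):
            os.symlink(src, dst)
            created += 1
    return created
```

Because nothing is moved, the same image can appear in more than one experimental split, and undoing a split only means deleting the link folders.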


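Maybe you don't actually need to move the images. How about writing the image names into 3 separate files and, during training, reading the image filenames first and then loading the image files themselves?

Hi @lincr, thanks for the reply. I do have the filenames in 3 separate files. However, reading the filenames first and then loading the image files during training means building a data loader from scratch. I am using PyTorch code together with the fastai library's low-level and high-level functions, and making a custom data loader work with both would be very difficult and inefficient, so I would like the images to be in folder format.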
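For reference, the approach suggested in the first comment can be sketched without any framework: a class that reads one list file and exposes `__len__` and `__getitem__`, the map-style interface that `torch.utils.data.DataLoader` works with. The class name `ListFileDataset` is hypothetical, items are (path, label index) pairs rather than decoded images, and fitting this into fastai's data pipeline is not covered here.

```python
from pathlib import Path


class ListFileDataset:
    """Index images via a list file of '<class_name>/<image_id>' lines, without moving files."""

    def __init__(self, list_file, images_dir, ext=".jpg"):
        self.images_dir = Path(images_dir)
        self.ext = ext
        lines = Path(list_file).read_text().splitlines()
        self.entries = [ln.strip() for ln in lines if ln.strip()]
        # Map class names to integer labels in sorted order.
        self.classes = sorted({e.split("/")[0] for e in self.entries})
        self.class_to_idx = {c: i for i, c in enumerate(self.classes)}

    def __len__(self):
        return len(self.entries)

    def __getitem__(self, index):
        entry = self.entries[index]
        label = self.class_to_idx[entry.split("/")[0]]
        # A real version would open and transform the image here.
        return str(self.images_dir / f"{entry}{self.ext}"), label
```

Note that the labels here are derived from each list file separately; to keep them consistent across splits, the class list could instead be read from meta/classes.txt.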