Efficient way to load a large number of images in Python

python performance deep-learning google-colaboratory

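I am trying to load the training images in Google Colab. There are 100,000 images spread across 10 subfolders. The cell ran for 54 minutes, which is far too long. Is there a more efficient way to do this?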

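One efficient and effective approach is to use the data loaders provided by machine learning frameworks such as TensorFlow and PyTorch. Judging from your code, you load all the images at once, which takes a long time. With this many images you may also run into a MemoryError. I strongly recommend using a DataLoader in TensorFlow or PyTorch. A data loader reads the data in batches during training, so only the current batch has to be in memory rather than the whole dataset.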
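To see why loading everything at once can end in a MemoryError, here is a rough back-of-the-envelope estimate. The question does not state the image size, so the 256×256 RGB resolution below is an assumption (borrowed from the default image_size of the TensorFlow function in this answer), and dataset_bytes is a hypothetical helper name.

```python
def dataset_bytes(num_images, height, width, channels=3, bytes_per_value=1):
    """Size in bytes of an uncompressed image array held fully in memory."""
    return num_images * height * width * channels * bytes_per_value


# Assumed resolution: 256x256 RGB (not stated in the question).
uint8_bytes = dataset_bytes(100_000, 256, 256)                       # raw 8-bit pixels
float32_bytes = dataset_bytes(100_000, 256, 256, bytes_per_value=4)  # after casting to float32

print(f"uint8:   {uint8_bytes / 1e9:.1f} GB")    # 19.7 GB
print(f"float32: {float32_bytes / 1e9:.1f} GB")  # 78.6 GB
```

Compare these figures with the RAM your Colab runtime reports. With batches of 32 images, the same arithmetic gives only a few megabytes per batch.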

In TensorFlow, you can use the following function:

import tensorflow as tf

# Placeholder path: root folder with one subfolder per class
directory = 'path/to/train_images'
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    directory, labels='inferred', label_mode='int',
    class_names=None, color_mode='rgb', batch_size=32,
    image_size=(256, 256), shuffle=True, seed=None,
    validation_split=None, subset=None,
    interpolation='bilinear', follow_links=False
)

In PyTorch, you first define a dataset and then pass it to a DataLoader:

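import torch
import torchvision

# The dataset object only indexes the files; images are read on demand
imagenet_data = torchvision.datasets.ImageNet('path/to/imagenet_root/')
data_loader = torch.utils.data.DataLoader(imagenet_data,
                                          batch_size=4,
                                          shuffle=True,
                                          num_workers=2)  # parallel loader processes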

These data loading methods are widely used because they are efficient. I hope they help you with your task.

Why load that much data into memory at all? To train a deep learning model, use a data generator rather than loading all the data at once.
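As an illustration of the data generator idea, here is a minimal framework-free sketch. It assumes one subfolder per class, as in the question. The names list_labeled_files and batch_generator, the extension list, and the load_fn parameter are hypothetical. load_fn stands for whatever image reader you already use.

```python
import os

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")  # assumed file types


def list_labeled_files(root):
    """Return (path, label) pairs; the label is the index of the class subfolder."""
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    samples = []
    for label, cls in enumerate(classes):
        folder = os.path.join(root, cls)
        for name in sorted(os.listdir(folder)):
            if name.lower().endswith(IMAGE_EXTENSIONS):
                samples.append((os.path.join(folder, name), label))
    return samples


def batch_generator(samples, batch_size, load_fn):
    """Yield (images, labels) one batch at a time; files are read only when needed."""
    for start in range(0, len(samples), batch_size):
        chunk = samples[start:start + batch_size]
        images = [load_fn(path) for path, _ in chunk]
        labels = [label for _, label in chunk]
        yield images, labels
```

Only the list of paths is held in memory up front. Each batch of images is read when the training loop asks for it and can be discarded afterwards. The framework loaders mentioned above follow the same pattern and add shuffling and parallel reading on top.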