Python: loading ImageNet data onto multiple GPUs with ImageDataGenerator
I am trying to load the ImageNet dataset from folders in order to train a ResNet18 model on it. Since ImageNet is a large dataset, I am trying to distribute the data samples across multiple GPUs. When I check with nvidia-smi, it shows that training has started on those GPUs. However, the training accuracy does not improve across epochs, and the loss does not appear to decrease either. I suspect this may be caused by how my x_train, y_train are loaded when they are distributed across the GPUs. I would like to know whether x_train, y_train = next(train_generator) iterates over the entire dataset in every epoch. If …
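As background for the suspicion above: next() on a Keras directory iterator returns exactly one batch per call, not the whole dataset. A minimal pure-Python stand-in (the FakeDirectoryIterator class below is hypothetical, not part of Keras) illustrates those semantics:

```python
# Hypothetical stand-in mimicking the batch semantics of
# ImageDataGenerator.flow_from_directory (not part of Keras;
# tuples stand in for real image tensors).
class FakeDirectoryIterator:
    def __init__(self, n_samples, batch_size):
        self.n = n_samples          # total samples, like iterator.n in Keras
        self.batch_size = batch_size
        self._pos = 0

    def __next__(self):
        # Each call yields ONE batch, wrapping around at the end.
        start = self._pos
        idx = [(start + i) % self.n for i in range(self.batch_size)]
        self._pos = (start + self.batch_size) % self.n
        x_batch = [('image', i) for i in idx]   # placeholder "images"
        y_batch = [i % 10 for i in idx]         # placeholder labels
        return x_batch, y_batch

it = FakeDirectoryIterator(n_samples=1000, batch_size=125)
x_train, y_train = next(it)
# A single next() yields only batch_size samples, not all 1000:
print(len(x_train))  # 125
```

Calling next() again advances to the following batch of 125, so covering all samples takes n // batch_size calls.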
import tensorflow
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.preprocessing.image import ImageDataGenerator

strategy = tensorflow.distribute.MirroredStrategy()
init_lr = 0.1
epochs = 60
batch_size = 125
# Scale the per-replica batch up to the global batch used by tf.data
# (this definition was missing from the original snippet):
global_batch_size = batch_size * strategy.num_replicas_in_sync
My_wd = 0.0001
Loss = 'categorical_crossentropy'
Optimizer = SGD(lr=init_lr, decay=0.0005, momentum=0.9, nesterov=False)
def get_dataset():
    train_data_dir = 'Datasets/Imagenet/ILSVRC2012_img_train/ILSVRC2012_img_train'
    validation_data_dir = 'Datasets/Imagenet/ILSVRC2012_img_train/ILSVRC2012_img_train'
    datagen = ImageDataGenerator(rescale=1./255, horizontal_flip=True, validation_split=0.2)
    val_datagen = ImageDataGenerator(rescale=1./255)
    train_generator = datagen.flow_from_directory(
        train_data_dir, target_size=(224, 224), color_mode='rgb',
        batch_size=batch_size, subset="training",
        class_mode='categorical', shuffle=True, seed=42)
    val_generator = datagen.flow_from_directory(
        validation_data_dir, target_size=(224, 224), color_mode='rgb',
        batch_size=batch_size, subset="validation",
        class_mode='categorical', shuffle=True, seed=42)
    x_train, y_train = next(train_generator)
    x_val, y_val = next(val_generator)
    return (
        tensorflow.data.Dataset.from_tensor_slices((x_train, y_train)).shuffle(1024977).repeat().batch(global_batch_size),
        tensorflow.data.Dataset.from_tensor_slices((x_val, y_val)).shuffle(256190).repeat().batch(global_batch_size),
    )
train_generator, val_generator = get_dataset()
with strategy.scope():
    model = resnet(input_shape=input_shape, num_classes=1000)
    model.compile(loss=catcross_entropy_logits_loss(), optimizer=Optimizer, metrics=['acc'])
model.summary()
history = model.fit(train_generator,
                    validation_data=val_generator,
                    epochs=epochs,
                    verbose=1,
                    use_multiprocessing=False,
                    workers=1,
                    callbacks=callbacks,
                    validation_steps=val_generator.n // batch_size,
                    steps_per_epoch=train_generator.n // batch_size)
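For reference, under MirroredStrategy each step consumes one global batch split across the replicas, so the epoch bookkeeping works out as in the plain-arithmetic sketch below (num_replicas = 4 is an assumption for illustration; the sample counts are taken from the shuffle buffer sizes in the snippet above):

```python
# Sketch of the batch/step arithmetic under MirroredStrategy.
# Assumed values: 4 GPUs; sample counts from the shuffle buffers
# in the question (1,024,977 train / 256,190 validation).
num_replicas = 4                                # strategy.num_replicas_in_sync
batch_size = 125                                # per-replica batch size
global_batch_size = batch_size * num_replicas   # 500 samples per step

n_train = 1024977
n_val = 256190

# One epoch should cover every sample once:
steps_per_epoch = n_train // global_batch_size   # 2049 steps
validation_steps = n_val // global_batch_size    # 512 steps

# By contrast, a single next(train_generator) call materializes only
# batch_size samples, so a Dataset built from that one batch can never
# contain more than 125 distinct examples regardless of .repeat().
print(steps_per_epoch, validation_steps)
```

This is why a Dataset built from one next() call can appear to train (the GPUs are busy) while accuracy stays flat: every step resamples the same 125 examples.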