Error at epoch 14 when training ResNet50 on ImageNet

Tags: python, pytorch, imagenet, pytorch-dataloader

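I am training ResNet50 on ImageNet using the script provided by PyTorch (with a few minor tweaks for my own purposes). However, after 14 epochs of training I get the following error. I have allocated 4 GPUs on the server to run this. Any pointers on this error would be greatly appreciated. Thanks a lot!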

Epoch: [14][5000/5005]  Time 1.910 (2.018)  Data 0.000 (0.191)  Loss 2.6954 (2.7783)    Total 2.6954 (2.7783)   Reg 0.0000  Prec@1 42.969 (40.556)  Prec@5 64.844 (65.368)   
Test: [0/196]   Time 86.722 (86.722)    Loss 1.9551 (1.9551)    Prec@1 51.562 (51.562)  Prec@5 81.641 (81.641)
Traceback (most recent call last):
  File "main_group.py", line 549, in <module>
  File "main_group.py", line 256, in main
    
  File "main_group.py", line 466, in validate
    if args.gpu is not None:
  File "/home/users/oiler/anaconda3/envs/ml/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 801, in __next__
    return self._process_data(data)
  File "/home/users/oiler/anaconda3/envs/ml/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
    data.reraise()
  File "/home/users/oiler/anaconda3/envs/ml/lib/python3.7/site-packages/torch/_utils.py", line 385, in reraise
    raise self.exc_type(msg)
OSError: Caught OSError in DataLoader worker process 11.
Original Traceback (most recent call last):
  File "/home/users/oiler/anaconda3/envs/ml/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/users/oiler/anaconda3/envs/ml/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/users/oiler/anaconda3/envs/ml/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/users/oiler/anaconda3/envs/ml/lib/python3.7/site-packages/torchvision/datasets/folder.py", line 138, in __getitem__
    sample = self.loader(path)
  File "/home/users/oiler/anaconda3/envs/ml/lib/python3.7/site-packages/torchvision/datasets/folder.py", line 174, in default_loader
    return pil_loader(path)
  File "/home/users/oiler/anaconda3/envs/ml/lib/python3.7/site-packages/torchvision/datasets/folder.py", line 155, in pil_loader
    with open(path, 'rb') as f:
OSError: [Errno 5] Input/output error: '/data/users2/oiler/github/imagenet-data/val/n02102973/ILSVRC2012_val_00009130.JPEG'

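It is difficult to tell what the problem is just by looking at the error you posted.

All we know is that there was a problem reading the file at
'/data/users2/oiler/github/imagenet-data/val/n02102973/ILSVRC2012_val_00009130.JPEG'.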

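Try the following:

  • Confirm that the file actually exists.
  • Confirm that it is in fact a valid JPEG and is not corrupted (by viewing it).
  • Confirm that you can open it with Python and also load it manually with PIL.
  • If none of that helps, try deleting the file. Does the same error occur on another file in the folder?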
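
The checks above can be scripted. What follows is only a minimal sketch, not part of the original training script: `check_image` and `scan_folder` are hypothetical helper names, and the decode step assumes Pillow is installed (it is skipped otherwise). The `convert("RGB")` call forces a full decode, as torchvision's `pil_loader` in the traceback does.

```python
import os


def check_image(path):
    """Run the checks from the list above and report the first one that fails."""
    # 1. Does the file exist?
    if not os.path.isfile(path):
        return "missing"
    # 2. Can the raw bytes be read? An Errno 5 failure would surface here.
    try:
        with open(path, "rb") as f:
            f.read()
    except OSError as exc:
        return f"unreadable: {exc}"
    # 3. Does PIL decode it as an image?
    try:
        from PIL import Image
    except ImportError:
        return "readable (Pillow not installed, decode check skipped)"
    try:
        with Image.open(path) as img:
            img.convert("RGB")  # forces a full decode of the pixel data
    except Exception as exc:
        return f"corrupt: {exc}"
    return "ok"


def scan_folder(folder):
    """Return {filename: status} for every file in `folder` that is not 'ok'."""
    problems = {}
    for name in sorted(os.listdir(folder)):
        status = check_image(os.path.join(folder, name))
        if status != "ok":
            problems[name] = status
    return problems
```

Calling `check_image` on the path from the traceback covers the first three bullets; `scan_folder` on the `n02102973` directory shows whether other files in the same folder fail too.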

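  • Thanks for your answer. I went back and followed the steps you suggested. The file does exist, can be viewed, and can be loaded manually with PIL. Also, since 14 epochs of training had already run, I assume the file had been loaded successfully 14 times before. Do you think there could be another cause behind the error?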
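
If the file loads fine by hand and was read in earlier epochs, one possibility (an assumption, not a diagnosis) is an intermittent failure at the storage level, since Errno 5 (EIO) is reported by the operating system rather than by PIL. A hedged sketch of a read helper that retries only on EIO is shown below; `open_with_retry` and its `opener`, `retries`, and `delay` parameters are hypothetical names, not part of PyTorch or torchvision.

```python
import errno
import time


def open_with_retry(path, opener=open, retries=3, delay=1.0):
    """Read a file's bytes, retrying only when the OS reports EIO (errno 5)."""
    for attempt in range(1, retries + 1):
        try:
            with opener(path, "rb") as f:
                return f.read()
        except OSError as exc:
            # Any other error, or EIO on the final attempt, is re-raised unchanged.
            if exc.errno != errno.EIO or attempt == retries:
                raise
            print(f"EIO on {path}, attempt {attempt}/{retries}; retrying")
            time.sleep(delay)
```

If retries make the error disappear, that points toward the disk or file system rather than the image itself; it would not fix the underlying storage problem.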