DataLoader RuntimeError Ran out of memory

雨点打透心脏的1/2处 · 2022-04-10 14:15 · 314 views · 0 likes

Error

    Traceback (most recent call last):
      File "train.py", line 137, in <module>
        train(model, device, criterion, trainLoader, optimizer, epoch, losses)
      File "train.py", line 33, in train
        for batchIdx, (data, target) in enumerate(trainLoader):
      File "C:\Users\user_name\AppData\Local\Continuum\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 501, in __iter__
        return _DataLoaderIter(self)
      File "C:\Users\user_name\AppData\Local\Continuum\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 289, in __init__
        w.start()
      File "C:\Users\user_name\AppData\Local\Continuum\anaconda3\lib\multiprocessing\process.py", line 105, in start
        self._popen = self._Popen(self)
      File "C:\Users\user_name\AppData\Local\Continuum\anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "C:\Users\user_name\AppData\Local\Continuum\anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
        return Popen(process_obj)
      File "C:\Users\user_name\AppData\Local\Continuum\anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
        reduction.dump(process_obj, to_child)
      File "C:\Users\user_name\AppData\Local\Continuum\anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
        ForkingPickler(file, protocol).dump(obj)
    OSError: [Errno 22] Invalid argument
    C:\Users\user_name\AppData\Local\Continuum\anaconda3\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
      from ._conv import register_converters as _register_converters
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\user_name\AppData\Local\Continuum\anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
        exitcode = _main(fd)
      File "C:\Users\user_namer\AppData\Local\Continuum\anaconda3\lib\multiprocessing\spawn.py", line 115, in _main
        self = reduction.pickle.load(from_parent)
    EOFError: Ran out of memory
    RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.
    The "freeze_support()" line can be omitted if the program is not going to be frozen to produce an executable.

Problem code snippet

    train_loader = Data.DataLoader(dataset=train_data, batch_size=1,
                                   shuffle=True, num_workers=8)
    for batch_idx, content in enumerate(train_loader):
        print("entered the for loop")  # never printed
        input, label = content
        batch_num = input.size(0)
        input = Variable(input.cuda())
        feat_pool, feat = net(input, input, test_mode[1])
    # >>> bug: the for loop is never entered; the DataLoader crashes
    # >>> while spawning its worker processes

Cause

On Windows, Python starts multiprocessing workers with the `spawn` method, which re-imports the main script in each child process. To keep a child from recursively spawning more processes during that import, any code that creates processes, including a `DataLoader` with `num_workers > 0`, is only allowed to run under the main-module guard (`if __name__ == '__main__':`).
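To see the guard at work, here is a minimal sketch with plain `multiprocessing` and no PyTorch. It forces the `spawn` start method (the Windows default); deleting the `if __name__ == '__main__':` guard reproduces the bootstrapping `RuntimeError` from the traceback above:

```python
import multiprocessing as mp

def work(q):
    # Runs in the child process.
    q.put("hello from worker")

if __name__ == "__main__":
    # Under "spawn", the child re-imports this module with
    # __name__ == "__mp_main__", so nothing below this guard re-runs
    # in the child and no recursive process creation happens.
    mp.set_start_method("spawn", force=True)
    q = mp.Queue()
    p = mp.Process(target=work, args=(q,))
    p.start()
    msg = q.get()
    p.join()
    print(msg)
```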



Solutions

Option 1: keep the speed-up from num_workers > 0, and put the code under a main-module guard

    if __name__ == '__main__':
        train_loader = Data.DataLoader(dataset=train_data, batch_size=1,
                                       shuffle=True, num_workers=8)
        for batch_idx, content in enumerate(train_loader):
            print("entered the for loop")
            input, label = content
            batch_num = input.size(0)
            input = Variable(input.cuda())
            feat_pool, feat = net(input, input, test_mode[1])
    # >>> correct: the for loop is entered

Option 2: set num_workers to 0, so data loading runs in the main process only

    train_loader = Data.DataLoader(dataset=train_data, batch_size=1,
                                   shuffle=True, num_workers=0)
    for batch_idx, content in enumerate(train_loader):
        print("entered the for loop")
        input, label = content
        batch_num = input.size(0)
        input = Variable(input.cuda())
        feat_pool, feat = net(input, input, test_mode[1])
    # >>> correct: the for loop is entered

Option 3: declare multiprocessing support explicitly

    if __name__ == '__main__':
        torch.multiprocessing.freeze_support()
        train_loader = Data.DataLoader(dataset=train_data, batch_size=1,
                                       shuffle=True, num_workers=8)
        for batch_idx, content in enumerate(train_loader):
            print("entered the for loop")
            input, label = content
            batch_num = input.size(0)
            input = Variable(input.cuda())
            feat_pool, feat = net(input, input, test_mode[1])
    # >>> correct: the for loop is entered

    # Alternatively, move the training loop into a function and launch it
    # explicitly as a child process from under the guard (a Process target
    # must be callable, so the DataLoader itself cannot be the target):
    def train_worker():
        train_loader = Data.DataLoader(dataset=train_data, batch_size=1,
                                       shuffle=True, num_workers=8)
        for batch_idx, content in enumerate(train_loader):
            print("entered the for loop")
            input, label = content
            batch_num = input.size(0)
            input = Variable(input.cuda())
            feat_pool, feat = net(input, input, test_mode[1])

    if __name__ == '__main__':
        p = multiprocessing.Process(target=train_worker)
        p.start()
        p.join()
    # >>> correct: the for loop is entered
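Note that `freeze_support()` takes effect only as the first statement under the main-module guard, and it is a no-op unless the script has been frozen into a Windows executable. A self-contained placement sketch, using a toy `TensorDataset` and `num_workers=0` so it runs anywhere:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

if __name__ == "__main__":
    # Must come first under the guard; harmless no-op when the script
    # is run normally rather than as a frozen executable.
    torch.multiprocessing.freeze_support()
    ds = TensorDataset(torch.randn(64, 4))
    loader = DataLoader(ds, batch_size=16, shuffle=True, num_workers=0)
    total = sum(batch[0].size(0) for batch in loader)
    print(total)
```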


Example

    # Broken example: DataLoaders with num_workers > 0 created at module level
    import time
    import torch
    import torch.utils.data as Data

    train_dataset = torch.FloatTensor(100000, 32)
    batch_size = 32
    train_loader = Data.DataLoader(dataset=train_dataset,
                                   batch_size=batch_size, shuffle=True)
    train_loader2 = Data.DataLoader(dataset=train_dataset,
                                    batch_size=batch_size, shuffle=True,
                                    num_workers=8)

    start = time.time()
    for _ in range(200):
        for x in train_loader:
            pass
    end = time.time()
    print(end - start)

    start = time.time()
    for _ in range(200):
        for x in train_loader2:
            pass
    end = time.time()
    print(end - start)

Correct example

    import time
    import torch
    import torch.utils.data as Data

    if __name__ == '__main__':
        train_dataset = torch.FloatTensor(100000, 32)
        batch_size = 32
        train_loader = Data.DataLoader(dataset=train_dataset,
                                       batch_size=batch_size, shuffle=True)
        train_loader2 = Data.DataLoader(dataset=train_dataset,
                                        batch_size=batch_size, shuffle=True,
                                        num_workers=8)

        # Time the single-process loader
        start = time.time()
        for _ in range(200):
            for x in train_loader:
                pass
        end = time.time()
        print(end - start)

        # Time the 8-worker loader
        start = time.time()
        for _ in range(200):
            for x in train_loader2:
                pass
        end = time.time()
        print(end - start)
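An equivalent and arguably tidier pattern is to wrap everything in a `main()` function, so importing the module has no side effects at all. A minimal sketch, using a small `TensorDataset` as a stand-in for the dataset above:

```python
import torch
import torch.utils.data as Data

def main():
    # Everything lives inside a function, so importing this module
    # creates no DataLoader and spawns nothing; only running it as a
    # script does.
    train_dataset = Data.TensorDataset(torch.randn(1000, 32))
    train_loader = Data.DataLoader(train_dataset, batch_size=32,
                                   shuffle=True)
    n_batches = sum(1 for _ in train_loader)
    return n_batches

if __name__ == "__main__":
    print(main())  # 1000 samples / batch_size 32 -> 32 batches
```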
