The use case is to create multi-GPU model variants in multiple threads, and eventually to use them for multi-threaded training. Only when the model is wrapped in a DataParallelTable are the following THREAD PANIC and segmentation fault thrown while the data-parallel model is passed between the main thread and the worker threads.
FATAL THREAD PANIC: (read) ...Downloads/pkg/torch/install/share/lua/5.1/torch/File.lua:370: table index is nil
THCudaCheck FAIL file=../Downloads/pkg/torch/extra/cutorch/torch/generic/Storage.c line=238 error=29 : driver shutting down
FATAL THREAD PANIC: (write) ...Downloads/pkg/torch/install/share/lua/5.1/torch/File.lua:210: cuda runtime error (29) : driver shutting down at /home/ml/farleylai/Downloads/pkg/torch/extra/cutorch/torch/generic/Storage.c:238
Segmentation fault (core dumped)
The model is made data parallel using the multi-GPU example code:
function Models.parallelize(model)
   if opt.nGPU > 1 then
      local gpus = torch.range(1, opt.nGPU):totable()
      local dpt = nn.DataParallelTable(1, true, true)
         :add(model, gpus)
         :threads(function()
            require 'cudnn'
            cudnn.benchmark = true
         end)
      dpt.gradInput = nil
      model = dpt:cuda()
   end
   return model
end
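For context, the crash appears while the wrapped model is serialized into the worker threads. Below is a minimal sketch of that pattern using the threads package; the worker count, the per-thread init requires, and the job body are illustrative assumptions rather than the original code.

local threads = require 'threads'
require 'cutorch'
require 'cunn'

-- Build the data-parallel model on the main thread
-- (buildModel is a hypothetical model constructor).
local model = Models.parallelize(buildModel())

local pool = threads.Threads(
   2,                        -- assumed worker count
   function(threadid)        -- per-thread init, mirroring the DPT :threads() setup
      require 'cutorch'
      require 'cunn'
      require 'cudnn'
   end
)

pool:addjob(
   function()
      -- The upvalue `model` is serialized into the worker here
      -- via torch/File.lua, which is where the THREAD PANIC is raised.
      return torch.type(model)
   end,
   function(t) print('worker received a ' .. t) end
)
pool:synchronize()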
Any ideas or explanations?