The use case is to create multi-GPU model variants in multiple threads, and eventually to use them for multi-threaded training. Only when the model is wrapped in a DataParallelTable are the following THREAD PANIC and segmentation fault thrown while the data-parallel model is passed between the main thread and the worker threads.
FATAL THREAD PANIC: (read) ...Downloads/pkg/torch/install/share/lua/5.1/torch/File.lua:370: table index is nil
THCudaCheck FAIL file=../Downloads/pkg/torch/extra/cutorch/torch/generic/Storage.c line=238 error=29 : driver shutting down
FATAL THREAD PANIC: (write) ...Downloads/pkg/torch/install/share/lua/5.1/torch/File.lua:210: cuda runtime error (29) : driver shutting down at /home/ml/farleylai/Downloads/pkg/torch/extra/cutorch/torch/generic/Storage.c:238
Segmentation fault (core dumped)
The model is made data parallel using the multi-GPU example code:
function Models.parallelize(model)
   if opt.nGPU > 1 then
      local gpus = torch.range(1, opt.nGPU):totable()
      local dpt = nn.DataParallelTable(1, true, true)
         :add(model, gpus)
         :threads(function()
            require 'cudnn'
            cudnn.benchmark = true
         end)
      dpt.gradInput = nil
      model = dpt:cuda()
   end
   return model
end
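For context, the crash appears while the wrapped model is serialized into the worker threads. Below is a minimal sketch of that pattern using the threads package; the worker count, the per-thread init requires, and the job body are illustrative assumptions rather than the original code.

local threads = require 'threads'
require 'cutorch'
require 'cunn'

-- Build the data-parallel model on the main thread
-- (buildModel is a hypothetical model constructor).
local model = Models.parallelize(buildModel())

local pool = threads.Threads(
   2,                        -- assumed worker count
   function(threadid)        -- per-thread init, mirroring the DPT :threads() setup
      require 'cutorch'
      require 'cunn'
      require 'cudnn'
   end
)

pool:addjob(
   function()
      -- The upvalue `model` is serialized into the worker here
      -- via torch/File.lua, which is where the THREAD PANIC is raised.
      return torch.type(model)
   end,
   function(t) print('worker received a ' .. t) end
)
pool:synchronize()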
Any ideas or explanations?