opencl training fail #30
Comments
I'm also using torch-cl. Following the tutorial there, you shouldn't install
Got something similar:
UPDATE: I also tried this on my MacBook Pro, same error:
|
The stack trace suggests that the error is on this line:
local encoderOutput = self.encoder:forward(encoderInputs)
model:train(encInputs, decInputs, decTargets)
local err = model:train(input, target)
UPDATE: I checked out the last commit before the merge and I got the same error again. Only the hex numbers differ:
|
I am hitting this as well. I think something changed in the last month or so where the
The problem is coming from train.lua:70 in |
@mgomes have you found anything so far? I tried this again and stumbled upon the following part of the official torch source:
function optim.adam(opfunc, x, config, state)
   -- (0) get/update state
   local config = config or {}
   local state = state or config
   local lr = config.learningRate or 0.001
   local lrd = config.learningRateDecay or 0
   local beta1 = config.beta1 or 0.9
   local beta2 = config.beta2 or 0.999
   local epsilon = config.epsilon or 1e-8
In the stack trace, the epsilon assignment is reported as getting a nil value where a number is expected. I assume that cltorch (distro-cl) has no default value for this, but I am unable to find the file in cltorch. The config object that gets passed to the function above is the following:
Here's another stacktrace:
UPDATE: I'm stupid. If you read the stacktrace, you'll notice |
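For reference, the `config.field or default` idiom in the optim.adam excerpt above can be tried in plain Lua without torch. This is only a minimal sketch: `adam_defaults` is a hypothetical helper written for illustration, not a function in optim or cltorch. It copies the default values quoted in the excerpt and shows that a missing field falls back to its default, so with this idiom alone a missing `epsilon` would not stay nil.

```lua
-- Hypothetical helper (not part of optim or cltorch): it mirrors the
-- `config.field or default` idiom from the quoted optim.adam source.
local function adam_defaults(config)
   config = config or {}   -- a nil config is treated as an empty table
   return {
      learningRate      = config.learningRate or 0.001,
      learningRateDecay = config.learningRateDecay or 0,
      beta1             = config.beta1 or 0.9,
      beta2             = config.beta2 or 0.999,
      epsilon           = config.epsilon or 1e-8,
   }
end

-- Only learningRate is supplied; every other field takes its default.
local d = adam_defaults({ learningRate = 0.01 })
print(d.learningRate, d.beta1, d.epsilon)
```

Because `nil or 1e-8` evaluates to `1e-8` in Lua, a "number expected, got nil" error would have to come from a different value than one defaulted this way.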
Ok, I fixed a bunch of bugs yesterday. I think the easiest thing to do will be to simply reinstall
There was also a change to the
I just now tested a full fresh reinstallation, using the following commands:
|
For those too lazy to read the file: -b doesn't prompt for anything. Watch your .whateverrc after the install to remove duplicate entries of |
It's not working yet... I'm still trying to fix it. I got as far as |
I think it was automatically closed. Ping @macournoyer |
Ooops! Autoclosed indeed. |
Might be working now. Can you pull down the latest updates to |
I have never been successful at training.
th train.lua --opencl --dataset 50000 --hiddenSize 1000
-- Loading dataset
Loading vocabulary from data/vocab.t7 ...
Dataset stats:
Vocabulary size: 25931
Examples: 83632
libthclnn_searchpath /Users/SolarKing/Dev/torch-cl/install/lib/lua/5.1/libTHCLNN.so
Using Apple , OpenCL platform: Apple
Using OpenCL device: GeForce 9400M
-- Epoch 1 / 50
/Users/SolarKing/Dev/torch/install/bin/luajit: ...larKing/Dev/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
bad argument #3 to '?' (number expected, got nil)
stack traceback:
[C]: at 0x0ebe4500
[C]: in function '__newindex'
.../Dev/torch-cl/install/share/lua/5.1/clnn/LookupTable.lua:108: in function <.../Dev/torch-cl/install/share/lua/5.1/clnn/LookupTable.lua:99>
[C]: in function 'xpcall'
...larKing/Dev/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
...arKing/Dev/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
./seq2seq.lua:71: in function 'train'
train.lua:85: in main chunk
[C]: in function 'dofile'
.../Dev/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x010e8bbbb0
WARNING: If you see a stack trace below, it doesn't point to the place where this error occured. Please use only the one above.
stack traceback:
[C]: in function 'error'
...larKing/Dev/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
...arKing/Dev/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
./seq2seq.lua:71: in function 'train'
train.lua:85: in main chunk
[C]: in function 'dofile'
.../Dev/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x010e8bbbb0