bad argument #3 to '?' (lookupTable.lua:updateOutput) #8
Well, step 1: I've installed neuralconvo, and can reproduce the problem :-)
Ok, so root cause is: clnn's LookupTable relies on certain values having been initialized beforehand. Fix is I need to somehow update clnn's LookupTable to be able to survive without these values being initialized, probably by deriving them from the dimensions of the tensors involved.
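To make the idea concrete, here is a rough sketch of the kind of change described. It is my own illustration, not the actual clnn patch; the point is just that every size `updateOutput` needs can be derived from the input indices and the weight matrix, so nothing has to be pre-initialized by a CUDA-specific code path.

```lua
-- Illustrative sketch only (not the actual clnn code): derive all sizes from
-- the input and self.weight instead of relying on pre-initialized fields.
function LookupTable:updateOutput(input)
   local indices = input:long():view(-1)        -- flatten the indices into the weight rows
   local dim = self.weight:size(2)              -- embedding width, taken from the weight itself
   local rows = self.weight:index(1, indices)   -- gather the selected embedding rows
   if input:dim() == 1 then
      self.output:resize(indices:size(1), dim)
   else
      self.output:resize(input:size(1), input:size(2), dim)
   end
   self.output:copy(rows:viewAs(self.output))
   return self.output
end
```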
So, I think this problem is fixed. Which uncovers a new one :-P
Seems like the next problem is a cuda-specific bit in rnn's ZeroMask.lua.
I may need to either fork rnn, or submit a patch, or probably both.
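For context, the "cuda-specific" pattern being referred to is (paraphrasing, not quoting the actual rnn source) a hard-coded check for torch.CudaTensor followed by a :cuda() conversion. A type-agnostic version can simply cast the mask to whatever type the input already has, which then covers torch.ClTensor as well:

```lua
-- Paraphrased sketch, not the actual rnn code.
-- CUDA-only style:
--   if torch.type(input) == 'torch.CudaTensor' then
--      mask = mask:cuda()
--   end
-- Type-agnostic alternative: follow the input's own tensor type,
-- so CPU, CUDA and OpenCL (ClTensor) inputs all work.
local function castMaskLike(mask, input)
   return mask:type(torch.type(input))
end
```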
Thank you so much. Let me know if I can help somehow.
Ok, so I've plausibly fixed the ZeroMask.lua, in draft. But now I see a new error, see below, so I'd better fix that one too :-P
Oh, I see.
I think I fixed the ZeroMask.lua issue.
Ah, another cuda-specific bit in rnn, this time in MaskZeroCriterion.lua:
Taking a look...
Fixed the MaskZeroCriterion issue... next issue :-P
Checking...
(Ah, that's because I added this code into train.lua, for debugging :-P)
Ok, it runs now, I think. On the whole, you probably want to just reinstall distro-cl from scratch... there were a whole ton of updates.
Hi Hugh, thank you for fixing the bug so quickly; however, I still encounter a somewhat similar problem. Here is the stacktrace:

```
[============================================= 97/97 19s55ms | Step: 219ms
-- Eval on validation..
Finished in 29s593ms
3.2777983081751 examples/sec.
Epoch stats:
(Saving model ...)
```

Seems like there is still something wrong with "ClTensor" in module.lua? Do you have any suggestions on this? Thank you again!
Best,
Vincent
Ok. Thoughts on how I can reproduce this, without waiting an hour or so between each test? Some way of reducing the number of iterations, etc.?
(Well, I will hack the loop, I think.)
Ok, can reproduce the issue:
Hmmm. It uses maskedSelect, which cltorch doesn't implement yet. Seems like one option to get this working might be to use boost.compute, which implements this kind of selection operation.
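For readers following along: maskedSelect is the standard Torch operation that gathers the elements of a tensor at positions where a byte mask is 1; the missing piece was an OpenCL (ClTensor) implementation of it in cltorch. A small CPU-side illustration of the semantics (my own example, not code from this thread):

```lua
-- Standard Torch maskedSelect semantics, shown here on CPU tensors.
local t    = torch.Tensor{{1, 2}, {3, 4}}
local mask = torch.ByteTensor{{1, 0}, {0, 1}}
local sel  = t:maskedSelect(mask)   -- 1D tensor holding the selected elements: 1 and 4
print(sel)
```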
@Vincent717 Question: if the solution involved needing to install Boost, would that be ok for you?
@hughperkins Thanks for your time; installing Boost is totally ok for me. But it would be appreciated if you could explain the details a bit more concretely, because I am a newbie at OpenCL haha..
I'm having the same issue now 😁
Well... I implemented maskedSelect, https://github.com/hughperkins/cltorch/compare/add-maskedselect , but need to figure out how to get it fully working [edit: I should say, Jacob wrote an implementation for me actually :-) https://github.com/boostorg/compute/issues/646#issuecomment-241282490 ]
So, fixed the ByteTensor bit. New error :-)
Checking...
"fixed" that. new error, seg fault:
Checking... |
Welcome to debugging hell 😁 BTW: I can also use neural-style since your fixes. I used to get an "Out of Memory" error with torch-cl.
Ok. Seems the segfault was related to my using bfboost. Using a full standard Titan X, no segfault. Logged a bug report with bfboost for the bfboost-related segfault. So, now onto the next bug :-P
Oh, cool :-) That's interesting.
(Seems there is some bug in the sum operation. Checking...) (Edit: it's pretty bizarre; I've checked all numbers up to 1200000 so far (it's checking as I write...), and they all work ok. The first number I know of that it fails for is 33110447, whose sum comes out as 33110448.)
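One possible explanation for that particular number, offered as a guess on my part rather than anything established in the thread: if the sum is accumulated in single-precision floats, integers above 2^24 are no longer all exactly representable, and 33110447 happens to round to 33110448. A minimal Lua/Torch check:

```lua
-- Assumption (mine): the sum is accumulated in 32-bit floats.
-- Above 2^24 = 16777216, consecutive integers are no longer all representable
-- in single precision; 33110447 is odd and rounds to the neighbouring even
-- value 33110448 when stored as a float.
local t = torch.FloatTensor{33110447}
print(t[1])    -- prints 33110448
print(2 ^ 24)  -- 16777216
```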
Ok, let me see: do you mean we should follow your steps to build maskedSelect with boost.compute and so on, or just reinstall torch-cl?
The easiest way (i.e. least likely to fail) is to reinstall torch-cl, the whole thing. If you already reinstalled since #8 (comment), and you prefer hacking around over waiting for a full reinstall, you could try the following (unsupported... if it doesn't work, please do a full torch-cl reinstall :-) ):
It works! This is so cool!! Thank you very much!
Cool :-)
:-)
I can also confirm that it works although I'm not as far as saving a model. (Slow GPU)
Cool :-)
(Background info for anyone using bfboost: bfboost now supports the OpenCL methods needed to run this, and it runs ok on bfboost now, without segfaulting.)
Hi,
I'm trying to get neuralconvo to work with distro-cl. Unfortunately I'm getting an error when starting the training that I'm unable to solve, and it seems to be coming from LookupTable.lua. I already filed an issue in the neuralconvo project, but it seems like it's not going to be fixed there.
First things first, here is the stacktrace:
The error occurs here (in the last line):
Do you have any idea, or ideally a solution, for my problem?
Regards,
Lukas