Trying to run 3-state (2 spot states) HMM data - getting CUDA memory error and "Iteration started with a new seed" warnings #420
Comments
The program restarts the run when NaN values are detected in the parameters. This is usually fine if it happens a small number of times during the entire run. If it happens repeatedly, as in your case, then something is pathological. It is hard to tell whether it is related to the data or the model without inspecting it closely. Can we set up a Zoom meeting to take a closer look at this together?
I also see that it has run 50800 iterations. How close is it to convergence when you look at TensorBoard?
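To make the restart behaviour described above concrete, here is a minimal, generic sketch of a loop that restarts with a new seed whenever a NaN shows up in the parameters, and gives up if that keeps happening. This is not Tapqir's actual code: `fit_with_restarts`, `init`, `step`, and `max_restarts` are hypothetical names, and whether the real program restarts from scratch or from a saved state is not stated in this thread.

```python
import math
import random


def has_nan(params):
    """Return True if any parameter value is NaN."""
    return any(math.isnan(v) for v in params.values())


def fit_with_restarts(step, init, num_iter, seed=0, max_restarts=10):
    """Run `step` for `num_iter` iterations, restarting on NaN parameters.

    `init(rng)` builds the initial parameters and `step(params, rng)` returns
    updated parameters; both are hypothetical callables supplied by the caller.
    Returns (params, number_of_restarts).
    """
    restarts = 0
    while True:
        rng = random.Random(seed)
        params = init(rng)
        for _ in range(num_iter):
            params = step(params, rng)
            if has_nan(params):
                break  # abandon this attempt
        else:
            return params, restarts  # finished without hitting a NaN
        restarts += 1
        if restarts > max_restarts:
            # Repeated restarts suggest a problem with the data or the model.
            raise RuntimeError("too many NaN restarts")
        seed += 1  # start the next attempt with a new seed
```

A few restarts are tolerated silently here, while hundreds of them (as reported in this issue) would hit the `max_restarts` limit and surface as an error rather than continuing to loop.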
I found a workaround that I think may give you a clue: I made a new
directory and put in the same data (driftlist, header, on/off spots). Since
there wasn't a .tapqir folder, it seems to be running smoothly (10% and
counting)
(Sent by email in reply to Yerdos Ordabayev's comment of Tue, Feb 14, 2023, 8:31 PM.)
Oh, I guess that is the reason. The name of the model file is the same for the 2-state and 3-state HMM models. Since you have already run the 2-state model, you have that one saved in the
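The workaround reported in this thread (putting the same data in a new directory that has no `.tapqir` folder) could be scripted roughly as follows. This is only a sketch under assumptions: the helper name `copy_without_saved_state` and the example paths are hypothetical, and it assumes that everything that should be left behind lives inside the `.tapqir` folder mentioned above.

```python
import shutil
from pathlib import Path


def copy_without_saved_state(src, dst, state_folder=".tapqir"):
    """Copy the data directory `src` to a new directory `dst`, skipping any
    folder named `state_folder` so the new run starts without saved state.

    `dst` must not exist yet; shutil.copytree raises FileExistsError otherwise.
    """
    src, dst = Path(src), Path(dst)
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns(state_folder))
    return dst


# Example usage (hypothetical paths):
# copy_without_saved_state("experiment_2state", "experiment_3state")
```

Copying instead of deleting keeps the results of the earlier 2-state run intact in the original directory.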
Other details: I am running v1.1.17 and have previously run the same data successfully with a 2-state (1 spot state) HMM model.
I reduced the spot and frame batch sizes (10 -> 5 and 512 -> 256) and still get many iterations of the warning (hundreds; the example image shows only the last few) before ultimately running out of CUDA memory.