difficulty running/loading model on GPU #16
Comments
Hi @murakdar -- were you able to fix this issue?
Hello @ecvgit. No, this issue remains unresolved.
Hi @murakdar, can you try specifying the GPU explicitly using
I was able to resolve this error. I think it happens when the installed cuDNN version is not compatible. I was able to use TF 1.12 with cuDNN 7.9.0 and CUDA 9.
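Since the suggestion above hinges on which cuDNN version is actually installed, a minimal sketch for reading it from the cuDNN header may help. It assumes a cuDNN 7-style `cudnn.h` that defines `CUDNN_MAJOR`, `CUDNN_MINOR`, and `CUDNN_PATCHLEVEL`; the function name `parse_cudnn_version` and the header path in the usage comment are hypothetical and depend on the installation.

```python
import re


def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from the text of a cuDNN header.

    Returns None if any of the three version macros is missing.
    """
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines such as "#define CUDNN_MAJOR 7".
        match = re.search(
            r"^\s*#define\s+" + name + r"\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


# Usage (the path is an assumption; adjust it to the local CUDA install):
#   with open("/usr/local/cuda/include/cudnn.h") as handle:
#       print(parse_cudnn_version(handle.read()))
```

Comparing the resulting tuple against the cuDNN version that the installed TensorFlow build expects is one way to confirm or rule out a version mismatch.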
Hello @alquraishi; adding Here are the commands I tried and their output: First, with
To get rid of the resulting memory issue, I tried again with
It stops running after ~15 seconds. The directory Any further ideas would be greatly appreciated. @ecvgit: I am presently using cuDNN 7.1.4. In my first comment, I believe I was using cuDNN 7.6.1. I tried downgrading to fix the issue but at some point got the error
Could you try running it for CASP7?
@alquraishi Is it possible to share the .tertiary files for the models reported in the paper? I was able to generate the .tertiary files, but the DRMSD does not match -- which makes it hard to figure out if there is something wrong in my DRMSD computation vs using the wrong .tertiary files.
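For cross-checking a DRMSD computation like the one mentioned above, here is a minimal pure-Python sketch of one common definition: the root-mean-square difference between corresponding pairwise distances of two structures. Whether this matches the normalization, atom selection, and coordinate units used for the numbers in the paper is an assumption; the function names are illustrative only.

```python
import math


def _distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def drmsd(coords_a, coords_b):
    """Distance-based RMSD between two equally long lists of (x, y, z) points.

    Averages the squared difference of pairwise distances over all
    pairs i < j, then takes the square root.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of points")
    n = len(coords_a)
    if n < 2:
        raise ValueError("at least two points are required")
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d_a = _distance(coords_a[i], coords_a[j])
            d_b = _distance(coords_b[i], coords_b[j])
            total += (d_a - d_b) ** 2
    n_pairs = n * (n - 1) / 2
    return math.sqrt(total / n_pairs)
```

Because only internal distances enter, the value is unchanged by translating, rotating, or mirroring either structure, so a mismatch against reported numbers is more likely to come from units, atom selection, or normalization than from superposition.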
Tried, still the same behavior. @ecvgit, if I understand correctly, you have been able to run new predictions with the pre-trained model; could you perhaps share an example FASTA sequence file, corresponding .tfrecord file, and configuration file that I could drop in to one of the pre-trained models? I did some further debugging and found that I'm hitting Lines 320 to 321 in 0133213
tf.Session.run() on the TF ops here. The TF ops being run (i.e. self._prediction_ops ) look like this:
For what it's worth, here's the complete traceback for running an individual op:
I was able to run the predictions on the proteinnet test set.
I am now able to run predictions using the default configuration file as indicated -- thank you, @ecvgit and @alquraishi. However, I am still unable to run predictions of a single new sequence. The queue/range error in my last comment suggests my problem relates to the Shall I continue here, or open a separate issue for that? (I'm tempted to prefer the latter, since the
I have been trying to predict the structure of a new sequence using the available pre-trained model (CASP11), but I've so far been unsuccessful in running the model. Note that I was equally unsuccessful in training a new model, with similar errors as below, but I will frame this in the context of the prediction task.
First, I successfully followed the input preparation steps provided in the README (i.e. using HMMER and the convert scripts). Then, I slightly modified the configuration file to locate the .tfrecord files to be tested. From inside the rgn directory, I run
python model/protling.py ../models/RGN12/runs/CASP12/ProteinNet12Thinning90/configuration-test -d ../models/RGN12 -p -e weighted_testing
The resulting error is:
A complete log file is found at the end of this message. Training a new model based on the ProteinNet data sets also doesn't work for me, with a similar error. I suspect the underlying culprit is the following line:
However, I know that the machine does have a working GPU on which other applications can run. For example, the command
python -c 'import tensorflow as tf; sess = tf.Session(); devices = sess.list_devices(); print(devices)'
works as expected; the resulting output is:
I am using TensorFlow 1.12.0 with CUDA 9.0 on Python 2.7.12. Trying with or without
export CUDA_VISIBLE_DEVICES=0
had no effect. I'd be happy to provide any additional information that could be useful.
Finally, I'm not sure if it's relevant to this particular issue, but I was also unable to successfully run
python tests.py
(from within rgn/models). (This is after extracting tests_data.zip and adjusting base_dir on line 20 accordingly.) After some deprecation warnings, here is the output from the first two unit tests:
The remaining tests all raise the same
RuntimeError: Model already started; cannot create new objects.
Moreover, running an individual test doesn't seem to produce any useful output:
Here is the complete output log file located in ../models/RGN12/logs/CASP12.log:
I greatly appreciate your time in helping to get this working on my end!
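As a follow-up to the CUDA_VISIBLE_DEVICES attempt described in this post, the sketch below shows one way to apply the setting from inside Python and then list the devices a TF 1.x session reports. It assumes the variable is set before TensorFlow initializes CUDA, since that is when the CUDA runtime reads it. The helper names `restrict_visible_gpus` and `list_tf_devices` are hypothetical and not part of RGN or TensorFlow.

```python
import os


def restrict_visible_gpus(gpu_ids, environ=os.environ):
    """Set CUDA_VISIBLE_DEVICES to the given GPU indices and return the value.

    Call this before TensorFlow is imported or creates its first session.
    """
    value = ",".join(str(gpu_id) for gpu_id in gpu_ids)
    environ["CUDA_VISIBLE_DEVICES"] = value
    return value


def list_tf_devices():
    """Return device names from a TF 1.x session, or None if unavailable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    if not hasattr(tf, "Session"):
        # Newer TensorFlow releases do not expose tf.Session at top level.
        return None
    with tf.Session() as sess:
        return [device.name for device in sess.list_devices()]


# Usage:
#   restrict_visible_gpus([0])
#   print(list_tf_devices())
```

If the GPU appears in this list but the model still fails to place ops on it, the cause is more likely in the model's own device configuration than in the environment.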