Evaluation get stuck #12
Comments
Hi, can you try running with a single GPU?
Hi, we tried a single GPU with both train_shallow_wikikgv2.sh and train_concat_wikikgv2.sh, and both still get stuck in the evaluation. Thanks.
Just to make sure, have you pulled the latest change? What is the script you are running? We will look into this and reproduce.
Hi, yes, we have already pulled the latest change. We are running train_shallow_wikikgv2.sh and train_concat_wikikgv2.sh in the training/vec_scripts folder. Thanks!
Hi there,
Hi,
Really sorry for the back-and-forth! I guess it is mostly due to a compatibility issue with the customized kernel.
Hi, the information is listed as follows:
Hi,
It seems there is still a chance for the evaluation to get stuck. When we run train_shallow_wikikgv2.sh, it runs for 4799999 steps and then gets stuck in the evaluation. When we stop it with a keyboard interrupt, we get the following message:
When we run train_concat_wikikgv2.sh, it gets stuck at the very first evaluation. When we stop it with a keyboard interrupt, it shows error messages similar to those from train_shallow_wikikgv2.sh.
Could you please help to check? Any help is appreciated!
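In case it helps narrow down where the evaluation hangs, here is a minimal sketch (not part of the repository) that uses Python's standard `faulthandler` module to dump the stack of every thread at a fixed interval while a call is running. The helper name `run_with_watchdog` and the log file name `hang_trace.log` are made up for illustration, and the `time.sleep` call is only a stand-in for the real evaluation step. Note that it only shows Python frames, so if the hang is inside a custom kernel, the trace would point at the Python line that called into it.

```python
import faulthandler
import time


def run_with_watchdog(fn, timeout_s, log_path):
    """Call fn(); while it runs, dump all thread stacks to log_path
    every timeout_s seconds so that a hang leaves a trace behind."""
    with open(log_path, "a") as log:
        # repeat=True keeps dumping every timeout_s seconds until cancelled.
        faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log)
        try:
            return fn()
        finally:
            # Disarm before the file is closed; faulthandler writes to its descriptor.
            faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    # Hypothetical stand-in for an evaluation step that takes too long.
    run_with_watchdog(lambda: time.sleep(0.3), timeout_s=0.1, log_path="hang_trace.log")
    with open("hang_trace.log") as f:
        print(f.read())
```

Wrapping only the evaluation call this way would avoid having to send a keyboard interrupt to find out where the process is waiting.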