Something wrong about the training process #10
Comments
This looks like it may be an issue with plotting on your server. Could you try running just the three lines you posted in a fresh Python shell?
Does this throw an error? You may need to use a different backend instead of the one currently in use. Also, if your traceback was longer, posting it in full may help us understand where in the droid_policy_learning code this was triggered.
@kpertsch Thank you for your reply! When I reran the training code, it got stuck here this time: It seems to get stuck at random epochs, whether or not plotting is enabled. (Confused)
Mmh that's odd -- could you try killing the stuck program with Ctrl+C to see where in the program it got stuck? Such silent freezes are very hard to debug.
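For freezes like this, one option besides Ctrl+C is Python's standard `faulthandler` module, which can periodically dump every thread's stack while the process is still running. Below is a minimal sketch, assuming it is called once at the top of the training script. The helper names and the 600-second interval are illustrative choices, not part of droid_policy_learning.

```python
import faulthandler
import sys
import time


def enable_hang_diagnostics(timeout_s=600.0, log_file=None):
    """Dump all thread stacks every `timeout_s` seconds until disabled.

    `log_file` must be a real file object (it needs a file descriptor) and
    must stay open while diagnostics are enabled. Defaults to sys.stderr.
    """
    if log_file is None:
        log_file = sys.stderr
    # Also print a traceback on fatal signals such as a segmentation fault.
    faulthandler.enable(file=log_file, all_threads=True)
    # Watchdog: write the stacks of all threads after each timeout period.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log_file)


def disable_hang_diagnostics():
    """Stop the periodic dumps and the fatal-signal handler."""
    faulthandler.cancel_dump_traceback_later()
    faulthandler.disable()
```

If the job hangs, the most recent dump in the log shows which call each thread was blocked in, which should narrow down whether the freeze is in data loading, plotting, or the training step.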
@kpertsch Okay, I pressed Ctrl+C to see what happened and got this:
This is my OS (WSL2) configuration:
I changed the training parameters to these:
@ashwin-balakrishna96 have you seen such behavior before?
@kpertsch Okay, I will try it and let you know the result. Update: By the way, when I ran the code in this environment, I ran into this problem:
If I follow the installation procedure in this project, I get this NVIDIA-related package:
When I went back to run the Octo fine-tuning code, it no longer worked in this environment.
Hello, I am trying to train a policy with droid_100, but I always get the following error: The builder directory D:\droid-main\dataset\droid_100\droid_100 doesn't contain any versions.
That should work. Perhaps something went wrong with preserving the correct directory structure when downloading the dataset? Can you confirm that the directory structure is as follows: Within DATA_PATH there should be a folder called
Yes, it works now, but the error still comes up sometimes. It's so weird.
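The "doesn't contain any versions" message suggests that no version subfolder was found directly inside the builder directory. A quick way to see what is actually on disk is sketched below, assuming version folders are named like `1.0.0` (the usual TensorFlow Datasets layout). `find_version_dirs` is a hypothetical helper, not part of the DROID code.

```python
import re
from pathlib import Path

# Assumption: version folders use a MAJOR.MINOR.PATCH name such as "1.0.0".
VERSION_RE = re.compile(r"^\d+\.\d+\.\d+$")


def find_version_dirs(builder_dir):
    """Return the sorted names of version-like subfolders of `builder_dir`.

    An empty list means the path does not exist or holds no version folder,
    which matches the situation the error message describes.
    """
    root = Path(builder_dir)
    if not root.is_dir():
        return []
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and VERSION_RE.match(entry.name)
    )
```

Calling `find_version_dirs(r"D:\droid-main\dataset\droid_100\droid_100")` and then the same function on its parent folder shows at which level the version folder sits, so the data path passed to training can be adjusted to match.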
Hi, when I used the droid_100 dataset to train the model, everything went well at first, until this error came up:
and it got stuck here:
Then the whole process was killed, so I searched for the problem and found this answer:
However, this code is already in the file visualization.lib, and I still hit the error during training.
Looking forward to the reply, thanks. @kpertsch