Where are policy_lm and critic_lm? #7
I also encountered this problem. Did you solve it?
No, I have given up! But DigiRL sets up and runs well.
Apologies for the late response.

[Table: the model settings for Phase 1 and Phase i (i > 1), and the Offline Data Path]

We recommend reading our paper to fully understand the complete training process. Additionally, in issue #4, we briefly described the entire training process.
Thanks a lot!!!
Thank you for such a detailed answer, but I am still a bit confused. What is the Offline Data used in the paper's experiments? Is it /WebRL/LLaMA-Factory/data/web_policy_sft.json?
Below is the pseudocode of the WebRL training process.

[Figure: pseudocode of the WebRL training process]

LLaMA-Factory/data/web_policy_sft.json is used to perform SFT. Once the model is fine-tuned, it interacts with WebArena to collect rollout data. These rollouts, combined with previously gathered experience, form the Offline Data.
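The loop described above can be sketched in Python. This is a minimal, hypothetical outline, not the WebRL implementation: `sft`, `collect_rollouts`, `train`, and `Trajectory` are placeholder names, passed in as callables so the control flow runs without any models. It assumes that in Phase 1 both the actor (`policy_lm`) and the critic (`critic_lm`) start from the SFT model, and that later phases continue from the previous phase's checkpoints; any task generation or data filtering steps from the paper are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Trajectory:
    """Hypothetical record of one WebArena rollout."""
    task: str
    reward: float  # assumed outcome reward for the rollout


def run_webrl(
    sft: Callable[[str], str],  # SFT data path -> fine-tuned model path
    collect_rollouts: Callable[[str, int], List[Trajectory]],  # (policy path, phase) -> new rollouts
    train: Callable[[str, str, List[Trajectory]], Tuple[str, str]],  # -> (new policy, new critic)
    sft_data: str,
    num_phases: int,
) -> Tuple[str, str, List[Trajectory]]:
    """Run SFT once, then alternate rollout collection and offline actor/critic training."""
    sft_model = sft(sft_data)
    # Assumption: in Phase 1, actor and critic are both initialized from the SFT model.
    policy_lm, critic_lm = sft_model, sft_model
    experience: List[Trajectory] = []  # rollouts gathered in earlier phases
    for phase in range(1, num_phases + 1):
        # The current policy interacts with the environment (WebArena) to collect rollouts.
        rollouts = collect_rollouts(policy_lm, phase)
        # Offline Data = new rollouts + previously gathered experience.
        offline_data = experience + rollouts
        # Assumption: phase i > 1 resumes from the checkpoints produced by phase i - 1.
        policy_lm, critic_lm = train(policy_lm, critic_lm, offline_data)
        experience = offline_data
    return policy_lm, critic_lm, experience
```

Passing the stages in as callables keeps the sketch focused on the data flow between phases: each phase trains on everything collected so far, not only on its own rollouts.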
Then which model should critic_lm use? Would Llama-3.1-8B work?
Set critic_lm to the path of the SFT-trained model.
scripts/config/main/webrl.yaml:
defaults:
save_path: /workspace/WebRL/scripts/output
run_name: "webrl"
# training
policy_lm: /workspace/WebRL/webrl-glm-4-9b? # safetensors files of parameters of the actor model
critic_lm: /workspace/WebRL/webrl-glm-4-9b? # safetensors files of parameters of the critic model
critic_epochs: 1 # number of epochs for the critic each phase
actor_epochs: 1 # number of epochs for training the actor each phase
batch_size: 1 # batch size for training the actor and critic
critic_resume_path: /workspace/WebRL/webrl-glm-4-9b # .bin file of parameters of the critic model
offline_data_path: /workspace/WebRL/scripts/offline_data
checkpointing_steps: 400
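Given the advice above that critic_lm should point to the SFT-trained model, a quick pre-flight check can catch a wrong path before training starts. The helper below is hypothetical (not part of WebRL). It takes a plain dict mirroring the `webrl.yaml` keys (loading the YAML file itself, e.g. with PyYAML's `yaml.safe_load`, is left out) and relies only on what the config comments state: `policy_lm` and `critic_lm` should be directories containing `*.safetensors` files.

```python
from pathlib import Path
from typing import Dict, List


def check_webrl_paths(cfg: Dict[str, str]) -> List[str]:
    """Return human-readable problems with the model paths in a webrl.yaml-style dict.

    Hypothetical helper: it only checks that policy_lm and critic_lm are
    directories holding at least one *.safetensors file, as the config comments say.
    """
    problems: List[str] = []
    for key in ("policy_lm", "critic_lm"):
        path = Path(cfg[key])
        if not path.is_dir():
            problems.append(f"{key}: {path} is not a directory")
        elif not any(path.glob("*.safetensors")):
            problems.append(f"{key}: no *.safetensors files in {path}")
    return problems
```

For example, calling it with both keys set to the SFT checkpoint directory returns an empty list when that directory exists and holds safetensors shards. A path with a stray trailing character (such as a literal `?`) is reported as "not a directory", which is a common cause of loading failures.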