Wrong behavior with `generalized` and instability with `positional` and `spring` #390

UltronAI · 2023-08-04T11:39:07Z

Hello there,

I have recently been experimenting with the Brax Basics Colab, training the built-in APG/PPO algorithms and investigating the impact of the different physics backends. In the process, I've encountered some intriguing behavior that I'd like to share and discuss.

Tossing the ball

In this task, the ball is expected to bounce after hitting the ground. The default positional backend reproduced this behavior well. However, with the generalized backend, the ball merely rolled on the ground and never bounced, which does not match the expected physical behavior.

Here's the visualization with positional in Red, generalized in Green, and spring in Blue:

Swinging the pendulum

In this task, the generalized backend performed well, providing the expected results. However, the positional backend collapsed for all step sizes, and the spring backend exhibited instability when the step size increased.

Pointing the pendulum

In this task, only generalized worked well. Both the positional and spring backends failed to maintain stability.

APG training

During my attempts to train an APG agent using these backends, the results varied significantly.

For the Ant task, which has rich contact interactions, APG struggled with the generalized backend but exhibited some learning capacity with positional and spring.

As for the Reacher task, which is a simpler system, APG with generalized outperformed the other two backends.

(Update: I find that reacher with generalized may lead to NaN gradients as well)

The parameters I used for these two experiments:

'ant': functools.partial(apg.train, num_evals=500, episode_length=300, normalize_observations=True,
                         action_repeat=1, num_envs=256, num_eval_envs=128, learning_rate=3e-3,
                         truncation_length=10, seed=args.seed),
'reacher': functools.partial(apg.train, num_evals=200, episode_length=300, normalize_observations=True,
                             action_repeat=4, num_envs=128, num_eval_envs=8, learning_rate=3e-3,
                             truncation_length=10, seed=args.seed),

Interestingly, these results seem to suggest that the generalized backend excels when handling contacts but struggles to compute their gradients (or the analytic gradients are useless for training), with the exception of the ball-tossing experiment. On the other hand, the positional and spring backends handle more complex systems efficiently but fail when dealing with simpler systems like the pendulum.

(Update: after more experiments with positional and spring, I find that they are unsurprisingly not as good as generalized in some environments and can lead to different learned policies. For instance, in walker2d, PPO with generalized learns to walk with two legs, PPO with positional learns to walk with only one leg, and PPO with spring fails to walk, all using the same training parameters.)

Moreover, I recently came across a paper that delves into the topic of how varying contact models can produce different outcomes. The authors of this paper conducted a series of experiments utilizing Brax v1, specifically employing the positional and legacy_spring backends.

According to their findings, the discrepancies between the results can be attributed to the different contact models used. This makes me even more curious about the mechanics of the generalized backend, particularly how it models contacts.

To better understand these behaviors and the possible reasons behind the inconsistencies above, I would greatly appreciate a more in-depth explanation, or any resources that describe how the generalized backend models contacts.

I look forward to your insightful responses and thanks for your great work on Brax!

amine789 · 2023-08-05T22:01:58Z

Hi, I am trying to learn the elasticity parameters in the position-based dynamics by generating two scenarios: one simulation with the real elasticity parameters and another with random elasticity. I then compute the loss with respect to the positions, but the gradient with respect to elasticity comes out as zero. Did you get something similar?
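A toy version of this setup can make the question concrete. The sketch below is an assumption-laden illustration, not Brax code: it uses a hypothetical 1D point mass, a simple reflect-and-scale contact rule, and a finite-difference gradient instead of autodiff. In this toy model the elasticity gradient is exactly zero whenever the rollout ends before the first impact, because elasticity only enters the dynamics at contact. Whether that is what happens in the Brax experiment above is not something this sketch can confirm.

```python
def final_height(elasticity, steps, height=1.0, dt=1e-3, g=9.81):
    """Roll out a 1D ball drop and return the height after `steps` steps."""
    z, vz = height, 0.0
    for _ in range(steps):
        vz -= g * dt              # semi-implicit Euler: velocity first
        z += vz * dt
        if z < 0.0 and vz < 0.0:  # ground contact
            z, vz = 0.0, -elasticity * vz
    return z


def elasticity_gradient(guess, target, steps, eps=1e-4):
    """Central finite difference of a squared position loss w.r.t. elasticity."""
    reference = final_height(target, steps)

    def loss(e):
        return (final_height(e, steps) - reference) ** 2

    return (loss(guess + eps) - loss(guess - eps)) / (2 * eps)


# Rollout ends before the first impact: elasticity has no effect on the loss.
grad_no_contact = elasticity_gradient(guess=0.3, target=0.8, steps=300)
# Rollout includes the first bounce: the loss now depends on elasticity.
grad_with_contact = elasticity_gradient(guess=0.3, target=0.8, steps=600)
print(grad_no_contact, grad_with_contact)
```

In the toy model, lengthening the rollout so that it includes a contact event is enough to obtain a nonzero gradient.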

UltronAI · 2023-08-06T05:05:54Z

Author

Hi, @amine789
While I didn't directly experiment with elasticity parameters, I found some relevant insights in this paper. See the data presented in the final row of Table 1. The authors provided a reasonable explanation in the corresponding section. Hope it's helpful for you!

btaba · 2023-08-31T23:58:25Z

Maintainer

Hi @UltronAI, thanks for the insightful experiments using Brax!

[1] Tossing the ball

The restitution params differ considerably between spring/positional and generalized. For spring/positional, restitution is controlled by the elasticity param (https://github.com/google/brax/blob/main/brax/io/mjcf.py#L182). For generalized, we use the contact model from MuJoCo, where restitution is controlled by the solver params; see https://mujoco.readthedocs.io/en/latest/modeling.html#restitution

Other params that may affect restitution in positional are the physics timestep and collide_scale, amongst others.
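To illustrate how an elasticity coefficient governs bouncing, here is a minimal sketch. It assumes a hypothetical 1D point mass and a reflect-and-scale contact rule; it is not the spring/positional implementation, and `simulate_drop` and its parameters are made up for this example. With elasticity 0 the ball stays on the ground after impact, and with elasticity `e` the rebound peak is roughly `e**2` times the drop height.

```python
def simulate_drop(elasticity, height=1.0, dt=1e-3, steps=2000, g=9.81):
    """Drop a point mass from `height`; return the highest point reached after the first impact."""
    z, vz = height, 0.0
    bounced = False
    peak_after_bounce = 0.0
    for _ in range(steps):
        vz -= g * dt                  # semi-implicit Euler: update velocity first
        z += vz * dt
        if z < 0.0 and vz < 0.0:
            z = 0.0                   # project back onto the ground plane
            vz = -elasticity * vz     # restitution: reflect and scale the velocity
            bounced = True
        if bounced:
            peak_after_bounce = max(peak_after_bounce, z)
    return peak_after_bounce


for e in (0.0, 0.5, 1.0):
    print(f"elasticity={e}: rebound peak = {simulate_drop(e):.3f} m")
```

The time step also matters in this toy model: the impact is detected only after the ball has already penetrated the ground, so a coarser `dt` changes the impact velocity and therefore the rebound height.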

[2] Swinging the pendulum

I would recommend tuning parameters to maintain stability in positional/spring. Joint constraints are implicitly maintained in generalized since it uses Featherstone's method. For positional/spring, joint constraints are resolved at every time step, so they are likely to be less stable and need to be tuned (angular/linear damping, etc.).

It isn't clear whether you are training a policy for swinging. Are your stability conclusions based on the final policy or on the physics alone?
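The sensitivity to step size can be illustrated with a toy model. The sketch below assumes a single mass held by a stiff linear spring, integrated with semi-implicit Euler; it is a stand-in for a stiff constraint force, not the Brax spring backend, and `max_amplitude` and its parameters are hypothetical. For this integrator the oscillator stays bounded only while `dt < 2 / sqrt(stiffness / mass)`, which is 0.02 s for the values used here.

```python
def max_amplitude(dt, stiffness=1.0e4, mass=1.0, steps=100):
    """Integrate a mass on a stiff spring and return the largest |x| seen."""
    x, v = 1.0, 0.0
    peak = abs(x)
    for _ in range(steps):
        v += -(stiffness / mass) * x * dt   # spring force pulls back toward x = 0
        x += v * dt                         # semi-implicit Euler position update
        peak = max(peak, abs(x))
    return peak


print(max_amplitude(dt=0.01))   # below the stability limit: stays bounded
print(max_amplitude(dt=0.03))   # above the limit: grows without bound
```

Adding damping or lowering the stiffness raises the usable step size in this toy model, which is the kind of tuning suggested above.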

[4] training

Thanks for the findings! The contact impulses are backed out through the constraint solver in generalized, so it is not too surprising that the gradients may be less useful (autograd through a constraint solve) compared to the simple impulse-based contact updates in spring/positional.

For further reference on the contact model for generalized, see https://mujoco.readthedocs.io/en/latest/computation.html#contact

3 replies

UltronAI Sep 3, 2023
Author

Hi @btaba, thanks for your response!

Regarding the ball and pendulum experiments, I primarily conducted them to assess the performance across various backends rather than training a specific policy.

For policy training using simulation gradients, I agree that the generalized backend might not be designed for algorithms like APG and SHAC. However, when I utilize the positional/spring backends, I still struggle to get APG/SHAC to perform at a level that's on par with PPO. Their performance is quite underwhelming. Have you had any success with methods that leverage differentiability to enhance performance?

Additionally, I'm curious about the reliability of the gradients from positional/spring backends. Given that they might not strictly adhere to physical principles, can we trust their gradients, especially when training policies intended for real-world robotic applications?

The SHAC paper distinctly illustrates that, when paired with a straightforward algorithm, dependable simulation gradients can substantially surpass PPO with fewer steps and shorter wall-clock duration. This highlights the potential of such algorithms in robotic applications. Given Brax's expansive backend offering, combined with JAX's scalability and flexibility, it seems to be a promising platform for these experiments. However, my preliminary tests suggest that Brax might not be entirely compatible with existing methods. Do you have any insights on addressing these gradient challenges? Or do you foresee any unintended repercussions this might have on Brax?

btaba Sep 7, 2023
Maintainer

Hi @UltronAI I have not personally used SHAC/APG, although there is a vibrant discussion on the topic here, with challenges: #262

As others mentioned, https://openreview.net/forum?id=KIl0LZ9tJex has a good analysis, and so does https://arxiv.org/abs/2111.05803. Addressing gradient challenges has not been a near-term priority for us.

cdagher Jan 25, 2024

@btaba By not addressing gradient challenges, do you mean only in the Generalized pipeline? Does MJX also have this as a low priority, or were gradient challenges a major consideration in the development?
