fix: typos in domain_randomization.md, traditional diversity dictates #12

Open · wants to merge 1 commit into main
6 changes: 3 additions & 3 deletions isaacgymenvs/docs/domain_randomization.md
@@ -257,15 +257,15 @@ Our [DeXtreme](https://dextreme.org) work brings Automatic Domain Randomisation

**Background**

-ADR was first introduced in [OpenAI 2019 et. al](https://arxiv.org/abs/1910.07113). We develop the vectorised version of this and use that to train our policies in sim and transfer to the real world. Our experiments reaffirm that ADR imbues robustness to the policies closing the sim-to-real gap significantly leading to better performance in the real world compared to traiditional manually tuned domain randomisation.
+ADR was first introduced in [OpenAI 2019 et. al](https://arxiv.org/abs/1910.07113). We develop the vectorised version of this and use that to train our policies in sim and transfer to the real world. Our experiments reaffirm that ADR imbues robustness to the policies closing the sim-to-real gap significantly leading to better performance in the real world compared to traditional manually tuned domain randomisation.

-Hand-tuning the randomisation ranges (_e.g._ means and stds of the distributions) of parameters can be onerous and may result in policies that lack adaptability, even for slight variations in parameters outside of the originally defined ranges. ADR starts with small ranges and automatically adjusts them gradually to keep them as wide as possible while keeping the policy performance above a certain threshold. The policies trained with ADR exhibit significant robustness to various perturbations and parameter ranges and improved sim-to-real transfer. Additionally, since the ranges are adjusted gradually, it also provides a natural curriculum for the policy to absorb the large diverity thrown at it.
+Hand-tuning the randomisation ranges (_e.g._ means and stds of the distributions) of parameters can be onerous and may result in policies that lack adaptability, even for slight variations in parameters outside of the originally defined ranges. ADR starts with small ranges and automatically adjusts them gradually to keep them as wide as possible while keeping the policy performance above a certain threshold. The policies trained with ADR exhibit significant robustness to various perturbations and parameter ranges and improved sim-to-real transfer. Additionally, since the ranges are adjusted gradually, it also provides a natural curriculum for the policy to absorb the large diversity thrown at it.

Each parameter that we wish to randomise with ADR is modelled with uniform distribution `U(p_lo, p_hi)` where `p_lo` and `p_hi` are the lower and the upper limit of the range respectively. At each step, a parameter is randomly chosen and its value set to either the lower or upper limit, keeping the other parameters with their ranges unchanged. This randomly chosen parameter's range is updated based on its performance. A small fraction of the overall environments (40% in our [DeXtreme](https://dextreme.org) work) is used to evaluate the performance. Based on the performance, either the range shrinks or expands. A visualisation from the DeXtreme paper is shown below:

![ADR](https://user-images.githubusercontent.com/686480/228732516-2d70870d-828c-4934-a3c2-17b989683a6d.png)
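
To make the boundary-sampling step concrete, here is a minimal, illustrative Python sketch. The parameter names, the `sample_params` helper, and the random-number setup are assumptions made for this example and are not the IsaacGymEnvs or DeXtreme implementation; only the overall scheme (uniform ranges, pinning one randomly chosen parameter to a limit, a roughly 40% evaluation fraction) comes from the text above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Current ADR ranges: one uniform distribution U(p_lo, p_hi) per parameter.
# The parameter names here are placeholders, not the actual randomised quantities.
adr_ranges = {
    "friction":   [0.9, 1.1],
    "mass_scale": [0.95, 1.05],
}

def sample_params(evaluate: bool):
    """Sample one set of randomisation parameters for a single environment."""
    params = {name: rng.uniform(lo, hi) for name, (lo, hi) in adr_ranges.items()}
    if not evaluate:
        return params, None
    # Evaluation environments pin one randomly chosen parameter to a limit;
    # every other parameter keeps sampling from its unchanged range.
    name = rng.choice(list(adr_ranges))
    boundary = rng.choice(["lower", "upper"])
    lo, hi = adr_ranges[name]
    params[name] = lo if boundary == "lower" else hi
    return params, (name, boundary)

# Roughly 40% of environments act as boundary-evaluation workers.
params, probe = sample_params(evaluate=rng.random() < 0.4)
```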

-If the parameter value was set to the lower limit, then a decrease in performance, measured by performance threshold `t_l`, dicatates reducing the range of the parameter (shown in (a) in the image) by increasing the lower limit value by a small delta. Conversely, if the performance is increased, measured by performance threshold, `t_h`, the lower limit is decreased (shown in (c) in the image) leading to expanding the overall range.
+If the parameter value was set to the lower limit, then a decrease in performance, measured by performance threshold `t_l`, dictates reducing the range of the parameter (shown in (a) in the image) by increasing the lower limit value by a small delta. Conversely, if the performance is increased, measured by performance threshold, `t_h`, the lower limit is decreased (shown in (c) in the image) leading to expanding the overall range.

Similarly, if the parameter value was set to the upper limit, then an increase in performance, measured by performance threshold `t_h`, expands the range (shown in (b) in the image) by increasing the upper limit value by a small delta. However, if the performance is decreased, measured by performance threshold, `t_l`, the upper limit is decreased (shown in (d) in the image) leading to shrinking the overall range.
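
The four cases (a)–(d) map to a simple per-limit update, sketched below under stated assumptions: the `update_range` helper, the `delta` step size, the specific values of `t_l` and `t_h`, and the clamping at the end are illustrative choices for this example, not the DeXtreme settings; `perf` stands for whatever performance measure is evaluated on the boundary environments.

```python
def update_range(adr_ranges, name, boundary, perf, delta=0.02, t_l=0.4, t_h=0.8):
    """Adjust one limit of parameter `name` based on performance measured at that boundary."""
    lo, hi = adr_ranges[name]
    if boundary == "lower":
        if perf < t_l:
            lo += delta   # (a) poor performance at the lower limit: shrink the range
        elif perf > t_h:
            lo -= delta   # (c) good performance at the lower limit: expand the range
    else:
        if perf > t_h:
            hi += delta   # (b) good performance at the upper limit: expand the range
        elif perf < t_l:
            hi -= delta   # (d) poor performance at the upper limit: shrink the range
    # A real implementation would also clamp each limit to hard outer bounds.
    adr_ranges[name] = [min(lo, hi), max(lo, hi)]
```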
