Updated Documentation.
robfiras committed Feb 24, 2024
1 parent 57023df commit dbe6abb
Showing 7 changed files with 104 additions and 2 deletions.
13 changes: 12 additions & 1 deletion docs/index.rst
@@ -37,10 +37,21 @@ Key Advantages

.. toctree::
:caption: Documentation
:maxdepth: 3
:hidden:

source/loco_mujoco.installation.rst
source/loco_mujoco.api.rst
source/loco_mujoco.tutorials.rst


.. toctree::
:caption: Tutorials
:hidden:

source/tutorials/interfaces.rst
source/tutorials/imitation_learning.rst
source/tutorials/reinforcement_learning.rst
source/tutorials/domain_randomization.rst


18 changes: 18 additions & 0 deletions docs/source/loco_mujoco.api.rst
@@ -86,3 +86,21 @@ one, or even provide a default reward function as shown in the :doc:`./tutorials
:hidden:

./loco_mujoco.rewards.rst

Domain Randomization
--------------------

LocoMuJoCo comes with built-in domain randomization. In contrast to other approaches, LocoMuJoCo randomizes parameters
directly in the XML and recompiles the complete model whenever needed. This ensures that the randomization is consistent
across all parameters, as some parameters are calculated from others *during compilation*. One example is the
default inertia of a geom, which is calculated from the geom's shape and mass. When only the mass *of the compiled model*
is randomized, the inertia remains unchanged, resulting in an inconsistent randomization. However, if the mass
is randomized *in the XML* and the model is recompiled, the inertia is recalculated from the new mass, yielding
a consistent randomization. :doc:`Here <./loco_mujoco.domain_randomization>`, you can find the API for domain randomization.
Also take a look at the :doc:`./tutorials/domain_randomization` tutorial to see how to use domain randomization in
LocoMuJoCo.
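
The consistency argument above can be sketched in a few lines of plain Python (a schematic illustration only, not
LocoMuJoCo's API; the sphere inertia simply stands in for any quantity MuJoCo derives at compile time):

.. code-block:: python

    import random

    def sphere_inertia(mass, radius):
        # Moment of inertia of a solid sphere, I = 2/5 * m * r^2 -- the kind
        # of quantity MuJoCo derives from a geom's shape and mass when the
        # model is compiled.
        return 0.4 * mass * radius ** 2

    random.seed(0)
    radius, mass = 0.1, 1.0
    compiled_inertia = sphere_inertia(mass, radius)

    # Inconsistent: randomize the mass of the already-compiled model while
    # keeping the stale, previously derived inertia.
    new_mass = mass * random.uniform(0.8, 1.2)
    stale_inertia = compiled_inertia  # no longer matches new_mass

    # Consistent: randomize the mass in the XML and recompile, so the
    # inertia is recalculated from the new mass.
    recompiled_inertia = sphere_inertia(new_mass, radius)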

.. toctree::
:hidden:

./loco_mujoco.domain_randomization.rst
8 changes: 8 additions & 0 deletions docs/source/loco_mujoco.domain_randomization.rst
@@ -0,0 +1,8 @@
.. _dom-rand:

Randomization Handler
---------------------
.. automodule:: loco_mujoco.utils.domain_randomization
:members:
:undoc-members:
:show-inheritance:
:special-members: __call__
1 change: 0 additions & 1 deletion docs/source/loco_mujoco.environments.rst
@@ -1,4 +1,3 @@
.. _env-label:

Basics
=================================

19 changes: 19 additions & 0 deletions docs/source/tutorials/domain_randomization.rst
@@ -1,2 +1,21 @@
.. _dom-rand-tutorial:

Domain Randomization
=================================

In this tutorial, we show how to use the domain randomization feature. It is useful for training a
robot to be robust to changes in the environment, such as joint friction, mass, or inertia. Before starting, make sure
to familiarize yourself with the :ref:`dom-rand`, where you will find detailed documentation.

Consider the following domain randomization file for the Talos robot:

.. literalinclude:: ../../../examples/domain_randomization/domain_randomization_talos.yaml
:language: yaml
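
If you are reading this outside the rendered documentation, a randomization config of this kind might look roughly as
follows. The field names below are purely illustrative and do not reflect the exact schema; refer to the file included
above and the :ref:`dom-rand` API for the real format:

.. code-block:: yaml

    # Hypothetical sketch of a domain randomization config -- field names
    # are illustrative, not LocoMuJoCo's actual schema.
    default:
        sigma: 0.05            # relative standard deviation for sampling
    joints:
        hip_pitch:
            damping:
                uniform_range: [0.8, 1.2]
    geoms:
        torso:
            mass:
                sigma: 0.1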

Once a configuration file is created, we can pass it to the environment and start training as usual.
Here is an example of how to use the domain randomization feature with the Talos robot:

.. literalinclude:: ../../../examples/domain_randomization/example_talos.py
:language: python

.. note:: We provide more examples in the respective directory of the main LocoMuJoCo repository.
32 changes: 32 additions & 0 deletions docs/source/tutorials/imitation_learning.rst
@@ -116,3 +116,35 @@ to:
# ...
# pass the new learning rate to the agent

Load and Evaluate a Trained Agent
---------------------------------

The best agents are saved every :code:`n_epochs_save` epochs in your specified directory, or in the default directory
:code:`./logs`. To load and evaluate a trained agent, you can use the following code:

.. code-block:: python

    from mushroom_rl.core import Core, Agent
    from loco_mujoco import LocoEnv

    env = LocoEnv.make("Atlas.walk")
    agent = Agent.load("./path/to/agent.msh")

    core = Core(agent, env)
    core.evaluate(n_episodes=10, render=True)

In the example above, first an Atlas environment is created. Then, the agent is loaded from the specified path. Finally,
the agent is evaluated for 10 episodes with rendering enabled.

Continue Training from a Checkpoint
-----------------------------------

Similarly, if you want to continue training from a checkpoint, replace the line
:code:`agent = get_agent(env_id, mdp, use_cuda, sw)` in the :code:`experiment.py` file with
:code:`agent = Agent.load("./path/to/agent.msh")`. Training will then continue from the specified
checkpoint.
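
The replacement described above can be sketched as a small resume pattern (:code:`build_or_load_agent` is a
hypothetical helper for illustration, not part of LocoMuJoCo; in the real :code:`experiment.py`, the string stand-in
below would be :code:`Agent.load(checkpoint_path)`):

.. code-block:: python

    import os

    def build_or_load_agent(checkpoint_path, build_fn):
        # Resume from the checkpoint if it exists, otherwise build a
        # fresh agent from scratch.
        if checkpoint_path and os.path.exists(checkpoint_path):
            return "loaded:" + checkpoint_path  # stand-in for Agent.load(...)
        return build_fn()

    agent = build_or_load_agent("./path/to/agent.msh", lambda: "fresh-agent")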
15 changes: 15 additions & 0 deletions docs/source/tutorials/reinforcement_learning.rst
@@ -1,2 +1,17 @@
Reinforcement Learning
=================================

Even though LocoMuJoCo focuses on imitation learning, it can also be used for plain reinforcement learning. The challenge
here is to define a reward function that produces the desired behavior. Here is a minimal example of setting up a
reinforcement learning experiment:

.. note:: This is for didactic purposes only! It will not produce any useful gait.

.. literalinclude:: ../../../examples/reinforcement_learning/example_unitree_h1.py
:language: python

Right now, LocoMuJoCo only supports Markovian reward functions (i.e., functions depending only on the current
state transition). We are considering adding support for non-Markovian reward functions as well by providing access
to the environment in the reward function. Open an issue or drop me a message if you think this is something
we should really do!
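
For concreteness, a Markovian reward in this sense is any function of the current transition alone. A minimal sketch in
plain Python follows; the signature and the assumption that index 0 of the observation holds the forward trunk velocity
are illustrative, not LocoMuJoCo's actual interface:

.. code-block:: python

    def velocity_tracking_reward(state, action, next_state, target_vel=1.25):
        # Markovian: the reward depends only on the current transition
        # (state, action, next_state), not on any earlier history.
        vel = next_state[0]  # assumed: forward trunk velocity at index 0
        tracking_cost = (vel - target_vel) ** 2
        ctrl_cost = 1e-3 * sum(a * a for a in action)  # discourage large actions
        return -(tracking_cost + ctrl_cost)

    r = velocity_tracking_reward([0.0, 0.0], [0.0, 0.0], [1.25, 0.0])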
