Commit 7430c7d

docs: double dqn

wenzhangliu committed Dec 27, 2024
1 parent a950e7f commit 7430c7d
Showing 6 changed files with 186 additions and 34 deletions.
Binary file modified docs/source/_static/figures/algo_framework/dqn_framework.png
8 changes: 4 additions & 4 deletions docs/source/documents/api/agents/core.rst
@@ -50,22 +50,22 @@ TensorFlow2
MindSpore
^^^^^^^^^^^^^

.. automodule:: xuance.tensorflow.agents.core.off_policy
.. automodule:: xuance.mindspore.agents.core.off_policy
:members:
:undoc-members:
:show-inheritance:

.. automodule:: xuance.tensorflow.agents.core.off_policy_marl
.. automodule:: xuance.mindspore.agents.core.off_policy_marl
:members:
:undoc-members:
:show-inheritance:

.. automodule:: xuance.tensorflow.agents.core.on_policy
.. automodule:: xuance.mindspore.agents.core.on_policy
:members:
:undoc-members:
:show-inheritance:

.. automodule:: xuance.tensorflow.agents.core.on_policy_marl
.. automodule:: xuance.mindspore.agents.core.on_policy_marl
:members:
:undoc-members:
:show-inheritance:
176 changes: 176 additions & 0 deletions docs/source/documents/api/agents/drl/ddqn_agent.md
@@ -0,0 +1,176 @@
# Double Deep Q-Network (Double DQN)

Double Deep Q-Network (Double DQN) is an enhancement of the original Deep Q-Network (DQN) algorithm,
designed to address the issue of overestimation of Q-values, which is a common problem in Q-learning-based methods.
Overestimation arises because Q-learning bootstraps its targets from a maximum over estimated Q-values,
and that maximum is biased upward by noise and approximation errors.
This can lead to suboptimal policies and unstable training.

Paper Link:
[**AAAI**](https://ojs.aaai.org/index.php/AAAI/article/view/10295),
[**ArXiv**](https://arxiv.org/pdf/1509.06461),
[**Double Q-learning**](https://proceedings.neurips.cc/paper_files/paper/2010/file/091d584fced301b442654dd8c23b3fc9-Paper.pdf)

## The Risk of Overestimation

In standard DQN, overestimation occurs because the same network is used both to select and to evaluate the action in the bootstrap target.

As introduced before, [**DQN**](dqn_agent.md) updates the Q-value of a state-action pair $Q(s, a)$
by using the maximum Q-value of the next state, $\max_{a'}Q(s', a')$, as part of the target.

If the Q-network overestimates one or more state-action values, the overestimation propagates and accumulates over time.
This results in overly optimistic Q-values, which can lead to unstable training and cause the policy to favor suboptimal actions.

Overestimation biases the agent toward actions that appear better than they are,
potentially leading to poor decisions and degraded performance in complex environments.

Moreover, if the overestimation becomes severe, it can destabilize training and cause the Q-values to diverge.
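
This upward bias is easy to demonstrate numerically: even when the true value of every action is zero,
the maximum over noisy but unbiased estimates is positive on average.
A small NumPy sketch (the action count, noise scale, and number of trials are arbitrary illustrative choices):

```python3
import numpy as np

rng = np.random.default_rng(0)
n_actions, noise_std, n_trials = 5, 1.0, 10_000

# True Q-values are all zero; the estimates only add zero-mean Gaussian noise.
estimates = rng.normal(loc=0.0, scale=noise_std, size=(n_trials, n_actions))

print(estimates.mean())              # ~0.0  -> each individual estimate is unbiased
print(estimates.max(axis=1).mean())  # ~1.16 -> the max over estimates is biased upward
```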

## Main Idea

The main idea of Double DQN is to decouple the action selection and action evaluation processes,
reducing the risk of overestimating Q-values.
This is achieved by splitting the target computation into two steps:
- **Action Selection**: Use the current (online) Q-network to select the best action in the next state $s'$.

$$
a^* = \arg\max_{a}Q(s', a; \theta).
$$

- **Action Evaluation**: Use the target Q-network to evaluate the value of the selected action.

$$
y = r + \gamma Q(s', a^*; \theta^{-}).
$$

Then, update the Q-network by minimizing the loss:

$$
L(\theta) = \mathbb{E}_{(s, a, s', r) \sim \mathcal{D}}[(y-Q(s, a; \theta))^2].
$$

Finally, periodically synchronize the target network: $\theta^{-} \leftarrow \theta$.
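
The only change relative to vanilla DQN is how the target $y$ is computed.
A minimal PyTorch sketch of the two targets is given below; `online_net`, `target_net`, and the batch
tensors are placeholders for illustration (not XuanCe's internal API), and `rewards` and `dones` are
assumed to be float tensors of shape `[batch]`:

```python3
import torch

def double_dqn_target(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Double DQN target: the online network (theta) selects a*, the target network (theta^-) evaluates it."""
    with torch.no_grad():
        best_actions = online_net(next_states).argmax(dim=1, keepdim=True)   # action selection
        next_q = target_net(next_states).gather(1, best_actions).squeeze(1)  # action evaluation
        return rewards + gamma * (1.0 - dones) * next_q

def dqn_target(target_net, rewards, next_states, dones, gamma=0.99):
    """Vanilla DQN target: the same network both selects and evaluates via the max."""
    with torch.no_grad():
        return rewards + gamma * (1.0 - dones) * target_net(next_states).max(dim=1).values

# The loss L(theta) is then the usual squared TD error on the online network, e.g.:
#   q_sa = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
#   loss = torch.nn.functional.mse_loss(q_sa, y)
```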

## Run Double DQN in XuanCe

Before running Double DQN in XuanCe, you need to prepare a conda environment and install ``xuance`` following
the [**installation steps**](https://xuance.readthedocs.io/en/latest/documents/usage/installation.html).

The overall agent-environment interaction of Double DQN, as implemented in XuanCe, is illustrated in the figure below.

```{eval-rst}
.. image:: ./../../../../_static/figures/algo_framework/dqn_framework.png
:width: 100%
:align: center
```

### Run Built-in Demos

After completing the installation, you can open a Python console and run Double DQN directly using the following commands:

```python3
import xuance
runner = xuance.get_runner(method='ddqn',
                           env='classic_control',  # Choices: classic_control, box2d, atari.
                           env_id='CartPole-v1',   # Choices: CartPole-v1, LunarLander-v2, ALE/Breakout-v5, etc.
                           is_test=False)
runner.run()  # Or runner.benchmark()
```

### Run With Self-defined Configs

If you want to run Double DQN with different configurations, you can build a new ``.yaml`` file, e.g., ``my_config.yaml``.
Then, run the Double DQN by the following code block:

```python3
import xuance as xp
runner = xp.get_runner(method='ddqn',
                       env='classic_control',  # Choices: classic_control, box2d, atari.
                       env_id='CartPole-v1',   # Choices: CartPole-v1, LunarLander-v2, ALE/Breakout-v5, etc.
                       config_path="my_config.yaml",  # The path of the my_config.yaml file should be correct.
                       is_test=False)
runner.run()  # Or runner.benchmark()
```

To learn more about the configurations, please visit the
[**tutorial of configs**](https://xuance.readthedocs.io/en/latest/documents/api/configs/configuration_examples.html).

### Run With Customized Environment

If you would like to run XuanCe's Double DQN in your own environment that was not included in XuanCe,
you need to define the new environment following the steps in
[**New Environment Tutorial**](https://xuance.readthedocs.io/en/latest/documents/usage/new_envs.html#step-1-create-a-new-environment).
Then, [**prepare the configuration file**](https://xuance.readthedocs.io/en/latest/documents/usage/new_envs.html#step-2-create-the-config-file-and-read-the-configurations)
``dqn_myenv.yaml``.

After that, you can run Double DQN in your own environment with the following code:

```python3
import argparse
from xuance.common import get_configs
from xuance.environment import REGISTRY_ENV
from xuance.environment import make_envs
from xuance.torch.agents import DDQN_Agent

configs_dict = get_configs(file_dir="dqn_myenv.yaml")
configs = argparse.Namespace(**configs_dict)
REGISTRY_ENV[configs.env_name] = MyNewEnv  # MyNewEnv is the environment class defined in Step 1.

envs = make_envs(configs)  # Make parallel environments.
Agent = DDQN_Agent(config=configs, envs=envs)  # Create a Double DQN agent from XuanCe.
Agent.train(configs.running_steps // configs.parallels)  # Train the model for numerous steps.
Agent.save_model("final_train_model.pth")  # Save the model to model_dir.
Agent.finish()  # Finish the training.
```
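
Here, `MyNewEnv` is the environment class you wrote in Step 1 of the tutorial.
As a reminder of its general shape, a gymnasium-style sketch is given below; the spaces, reward, and episode
length are placeholder assumptions, so follow the tutorial for the exact interface XuanCe expects:

```python3
import numpy as np
from gymnasium.spaces import Box, Discrete
from xuance.environment import RawEnvironment  # Base class described in the new-environment tutorial.

class MyNewEnv(RawEnvironment):
    def __init__(self, env_config):
        super().__init__()
        self.env_id = env_config.env_id
        self.observation_space = Box(-np.inf, np.inf, shape=(4,), dtype=np.float32)  # Placeholder space.
        self.action_space = Discrete(2)                                              # Placeholder space.
        self.max_episode_steps = 200
        self._current_step = 0

    def reset(self, **kwargs):
        self._current_step = 0
        return self.observation_space.sample(), {}  # observation, info

    def step(self, action):
        self._current_step += 1
        observation = self.observation_space.sample()
        reward = 1.0  # Placeholder reward.
        terminated = False
        truncated = self._current_step >= self.max_episode_steps
        return observation, reward, terminated, truncated, {}

    def render(self, *args, **kwargs):
        return np.zeros((64, 64, 3), dtype=np.uint8)  # Dummy RGB frame.

    def close(self):
        return
```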

## Citations

```{code-block} bibtex
@article{hasselt2010double,
title={Double Q-learning},
author={Hasselt, Hado},
journal={Advances in neural information processing systems},
volume={23},
year={2010}
}
```

```{code-block} bibtex
@inproceedings{van2016deep,
title={Deep reinforcement learning with double q-learning},
author={Van Hasselt, Hado and Guez, Arthur and Silver, David},
booktitle={Proceedings of the AAAI conference on artificial intelligence},
volume={30},
number={1},
year={2016}
}
```

## APIs

### PyTorch

```{eval-rst}
.. automodule:: xuance.torch.agents.qlearning_family.ddqn_agent
:members:
:undoc-members:
:show-inheritance:
```

### TensorFlow2

```{eval-rst}
.. automodule:: xuance.tensorflow.agents.qlearning_family.ddqn_agent
:members:
:undoc-members:
:show-inheritance:
```

### MindSpore

```{eval-rst}
.. automodule:: xuance.mindspore.agents.qlearning_family.ddqn_agent
:members:
:undoc-members:
:show-inheritance:
```
26 changes: 0 additions & 26 deletions docs/source/documents/api/agents/drl/ddqn_agent.rst

This file was deleted.

8 changes: 5 additions & 3 deletions docs/source/documents/api/agents/drl/dqn_agent.md
@@ -159,7 +159,9 @@ Agent.finish() # Finish the training.
}
```

## API (PyTroch)
## APIs

### PyTorch

```{eval-rst}
.. automodule:: xuance.torch.agents.qlearning_family.dqn_agent
@@ -168,7 +170,7 @@ Agent.finish() # Finish the training.
:show-inheritance:
```

## API (TensorFlow2)
### TensorFlow2

```{eval-rst}
.. automodule:: xuance.tensorflow.agents.qlearning_family.dqn_agent
@@ -177,7 +179,7 @@ Agent.finish() # Finish the training.
:show-inheritance:
```

## API (MindSpore)
### MindSpore

```{eval-rst}
.. automodule:: xuance.mindspore.agents.qlearning_family.dqn_agent
2 changes: 1 addition & 1 deletion docs/source/index.rst
@@ -84,8 +84,8 @@ XuanCe contains four main parts:
:maxdepth: 1
:caption: APIs:

documents/api/environments
documents/api/agents
documents/api/environments
documents/api/runners
documents/api/representations
documents/api/policies
