diff --git a/README.md b/README.md
index 36f74e2b..9242fb4f 100644
--- a/README.md
+++ b/README.md
@@ -44,17 +44,17 @@ docker run --gpus=all -it --rm --name
 | Algorithm | Variants Implemented | Wandb Report |
 |--------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------| ----------- |
 | **Offline and Offline-to-Online** | |
-| ✅ [Conservative Q-Learning for Offline Reinforcement Learning <br>(CQL)](https://arxiv.org/abs/2006.04779) | [`offline/cql.py`](algorithms/offline/cql.py) <br> [`finetune/cql.py`](algorithms/finetune/cql.py) | [`Offline`](https://wandb.ai/tlab/CORL/reports/-Offline-CQL--VmlldzoyNzA2MTk5) <br> <br> [`Offline-to-online`](https://wandb.ai/tlab/CORL/reports/-Offline-to-Online-CQL--Vmlldzo0NTQ3NTMz)
-| ✅ [Accelerating Online Reinforcement Learning with Offline Datasets <br>(AWAC)](https://arxiv.org/abs/2006.09359) | [`offline/awac.py`](algorithms/offline/awac.py) <br> [`finetune/awac.py`](algorithms/finetune/awac.py) | [`Offline`](https://wandb.ai/tlab/CORL/reports/-Offline-AWAC--VmlldzoyNzA2MjE3) <br> <br> [`Offline-to-online`](https://wandb.ai/tlab/CORL/reports/-Offline-to-Online-AWAC--VmlldzozODAyNzQz)
-| ✅ [Offline Reinforcement Learning with Implicit Q-Learning <br>(IQL)](https://arxiv.org/abs/2110.06169) | [`offline/iql.py`](algorithms/offline/iql.py) <br> [`finetune/iql.py`](algorithms/finetune/iql.py) | [`Offline`](https://wandb.ai/tlab/CORL/reports/-Offline-IQL--VmlldzoyNzA2MTkx) <br> <br> [`Offline-to-online`](https://wandb.ai/tlab/CORL/reports/-Offline-to-Online-IQL--VmlldzozNzE1MTEy)
+| ✅ [Conservative Q-Learning for Offline Reinforcement Learning <br>(CQL)](https://arxiv.org/abs/2006.04779) | [`offline/cql.py`](algorithms/offline/cql.py) <br> [`finetune/cql.py`](algorithms/finetune/cql.py) | [`Offline`](https://wandb.ai/tlab/CORL/reports/-Offline-CQL--Vmlldzo1MzM4MjY3) <br> <br> [`Offline-to-online`](https://wandb.ai/tlab/CORL/reports/-Offline-to-Online-CQL--Vmlldzo0NTQ3NTMz)
+| ✅ [Accelerating Online Reinforcement Learning with Offline Datasets <br>(AWAC)](https://arxiv.org/abs/2006.09359) | [`offline/awac.py`](algorithms/offline/awac.py) <br> [`finetune/awac.py`](algorithms/finetune/awac.py) | [`Offline`](https://wandb.ai/tlab/CORL/reports/-Offline-AWAC--Vmlldzo1MzM4MTEy) <br> <br> [`Offline-to-online`](https://wandb.ai/tlab/CORL/reports/-Offline-to-Online-AWAC--VmlldzozODAyNzQz)
+| ✅ [Offline Reinforcement Learning with Implicit Q-Learning <br>(IQL)](https://arxiv.org/abs/2110.06169) | [`offline/iql.py`](algorithms/offline/iql.py) <br> [`finetune/iql.py`](algorithms/finetune/iql.py) | [`Offline`](https://wandb.ai/tlab/CORL/reports/-Offline-IQL--Vmlldzo1MzM4MzQz) <br> <br> [`Offline-to-online`](https://wandb.ai/tlab/CORL/reports/-Offline-to-Online-IQL--VmlldzozNzE1MTEy)
 | **Offline-to-Online only** | |
 | ✅ [Supported Policy Optimization for Offline Reinforcement Learning <br>(SPOT)](https://arxiv.org/abs/2202.06239) | [`finetune/spot.py`](algorithms/finetune/spot.py) | [`Offline-to-online`](https://wandb.ai/tlab/CORL/reports/-Offline-to-Online-SPOT--VmlldzozODk5MTgx)
 | ✅ [Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning <br>(Cal-QL)](https://arxiv.org/abs/2303.05479) | [`finetune/cal_ql.py`](algorithms/finetune/cal_ql.py) | [`Offline-to-online`](https://wandb.ai/tlab/CORL/reports/-Offline-to-Online-Cal-QL--Vmlldzo0NTQ3NDk5)
 | **Offline only** | |
-| ✅ Behavioral Cloning <br>(BC) | [`offline/any_percent_bc.py`](algorithms/offline/any_percent_bc.py) | [`Offline`](https://wandb.ai/tlab/CORL/reports/-Offline-BC--VmlldzoyNzA2MjE1)
+| ✅ Behavioral Cloning <br>(BC) | [`offline/any_percent_bc.py`](algorithms/offline/any_percent_bc.py) | [`Offline`](https://wandb.ai/tlab/CORL/reports/-Offline-BC--Vmlldzo1MzM4MzU2)
 | ✅ Behavioral Cloning-10% <br>(BC-10%) | [`offline/any_percent_bc.py`](algorithms/offline/any_percent_bc.py) | [`Offline`](https://wandb.ai/tlab/CORL/reports/-Offline-BC-10---VmlldzoyNzEwMjcx)
-| ✅ [A Minimalist Approach to Offline Reinforcement Learning <br>(TD3+BC)](https://arxiv.org/abs/2106.06860) | [`offline/td3_bc.py`](algorithms/offline/td3_bc.py) | [`Offline`](https://wandb.ai/tlab/CORL/reports/-Offline-TD3-BC--VmlldzoyNzA2MjA0)
-| ✅ [Decision Transformer: Reinforcement Learning via Sequence Modeling <br>(DT)](https://arxiv.org/abs/2106.01345) | [`offline/dt.py`](algorithms/offline/dt.py) | [`Offline`](https://wandb.ai/tlab/CORL/reports/-Offline-Decision-Transformer--VmlldzoyNzA2MTk3)
+| ✅ [A Minimalist Approach to Offline Reinforcement Learning <br>(TD3+BC)](https://arxiv.org/abs/2106.06860) | [`offline/td3_bc.py`](algorithms/offline/td3_bc.py) | [`Offline`](https://wandb.ai/tlab/CORL/reports/-Offline-TD3-BC--Vmlldzo1MzM4MDA2)
+| ✅ [Decision Transformer: Reinforcement Learning via Sequence Modeling <br>(DT)](https://arxiv.org/abs/2106.01345) | [`offline/dt.py`](algorithms/offline/dt.py) | [`Offline`](https://wandb.ai/tlab/CORL/reports/-Offline-Decision-Transformer--Vmlldzo1MzM3OTkx)
 | ✅ [Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble <br>(SAC-N)](https://arxiv.org/abs/2110.01548) | [`offline/sac_n.py`](algorithms/offline/sac_n.py) | [`Offline`](https://wandb.ai/tlab/CORL/reports/-Offline-SAC-N--VmlldzoyNzA1NTY1)
 | ✅ [Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble <br>(EDAC)](https://arxiv.org/abs/2110.01548) | [`offline/edac.py`](algorithms/offline/edac.py) | [`Offline`](https://wandb.ai/tlab/CORL/reports/-Offline-EDAC--VmlldzoyNzA5ODUw)
 | ✅ [Revisiting the Minimalist Approach to Offline Reinforcement Learning <br>(ReBRAC)](https://arxiv.org/abs/2305.09836) | [`offline/rebrac.py`](algorithms/offline/rebrac.py) | [`Offline`](https://wandb.ai/tlab/CORL/reports/-Offline-ReBRAC--Vmlldzo0ODkzOTQ2)
diff --git a/docs/algorithms/cql.md b/docs/algorithms/cql.md
index cdc9a867..fafa078c 100644
--- a/docs/algorithms/cql.md
+++ b/docs/algorithms/cql.md
@@ -82,7 +82,7 @@ Possible extensions:
 
 For detailed scores on all benchmarked datasets see [benchmarks section](../benchmarks/offline.md). Reports visually compare our reproduction results with original paper scores to make sure our implementation is working properly.
 
-
+
 
@@ -239,4 +239,4 @@ TrainConfig:
   --project str       wandb project name (default: CORL)
   --group str         wandb group name (default: CQL-D4RL)
   --name str          wandb run name (default: CQL)
-```
\ No newline at end of file
+```
diff --git a/docs/algorithms/dt.md b/docs/algorithms/dt.md
index e3218c9d..5d4f927e 100644
--- a/docs/algorithms/dt.md
+++ b/docs/algorithms/dt.md
@@ -70,7 +70,7 @@ We'd be glad if someone would be interested in contributing them!
 
 For detailed scores on all benchmarked datasets see [benchmarks section](../benchmarks/offline.md). Reports visually compare our reproduction results with original paper scores to make sure our implementation is working properly.
 
-
+
 
 ## Training options