Soichiro Nishimori edited this page May 26, 2024 · 22 revisions

Validity of the reported results and details on hyperparameters

It is well known that deep offline RL algorithms are highly sensitive to hyperparameters and to small implementation details. You can see this by skimming a few papers and comparing the results they report for the same algorithm. Surprisingly, even different DNN libraries can produce different results when the code logic is identical [1].

In this situation, it is difficult to guarantee the same performance across different codebases. In other words, there is no single value that is "the performance of CQL". What exists is the performance of CQL with xxx hyperparameters, as implemented in xxx. With this in mind, we did our best to choose a single reliable existing codebase for each algorithm and to port it into a single file with the same hyperparameters.

For each algorithm, we report:

  • The codebase we referred to (also listed in the README)
  • A published paper that uses that codebase for its baseline experiments (if one exists)
  • The performance reported in that paper (if there is none, an accepted report based on a different codebase)

We could run the reference codebases ourselves, but that takes time. Furthermore, for those who would like to use jax-corl as a baseline in their own research, results from published papers are a more reliable reference.

AWAC

  • Codebase: jaxrl
  • Paper using the codebase: Cal-QL [2]
  • Results: From Table 5 (mean only)

|ver|halfcheetah-m|halfcheetah-me|hopper-m|hopper-me|walker2d-m|walker2d-me|
|---|---|---|---|---|---|---|
|Reference|49|72|58|30|75|86|
|Ours|-|-|-|-|-|-|

CQL

  • Codebase: JaxCQL
  • Paper using the codebase: Cal-QL [2]
  • Results: From Table 5 (mean only)

|ver|halfcheetah-m|halfcheetah-me|hopper-m|hopper-me|walker2d-m|walker2d-me|
|---|---|---|---|---|---|---|
|Reference|53|59|78|86|80|100|
|Ours|-|-|-|-|-|-|

IQL

  • Codebase: Original
  • Paper using the codebase: TD7 [3]
  • Results: From Table 2 (mean only)

|ver|halfcheetah-m|halfcheetah-me|hopper-m|hopper-me|walker2d-m|walker2d-me|
|---|---|---|---|---|---|---|
|Reference|47.4|89.6|63.9|64.2|84.2|108.9|
|Ours|43.9|89.1|46.5|52.7|77.9|109.1|

TD3+BC

  • Codebase: Original
  • Paper using the codebase: TD7 [3]
  • Results: From Table 2 (mean only)

|ver|halfcheetah-m|halfcheetah-me|hopper-m|hopper-me|walker2d-m|walker2d-me|
|---|---|---|---|---|---|---|
|Reference|48.1|93.7|59.1|98.1|84.3|110.5|
|Ours|48.1|93.0|46.5|105.5|72.7|109.2|
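
For the two algorithms where our scores are already filled in (IQL and TD3+BC), the gap to the reference can be tabulated with a few lines of Python. This is only a rough sketch and is not part of jax-corl: the names `TASKS`, `RESULTS`, `score_gaps`, and `summarize` are hypothetical, the numbers are copied by hand from the tables on this page, and the task names are abbreviated on the assumption that `-m` means medium and `-me` means medium-expert.

```python
# Hypothetical helper, not part of jax-corl.
# Scores are copied by hand from the IQL and TD3+BC tables on this page (mean only).
TASKS = [
    "halfcheetah-m", "halfcheetah-me",
    "hopper-m", "hopper-me",
    "walker2d-m", "walker2d-me",
]

RESULTS = {
    "IQL": {
        "reference": [47.4, 89.6, 63.9, 64.2, 84.2, 108.9],
        "ours": [43.9, 89.1, 46.5, 52.7, 77.9, 109.1],
    },
    "TD3+BC": {
        "reference": [48.1, 93.7, 59.1, 98.1, 84.3, 110.5],
        "ours": [48.1, 93.0, 46.5, 105.5, 72.7, 109.2],
    },
}


def score_gaps(reference, ours):
    """Return per-task (ours - reference), rounded to one decimal place."""
    return [round(o - r, 1) for r, o in zip(reference, ours)]


def summarize(results=RESULTS, tasks=TASKS):
    """Map each algorithm to its per-task gaps and the task with the largest shortfall."""
    summary = {}
    for algo, rows in results.items():
        gaps = dict(zip(tasks, score_gaps(rows["reference"], rows["ours"])))
        # The most negative gap is where our score falls furthest below the reference.
        worst_task = min(gaps, key=gaps.get)
        summary[algo] = {"gaps": gaps, "largest_shortfall": worst_task}
    return summary


if __name__ == "__main__":
    for algo, info in summarize().items():
        print(algo)
        for task, gap in info["gaps"].items():
            print(f"  {task:<16} {gap:+.1f}")
        print(f"  largest shortfall: {info['largest_shortfall']}")
```

With the numbers above, hopper-m is the task with the largest shortfall for both algorithms. Since only means are reported, these gaps say nothing about whether a difference is within seed-to-seed variation.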

DT

|ver|halfcheetah-m|halfcheetah-me|hopper-m|hopper-me|walker2d-m|walker2d-me|
|---|---|---|---|---|---|---|
|-|-|-|-|-|-|-|
|-|-|-|-|-|-|-|