This repo is for practicing the implementation of reinforcement learning algorithms.
Set up the environment
conda create -n rl python=3.9
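Then activate it:
conda activate rl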
Install the pinned build tools first (older gym releases typically need these exact versions to install):
pip install setuptools==65.5.0 "wheel<0.40.0"
Then install the remaining dependencies:
pip install -r requirements.txt
Q-Learning / Sarsa / SarsaLambda
Command:
python main.py [--parameters]
Example 1 (Training):
python main.py --env "CliffWalking" --agent "Sarsa" --episode 500 --render
Example 2 (Testing):
python main.py --env "CliffWalking" --agent "Sarsa" --test "./qtable_CliffWalking_Sarsa.npy"
Parameters:
- env : "FrozenLake", "CliffWalking", "GridWorld"
- agent : "Q-Learning", "Sarsa", "SarsaLambda"
- episode : Number of episodes to train
- lr : Learning rate
- gamma : Discount factor
- lambda : Decay rate for eligibility traces (currently used only by the SarsaLambda agent)
- epsilon : Probability of taking a random action, so the agent keeps exploring instead of always picking the same action (see the sketch after this list)
- slippery : Make transitions stochastic; only for the FrozenLake and GridWorld envs, default = False
- render : Show a render window if True, default = False
- test : Evaluate a saved Q-table (pass the file path), default = None
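For reference, the core of these tabular agents is an epsilon-greedy action choice plus a TD update. Below is a minimal sketch assuming a NumPy Q-table indexed as Q[state, action]; the names are illustrative, not this repo's actual API.

```python
import numpy as np

# Minimal sketch of epsilon-greedy action selection and the Sarsa
# TD update. Q is a NumPy array of shape (n_states, n_actions);
# all names here are illustrative, not this repo's actual API.
def epsilon_greedy(Q, state, n_actions, epsilon):
    if np.random.rand() < epsilon:           # explore with prob. epsilon
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))          # otherwise act greedily

def sarsa_update(Q, s, a, r, s_next, a_next, lr, gamma):
    # On-policy target: uses the action actually taken next (a_next);
    # Q-Learning would use np.max(Q[s_next]) here instead.
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += lr * (td_target - Q[s, a])
```

SarsaLambda additionally keeps an eligibility trace per state-action pair, decayed by gamma * lambda each step, so a single reward updates all recently visited pairs.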
DQN
Command:
python main.py [--parameters]
Example 1 (Training with default settings):
python main.py
Example 2 (Training with customized settings):
python main.py --episodes 500 --batch_size 64 --replace_iter 5 --use_pretrained --render
Example 3 (Testing):
python main.py --test "./dqn.pth" --render
Parameters:
- env : "CartPole-v0", "CartPole-v1"
- replay : Capacity of the experience replay buffer
- episodes : Number of episodes to train
- batch_size : Batch size sampled from the replay buffer at each step
- lr : Learning rate
- epsilon : Probability of taking a random action, so the agent keeps exploring the environment
- epsilon_decay : Epsilon decay rate (applied every 20 episodes)
- epsilon_min : Minimum value epsilon can decay to
- gamma : Discount factor for estimating future value
- replace_iter : Sync the target network with the online network every n episodes (see the sketch after this list)
- use_pretrained : Load pretrained weights, default = False
- render : Show a render window if True, default = False
- test : Evaluate a saved policy file (pass the file path), default = None
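The two schedules behind --replace_iter, --epsilon_decay, and --epsilon_min look roughly like the sketch below. This is an illustrative PyTorch snippet (the saved .pth file suggests PyTorch), not this repo's actual code; the function and network names are hypothetical.

```python
import torch.nn as nn

def maybe_sync_target(online_net: nn.Module, target_net: nn.Module,
                      episode: int, replace_iter: int) -> None:
    # Copy online weights into the frozen target network every
    # replace_iter episodes to stabilize the TD targets.
    if episode % replace_iter == 0:
        target_net.load_state_dict(online_net.state_dict())

def decay_epsilon(epsilon: float, episode: int,
                  epsilon_decay: float, epsilon_min: float) -> float:
    # Anneal exploration every 20 episodes, never below epsilon_min.
    if episode > 0 and episode % 20 == 0:
        epsilon = max(epsilon * epsilon_decay, epsilon_min)
    return epsilon
```

Freezing the target network between syncs keeps the TD targets stable while the online network is being updated.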