A deep reinforcement learning method for solving task mapping problems.
Containerized Environment (Recommended)
Ensure you meet the following system requirements:
- CUDA >= 10.2
- Docker >= 19.03
- NVIDIA Docker >= 2.0 or nvidia-container-toolkit
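Before building, it can help to sanity-check that the required tools are on the PATH. A minimal sketch (it only probes for the tool names, including the assumed `nvidia-ctk` CLI of the container toolkit, and does not verify the version floors above):

```shell
# Report which prerequisite tools are installed (names assumed to be on PATH).
for tool in docker nvidia-smi nvidia-ctk; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING"
  fi
done
```

To check the actual versions, follow up with `docker --version` and `nvidia-smi` for the CUDA driver version.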
Bare Metal
- CUDA >= 10.2
- GNU Make >= v4.1
- CMake >= v3.8
- Python >= v3.6.5
- PIP >= v19.0
- PyPI packages
  - numpy
  - tensorflow-gpu == 1.14.0
  - baselines
- Essential libraries and utilities
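For a bare-metal setup, the PyPI dependencies can be installed in one step; a sketch assuming a Python 3.6/3.7 environment (TensorFlow 1.14 GPU wheels are not published for newer interpreters):

```shell
# Pin the versions the project was developed against.
pip install "numpy" "tensorflow-gpu==1.14.0"
```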
$ git clone https://github.com/NTHU-LSALAB/DRL-TaskMapping.git
$ cd DRL-TaskMapping
$ git submodule update --init --recursive --progress
Build the Docker image
$ bash scripts/build.sh
Extract demo train/test cases
$ tar -xf data/testcases/sample-test.tar.xz -C data/testcases
$ tar -xf data/testcases/sample-train.tar.xz -C data/testcases
Launch the container
$ bash scripts/launch.sh
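Once inside the container, it is worth confirming that the GPU is visible before training; a minimal check, assuming the image ships TensorFlow 1.14:

```shell
# Both commands should succeed inside a correctly configured container.
nvidia-smi
python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"
```

If `nvidia-smi` fails here, the container was likely launched without GPU access; re-check the NVIDIA Docker / nvidia-container-toolkit setup from the requirements above.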
In the demo, we use an MPI program to explore the communication pattern. Compile it:
$ make -C /data/src
Enter the DRL-TaskMapping directory and run the training script. The demo trains the model for only 1024 steps; increase the num_timesteps parameter to train longer.
$ cd workspace/DRL-TaskMapping
$ bash scripts/train.sh
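How num_timesteps is overridden depends on scripts/train.sh; if the script forwards extra arguments to the underlying baselines run command, a longer run might look like the following (a hypothetical invocation, not verified against the script — otherwise, edit the value inside scripts/train.sh directly):

```shell
# Hypothetical: assumes train.sh passes extra flags through to baselines.
bash scripts/train.sh --num_timesteps=1000000
```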
Run play.sh to perform inference; the output is logged to logs/<num_env>/<num_eval>/<checkpoint>/runtime-*
$ bash scripts/play.sh
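After inference finishes, the runtime logs can be listed with a glob over the pattern above; `<num_env>`, `<num_eval>`, and `<checkpoint>` stand for the actual values of your run:

```shell
# List every runtime log produced by play.sh.
ls logs/*/*/*/runtime-*
```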
DRL-TaskMapping
├── data
│ ├── src # MPI application
│ ├── testcases # Communication pattern
│ └── xmldescs # Architecture description
├── baselines # Modified baseline library with our env
│ ├── scripts # Demo scripts
│ ├── baselines # Baselines library
│ └── ...
├── docker
│ └── Dockerfile # Dockerfile
└── scripts # Build & launch the Docker image