A set of Dockerfiles that enables Reinforcement Learning (RL) solutions to be used in SageMaker.
The SageMaker team uses this repository to build its official RL images. To use any of these images on SageMaker, see Python SDK. End users will typically find this repository useful for the implementation details of the official images, or as a starting point for building a customized RL image.
For information on running RL jobs on SageMaker, see SageMaker RLEstimators.
For notebook examples, see SageMaker Notebook Examples.
Make sure you have installed all of the following prerequisites on your development machine:
- A Python environment management tool (e.g. PyEnv, VirtualEnv)
Amazon SageMaker utilizes Docker containers to run all training jobs and inference endpoints.
The Docker images are built from the Dockerfiles specified in coach/docker and ray/docker.
The Dockerfiles are grouped by RL toolkit (Coach or Ray) and toolkit version, and separated by framework, e.g. coach/docker/0.11.0/Dockerfile.mxnet.
All Dockerfiles use deep learning framework images provided by SageMaker as their "base" images.
These "base" images are specified with the following naming convention:
520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-<framework>:<framework_version>-<processor>-py3
- <framework> can be tensorflow-scriptmode (with <framework_version> 1.11.0 or higher, depending on the toolkit requirements) or mxnet (with <framework_version> 1.3.0 or higher, depending on the toolkit requirements);
- <processor> can be cpu or gpu;
- for valid <region> values, please see the list of supported SageMaker regions.
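The naming convention above can be sketched as a small shell helper. This is only an illustration and not part of the repository: the function name base_image_uri and its argument order are assumptions made for this example, while the account ID and URI layout are taken from the convention stated above.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): compose a "base" image
# URI from the naming convention described above.
# Usage: base_image_uri <region> <framework> <framework_version> <processor>
base_image_uri() {
    if [ "$#" -ne 4 ]; then
        echo "usage: base_image_uri <region> <framework> <framework_version> <processor>" >&2
        return 1
    fi
    # Only the two processor values named in the convention are accepted.
    case "$4" in
        cpu|gpu) ;;
        *) echo "processor must be cpu or gpu, got: $4" >&2; return 1 ;;
    esac
    printf '520713654638.dkr.ecr.%s.amazonaws.com/sagemaker-%s:%s-%s-py3\n' \
        "$1" "$2" "$3" "$4"
}

# Print the URIs for a CPU TensorFlow image and a GPU MXNet image.
base_image_uri us-west-2 tensorflow-scriptmode 1.11.0 cpu
base_image_uri us-west-2 mxnet 1.3.0 gpu
```

The helper does not check that the region or framework version is valid; it only assembles the string, so the result should still be compared against the images actually available in ECR.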
Before building images:
Pull the deep learning framework "base" image, which requires Docker, AWS credentials, and the AWS CLI.
    # Log in to the SageMaker ECR account
    $(aws ecr get-login --no-include-email --region <region> --registry-ids 520713654638)

    # Pull the Docker image from ECR
    docker pull 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-<framework>:<framework_version>-<processor>-py3
    # Example
    $(aws ecr get-login --no-include-email --region us-west-2 --registry-ids 520713654638)

    # CPU TensorFlow image
    docker pull 520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-tensorflow-scriptmode:1.11.0-cpu-py3

    # GPU MXNet image
    docker pull 520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-mxnet:1.3.0-gpu-py3
To build RL Docker image:
    # All build instructions assume you're building from the root directory of the sagemaker-rl-container.

    # CPU
    docker build -t <image_name>:<tag> -f <rl_toolkit>/docker/<rl_toolkit_version>/Dockerfile.<framework> --build-arg processor=<cpu_or_gpu> .

    # GPU
    docker build -t <image_name>:<tag> -f <rl_toolkit>/docker/<rl_toolkit_version>/Dockerfile.<framework> --build-arg processor=<cpu_or_gpu> .
    # Example

    # Ray TensorFlow CPU
    docker build -t tf-ray:0.6.5-cpu-py3 -f ray/docker/0.6.5/Dockerfile.tf --build-arg processor=cpu .

    # Coach TensorFlow GPU
    docker build -t tf-coach:0.11.0-gpu-py3 -f coach/docker/0.11.0/Dockerfile.tf --build-arg processor=gpu .

    # Coach MXNet CPU
    docker build -t mxnet-coach:0.11.0-cpu-py3 -f coach/docker/0.11.0/Dockerfile.mxnet --build-arg processor=cpu .
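The build command template can also be wrapped in a small helper so that the Dockerfile path is always assembled the same way. The sketch below is hypothetical: the function name rl_build_cmd and its argument order are assumptions for illustration. It only prints the docker build command rather than running it, so the result can be inspected first.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): print the docker build
# command for an RL image, following the template shown above.
# Usage: rl_build_cmd <image_name> <tag> <rl_toolkit> <rl_toolkit_version> <framework> <processor>
rl_build_cmd() {
    if [ "$#" -ne 6 ]; then
        echo "usage: rl_build_cmd <image_name> <tag> <rl_toolkit> <rl_toolkit_version> <framework> <processor>" >&2
        return 1
    fi
    # Dockerfile path layout: <rl_toolkit>/docker/<rl_toolkit_version>/Dockerfile.<framework>
    printf 'docker build -t %s:%s -f %s/docker/%s/Dockerfile.%s --build-arg processor=%s .\n' \
        "$1" "$2" "$3" "$4" "$5" "$6"
}

# Reproduces the "Ray TensorFlow CPU" example command.
rl_build_cmd tf-ray 0.6.5-cpu-py3 ray 0.6.5 tf cpu
```

Because the Dockerfile path is relative, the printed command is meant to be run from the root directory of the repository.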
Running the tests requires installation of test dependencies.
    git clone https://github.com/aws/sagemaker-rl-container.git
    cd sagemaker-rl-container
    pip install .
Tests are defined in test/ and include local integration and SageMaker integration tests.
Running local integration tests requires Docker and AWS credentials, because the tests make calls to a couple of AWS services. Both the local integration tests and the SageMaker integration tests require the configuration specified in their respective conftest.py.
Local integration tests on GPU require Nvidia-Docker.
Before running local integration tests:
- Build your Docker image.
- Pass in the correct pytest arguments to run tests against your Docker image.
If you want to run local integration tests, then use:
    # Required arguments for integration tests are found in test/conftest.py
    pytest test/integration/local --toolkit <toolkit_to_run_tests_for> \
                                  --docker-base-name <your_docker_image> \
                                  --tag <your_docker_image_tag> \
                                  --processor <cpu_or_gpu>
    # Example
    pytest test/integration/local --toolkit coach \
                                  --docker-base-name custom-rl-coach-image \
                                  --tag 1.0 \
                                  --processor cpu
SageMaker integration tests require your Docker image to be within an Amazon ECR repository.
The Docker base name is your ECR repository namespace.
The instance type is the Amazon SageMaker instance type on which the SageMaker integration test will run.
Before running SageMaker integration tests:
- Build your Docker image.
- Push the image to your ECR repository.
- Pass in the correct pytest arguments to run tests on SageMaker against the image within your ECR repository.
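The push step above can be sketched as follows. This is a hypothetical helper, not something the repository provides: the function name ecr_push_cmds, its argument order, and the example region are assumptions, and it assumes the ECR repository is named after your Docker base name and already exists. It prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): print the commands that
# tag a local image and push it to an ECR repository in your own account.
# Usage: ecr_push_cmds <aws_id> <region> <docker_base_name> <tag>
ecr_push_cmds() {
    if [ "$#" -ne 4 ]; then
        echo "usage: ecr_push_cmds <aws_id> <region> <docker_base_name> <tag>" >&2
        return 1
    fi
    # Login in the same style as the base-image pull in this README (AWS CLI v1).
    # AWS CLI v2 replaced get-login with get-login-password.
    printf '$(aws ecr get-login --no-include-email --region %s)\n' "$2"
    # Tag the local image with the full ECR URI, then push it.
    printf 'docker tag %s:%s %s.dkr.ecr.%s.amazonaws.com/%s:%s\n' \
        "$3" "$4" "$1" "$2" "$3" "$4"
    printf 'docker push %s.dkr.ecr.%s.amazonaws.com/%s:%s\n' \
        "$1" "$2" "$3" "$4"
}

# Example values mirror the pytest example in this README; the region is assumed.
ecr_push_cmds 12345678910 us-west-2 custom-rl-coach-image 1.0
```

Printing the commands first makes it easy to confirm the account ID, region, and tag before anything is pushed.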
To run an end-to-end SageMaker integration test on Amazon SageMaker, use:
    # Required arguments for integration tests are found in test/conftest.py
    pytest test/integration/sagemaker --toolkit <toolkit_to_run_tests_for> \
                                      --aws-id <your_aws_id> \
                                      --docker-base-name <your_docker_image> \
                                      --instance-type <amazon_sagemaker_instance_type> \
                                      --tag <your_docker_image_tag>
    # Example
    pytest test/integration/sagemaker --toolkit coach \
                                      --aws-id 12345678910 \
                                      --docker-base-name custom-rl-coach-image \
                                      --instance-type ml.m4.xlarge \
                                      --tag 1.0
MXNet Coach Images:
- 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-mxnet:coach0.11-cpu-py3
- 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-mxnet:coach0.11.0-cpu-py3
- 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-mxnet:coach0.11-gpu-py3
- 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-mxnet:coach0.11.0-gpu-py3
TensorFlow Coach Images:
- 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-tensorflow:coach0.10-cpu-py3
- 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-tensorflow:coach0.10.1-cpu-py3
- 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-tensorflow:coach0.10-gpu-py3
- 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-tensorflow:coach0.10.1-gpu-py3
- 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-tensorflow:coach0.11-cpu-py3
- 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-tensorflow:coach0.11.0-cpu-py3
- 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-tensorflow:coach0.11-gpu-py3
- 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-tensorflow:coach0.11.0-gpu-py3
TensorFlow Ray Images:
- 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-tensorflow:ray0.6-cpu-py3
- 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-tensorflow:ray0.6.5-cpu-py3
- 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-tensorflow:ray0.6-gpu-py3
- 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-tensorflow:ray0.6.5-gpu-py3
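In the lists above, each image appears under two tags: one with the full toolkit version and one with a shorter major.minor alias. The sketch below reproduces that pattern as inferred from the lists. The function name rl_image_uris is hypothetical, and the alias rule (dropping the last version component) is an assumption based only on the entries shown here.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): print the two tags under
# which an official RL image appears in the lists above.
# Usage: rl_image_uris <region> <framework> <toolkit> <toolkit_version> <processor>
rl_image_uris() {
    if [ "$#" -ne 5 ]; then
        echo "usage: rl_image_uris <region> <framework> <toolkit> <toolkit_version> <processor>" >&2
        return 1
    fi
    # Assumed alias rule: drop the last version component (0.6.5 -> 0.6).
    printf '520713654638.dkr.ecr.%s.amazonaws.com/sagemaker-rl-%s:%s%s-%s-py3\n' \
        "$1" "$2" "$3" "${4%.*}" "$5"
    # Full toolkit version.
    printf '520713654638.dkr.ecr.%s.amazonaws.com/sagemaker-rl-%s:%s%s-%s-py3\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Prints the short-alias and full-version URIs for the Ray 0.6.5 CPU image.
rl_image_uris us-west-2 tensorflow ray 0.6.5 cpu
```

The helper does not verify that an image exists for a given combination; for example, the lists above show Coach 0.10.1 only for TensorFlow.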
List of supported SageMaker regions.
Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.
This library is licensed under the Apache 2.0 License.