This is a sample solution to build a safe deployment pipeline for Amazon SageMaker. This example could be useful for any organization looking to operationalize machine learning with native AWS development tools such as AWS CodePipeline, AWS CodeBuild and AWS CodeDeploy.
This solution provides safe deployment by creating an AWS Lambda API that calls into an Amazon SageMaker Endpoint for real-time inference.
Following is a diagram of the continuous delivery stages in AWS CodePipeline.
- Build Artifacts: Runs an AWS CodeBuild job to create AWS CloudFormation templates.
- Train: Trains an Amazon SageMaker pipeline and runs a Baseline Processing Job.
- Deploy Dev: Deploys a development Amazon SageMaker Endpoint.
- Deploy Prod: Deploys an Amazon API Gateway Lambda in front of Amazon SageMaker Endpoints using AWS CodeDeploy for blue/green deployment and rollback.
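To check how far an execution has progressed through the stages listed above, the AWS CLI can query CodePipeline directly. The following is a minimal sketch, not part of the sample itself: the function name `show_pipeline_stages` is hypothetical, and the pipeline name must be looked up in the CodePipeline console because this README does not state what the stack calls it.

```shell
# Print one "stage<TAB>status" line per stage of a CodePipeline pipeline.
# Assumption: the caller supplies the pipeline name; it is not defined here.
show_pipeline_stages() {
    pipeline_name="$1"
    aws codepipeline get-pipeline-state \
        --name "$pipeline_name" \
        --query 'stageStates[].[stageName, latestExecution.status]' \
        --output text
}

# Usage (replace the placeholder with your pipeline's name):
#   show_pipeline_stages YOUR_PIPELINE_NAME
```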
- Amazon SageMaker – This solution uses SageMaker to train the model and host it at an endpoint, where it can be accessed via HTTP/HTTPS requests.
- AWS CodePipeline – CodePipeline has its stages defined in CloudFormation; they specify which actions run, and in what order, to go from source code to the production endpoint.
- AWS CodeBuild – This solution uses CodeBuild to build the source code from GitHub.
- AWS CloudFormation – This solution uses the CloudFormation template language, in either YAML or JSON, to create each resource, including custom resources.
- Amazon S3 – Artifacts created throughout the pipeline, as well as the data for the model, are stored in an Amazon Simple Storage Service (S3) bucket.
Following is the list of steps required to get up and running with this sample.
Create your AWS account at http://aws.amazon.com by following the instructions on the site.
Fork this GitHub Repository so that you can run with your own GitHub Auth Token.
Create your token at GitHub's Token Settings, making sure to select scopes of repo and admin:repo_hook. After clicking Generate Token, make sure to save your OAuth Token in a secure location. The token will not be shown again.
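Before launching the stack, it can help to confirm that the token really carries both scopes. The sketch below relies on the `X-OAuth-Scopes` response header that the GitHub REST API returns for classic personal access tokens. The function names and the `GITHUB_TOKEN` variable are assumptions made for illustration.

```shell
# Succeed only if a comma-separated scope list (the value of GitHub's
# X-OAuth-Scopes header) contains both "repo" and "admin:repo_hook".
has_required_scopes() {
    scopes=",$(printf '%s' "$1" | tr -d ' \r'),"
    case "$scopes" in *,repo,*) ;; *) return 1 ;; esac
    case "$scopes" in *,admin:repo_hook,*) ;; *) return 1 ;; esac
    return 0
}

# Print the scopes granted to the token held in GITHUB_TOKEN
# (a variable name assumed here, not one the sample defines).
token_scopes() {
    curl -sS -I -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/user |
        tr -d '\r' |
        awk -F': ' 'tolower($1) == "x-oauth-scopes" { print $2 }'
}

# Usage:
#   has_required_scopes "$(token_scopes)" && echo "token scopes look fine"
```

Matching on whole comma-delimited entries keeps a narrower scope such as `public_repo` from being mistaken for `repo`.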
Click on the Launch Stack button below to launch the CloudFormation Stack to set up the SageMaker Pipeline. Before launching, make sure the architecture and configuration are set as desired.
You can launch the same stack using the AWS CLI. Here's an example:

    aws cloudformation create-stack --stack-name sagemaker-safe-deployment \
        --template-body file://pipeline.yml \
        --capabilities CAPABILITY_IAM \
        --parameters \
            ParameterKey=GitHubUser,ParameterValue=YOURGITHUBUSERNAME \
            ParameterKey=GitHubToken,ParameterValue=YOURGITHUBTOKEN12345ab1234234 \
            ParameterKey=ModelName,ParameterValue=mymodelname

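`create-stack` returns as soon as the request is accepted, so a script usually needs to wait for the stack to finish. A minimal sketch, assuming the stack name from the example above; the helper name `wait_for_stack` is hypothetical.

```shell
# Block until the stack finishes creating, then print its final status.
# "aws cloudformation wait" exits non-zero if creation fails or times out.
wait_for_stack() {
    stack_name="${1:-sagemaker-safe-deployment}"
    if aws cloudformation wait stack-create-complete --stack-name "$stack_name"; then
        aws cloudformation describe-stacks \
            --stack-name "$stack_name" \
            --query 'Stacks[0].StackStatus' \
            --output text
    else
        echo "Stack $stack_name did not reach CREATE_COMPLETE" >&2
        return 1
    fi
}

# Usage:
#   wait_for_stack sagemaker-safe-deployment
```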
Once the deployment has completed, launch the newly created SageMaker Notebook and start the build by uploading a dataset to the source S3 bucket in the code pipeline. This kicks off Model Training and Baseline and deploys a development SageMaker Endpoint. A manual approval step, which you can action directly within the SageMaker Notebook, promotes this to production. After approving, send some traffic to the live endpoint so that the AWS CodeDeploy action completes successfully. Finally, the SageMaker Notebook lets you retrieve the results from the Monitoring Schedule, which runs on the hour.
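The notebook is the intended way to read monitoring results, but recent runs of the Monitoring Schedule can also be listed from the AWS CLI. This is a sketch under assumptions: the function name `recent_monitoring_runs` is hypothetical, and the schedule name has to be supplied by you, since this README does not say what the stack names it.

```shell
# List the most recent executions of a SageMaker monitoring schedule,
# newest first, as "scheduled time<TAB>status" lines.
# $1: monitoring schedule name (assumption: supplied by the caller)
# $2: how many executions to show (defaults to 5)
recent_monitoring_runs() {
    schedule_name="$1"
    aws sagemaker list-monitoring-executions \
        --monitoring-schedule-name "$schedule_name" \
        --sort-by ScheduledTime \
        --sort-order Descending \
        --max-results "${2:-5}" \
        --query 'MonitoringExecutionSummaries[].[ScheduledTime, MonitoringExecutionStatus]' \
        --output text
}

# Usage:
#   recent_monitoring_runs YOUR_SCHEDULE_NAME 3
```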
Following is a list of approximate running times for the pipeline:
- Full Pipeline: 45 minutes
- Start Build: 2 minutes
- Model Training and Baseline: 5 minutes
- Launch Dev Endpoint: 10 minutes
- Launch Prod Endpoint: 25 minutes
- Monitoring Schedule: Runs on the hour
Following is a list of the parameters for running the CloudFormation template.
Parameters | Description |
---|---|
 | The email where CodePipeline will send SNS notifications. |
GitHubUser | GitHub Username. |
GitHubToken | A Secret OAuthToken with access to the GitHub repo. |
GitHubRepo | The name (not URL) of the GitHub repository to pull from. |
GitHubBranch | The name (not URL) of the GitHub repository’s branch to use. |
ModelName | The short name to namespace all the MLOps resources. |
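When several of these parameters are set, a JSON parameters file passed with `--parameters file://...` can be easier to manage than inline `ParameterKey=...` pairs. The sketch below generates such a file for the five named parameters in the table. The function name `write_parameters_file` and the file name `params.json` are assumptions, and the notification email parameter is left out because its key is not shown in the table.

```shell
# Print a CloudFormation parameters file to standard output.
# Arguments: GitHub user, GitHub token, repository name, branch, model name.
# Values are embedded as-is, so they must not contain double quotes.
write_parameters_file() {
    cat <<EOF
[
  {"ParameterKey": "GitHubUser", "ParameterValue": "$1"},
  {"ParameterKey": "GitHubToken", "ParameterValue": "$2"},
  {"ParameterKey": "GitHubRepo", "ParameterValue": "$3"},
  {"ParameterKey": "GitHubBranch", "ParameterValue": "$4"},
  {"ParameterKey": "ModelName", "ParameterValue": "$5"}
]
EOF
}

# Usage (all values are placeholders):
#   write_parameters_file YOURGITHUBUSERNAME "$GITHUB_TOKEN" YOURREPO YOURBRANCH mymodelname > params.json
#   aws cloudformation create-stack --stack-name sagemaker-safe-deployment \
#       --template-body file://pipeline.yml \
#       --capabilities CAPABILITY_IAM \
#       --parameters file://params.json
```

Because the file contains the token, keep it out of version control and delete it once the stack has been created.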