This repository contains the Serverless deployment YAML for the Lambda functions used to achieve MLOps Level 2 within AWS. Not all AWS resources are created by this deployment; some are created via Terraform in the terraform-aws-machine-learning-pipeline repository.
The deployed Lambda functions aim to automate preparing data and transforming features, training and tuning, deploying models, and running inference, among other steps.
This repository serves as an example; the proposed architecture may not suit all use cases because of Lambda's limitations, such as the 15-minute maximum execution time. Please consider other AWS services, for instance Elastic Container Service (ECS), for better performance and longer-running tasks. Because the MLOps workflow has been split into separate components, it should be easy to identify areas that would benefit from being moved to a different service with more compute.
- User has received new data.
- Data is uploaded to a GitHub repository.
- A GitHub Action is triggered, uploading the data to an S3 bucket.
- Lambda function for data preprocessing is run due to an event trigger on the bucket.
- Upon completion the transformed data is uploaded to another bucket.
- Lambda function will trigger the SageMaker training job with various hyperparameters.
- Training job is started using data split for training and validation.
- Completed model is uploaded to an S3 bucket.
- Lambda function deploys the new model for inference using a serverless endpoint.
- Message containing the endpoint name and test data location is sent to a queue.
- Lambda function invokes the serverless endpoint with the test data.
- Prediction results are stored in an S3 bucket for a data scientist to examine.
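Two hand-offs in the workflow above are easy to get wrong: reading the bucket and object key from the S3 event that triggers preprocessing, and building the queue message that carries the endpoint name and test data location. The sketch below illustrates both using only the standard library. It is not the code from the Lambda repositories; the function names and the message field names (`endpointName`, `testDataLocation`) are hypothetical, and only the S3 event layout follows the documented notification format.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_objects(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications (a space becomes '+').
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


def build_evaluation_message(endpoint_name: str, test_data_location: str) -> str:
    """Serialise the queue message body. Field names are hypothetical."""
    return json.dumps(
        {"endpointName": endpoint_name, "testDataLocation": test_data_location}
    )


if __name__ == "__main__":
    # Hypothetical bucket, key, and endpoint names, for illustration only.
    sample_event = {
        "Records": [
            {"s3": {"bucket": {"name": "raw-data"}, "object": {"key": "new+data.csv"}}}
        ]
    }
    print(extract_s3_objects(sample_event))
    print(build_evaluation_message("example-endpoint", "s3://test-data/test.csv"))
```

Decoding the key before use matters because a file uploaded with spaces or special characters in its name would otherwise not be found when the function tries to read it.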
The source code for all Lambda functions is stored in GitHub:
- dataPreProcessing: aws-lambda-data-preprocessing
- modelTraining: aws-lambda-model-training
- modelDeployment: aws-lambda-model-deployment
- modelEvaluation: aws-lambda-model-evaluation
The GitHub Action will deploy the lambda functions using the serverless action. Docker images used for deployment are stored within an AWS ECR repository.
If you wish to apply Serverless changes from your own machine instead of using GitHub Actions, please ensure that the relevant AWS credentials have been set as environment variables, the Terraform changes have been applied, the Lambda version environment variables have been set, and the Serverless CLI is installed.
Note: See each Lambda's GitHub tags to get its version, e.g. https://github.com/kwame-mintah/aws-lambda-data-preprocessing/tags. A tag found there would be set as `DATA_PREPROCESSING_VERSION=X.Y.Z`.
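Before deploying locally, it can help to confirm that the required environment variables are present. The following is a minimal sketch of such a check, not part of this repository. It assumes credentials are supplied as static access keys, and only `DATA_PREPROCESSING_VERSION` is named in this README; the other three version variable names are guesses that follow the same pattern and should be checked against the serverless YAML.

```python
import os
import re
from typing import Mapping

# Standard AWS credential variables read by the AWS SDKs and CLI.
AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")

# Only the first name appears in the README; the rest are hypothetical.
VERSION_VARS = (
    "DATA_PREPROCESSING_VERSION",
    "MODEL_TRAINING_VERSION",
    "MODEL_DEPLOYMENT_VERSION",
    "MODEL_EVALUATION_VERSION",
)

# Versions are expected in the X.Y.Z form shown in the note above.
VERSION_PATTERN = re.compile(r"^\d+\.\d+\.\d+$")


def find_problems(environ: Mapping[str, str]) -> list[str]:
    """Return a human-readable list of missing or malformed variables."""
    problems = [
        f"{name} is not set"
        for name in AWS_VARS + VERSION_VARS
        if not environ.get(name)
    ]
    for name in VERSION_VARS:
        value = environ.get(name)
        if value and not VERSION_PATTERN.match(value):
            problems.append(f"{name}={value!r} does not look like X.Y.Z")
    return problems


if __name__ == "__main__":
    for problem in find_problems(os.environ):
        print(problem)
```

Running the script prints nothing when every variable is set and well formed.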
- Install the `serverless-iam-roles-per-function` plugin: `serverless plugin install --name serverless-iam-roles-per-function`
- Deploy resources to your chosen environment, e.g. `dev`, `staging` or `prod`: `serverless deploy --stage <env>`
- Remove deployed resources in an environment, e.g. `dev`, `staging` or `prod`: `serverless remove --stage <env>`
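If you script these commands, a small wrapper can guard against a mistyped stage before anything is deployed or removed. The sketch below is one hypothetical way to do that. It assumes the only valid stages are the three given as examples above, which may not match your setup, and the function names are illustrative.

```python
import subprocess

# Assumption: only the stages listed as examples in this README are allowed.
STAGES = ("dev", "staging", "prod")
ACTIONS = ("deploy", "remove")


def build_command(action: str, stage: str) -> list[str]:
    """Build the serverless CLI argument list after validating the inputs."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    return ["serverless", action, "--stage", stage]


def run(action: str, stage: str) -> None:
    """Run the command, raising CalledProcessError on a non-zero exit."""
    subprocess.run(build_command(action, stage), check=True)
```

For example, `run("deploy", "dev")` executes `serverless deploy --stage dev`, while `run("deploy", "production")` raises a `ValueError` before the CLI is invoked.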