A simple dashboard for viewing pipeline metrics in AWS. It is built with CloudWatch dashboards, using metrics populated from the CloudWatch events that CodePipeline triggers. You can also deploy this dashboard directly from the AWS Serverless Application Repository.
- From your local pipeline-dashboard GitHub repo, create a zip file.
zip -r pipeline-dashboard.zip *.* ./src ./test
- Upload the zip file to S3.
aws s3 mb s3://pipeline-dashboard-$(aws sts get-caller-identity --output text --query 'Account')
aws s3 sync . s3://pipeline-dashboard-$(aws sts get-caller-identity --output text --query 'Account')
- Make note of the S3 Bucket and zip file name.
- Launch the CloudFormation stack by running the command below. You will need to change the --template-body value to point to the location of the template.yml on your machine. You will also need to change ACCOUNTID to your AWS account id.
aws cloudformation create-stack --stack-name pipeline-dashboard-stack --template-body file:///home/ec2-user/environment/pipeline-dashboard/template.yml --parameters 'ParameterKey=PipelinePattern,ParameterValue=*' ParameterKey=BucketName,ParameterValue=pipeline-dashboard-ACCOUNTID ParameterKey=CodeKey,ParameterValue=pipeline-dashboard.zip --capabilities CAPABILITY_NAMED_IAM CAPABILITY_AUTO_EXPAND --disable-rollback
- Once the CloudFormation stack reaches CREATE_COMPLETE, you will need to trigger a few CodePipeline runs in order to populate the CloudWatch dashboard. After these runs, go to the CloudWatch Console and click on Dashboards to see the metrics reflected in the dashboard.
As seen in the diagram below, a Lambda function is triggered from a CloudWatch Event rule for CodePipeline events. The Lambda function then generates CloudWatch metrics. The CloudWatch dashboard is then built from the metrics that the Lambda function created.
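The event-to-metric step described above could look roughly like the following sketch. It is written in Python purely for illustration and is not the function shipped in this repo. The metric names `SuccessCount` and `FailureCount`, the `PipelineName` dimension, and the function name `metric_from_event` are assumptions. The event fields follow the shape of CodePipeline "Pipeline Execution State Change" events.

```python
from datetime import datetime

# Hypothetical metric names; the real function may publish different ones.
STATE_TO_METRIC = {"SUCCEEDED": "SuccessCount", "FAILED": "FailureCount"}


def metric_from_event(event):
    """Turn a CodePipeline state-change event into one CloudWatch metric datum.

    Returns None for events that are not from CodePipeline or that do not
    describe a terminal (succeeded or failed) pipeline state.
    """
    if event.get("source") != "aws.codepipeline":
        return None
    detail = event.get("detail", {})
    metric_name = STATE_TO_METRIC.get(detail.get("state"))
    if metric_name is None:
        return None
    return {
        "MetricName": metric_name,
        "Dimensions": [{"Name": "PipelineName", "Value": detail["pipeline"]}],
        # Event times end in "Z"; rewrite it so fromisoformat accepts it.
        "Timestamp": datetime.fromisoformat(event["time"].replace("Z", "+00:00")),
        "Value": 1,
        "Unit": "Count",
    }


sample_event = {
    "source": "aws.codepipeline",
    "detail-type": "CodePipeline Pipeline Execution State Change",
    "time": "2024-01-01T12:00:00Z",
    "detail": {"pipeline": "my-pipeline", "execution-id": "abc", "state": "SUCCEEDED"},
}
datum = metric_from_event(sample_event)
print(datum["MetricName"], datum["Dimensions"][0]["Value"])
```

A dictionary in this shape is what the `MetricData` list of the CloudWatch `put_metric_data` call expects, so a handler could pass it along together with a namespace of its choosing.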
The list of pipelines in the dashboard cannot be generated dynamically, so another Lambda function runs regularly to regenerate the dashboard based on whatever metrics have been created.
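As a rough illustration of that regeneration step, the sketch below builds a dashboard body from a list of pipeline names. It is a Python sketch under stated assumptions, not the repo's implementation: the `Pipeline` namespace, the metric names, the `PipelineName` dimension, the default region, and the one-widget-per-pipeline layout are all hypothetical.

```python
import json


def build_dashboard_body(pipelines, namespace="Pipeline", region="us-east-1"):
    """Build a CloudWatch dashboard body with one full-width widget per pipeline."""
    widgets = []
    for row, pipeline in enumerate(sorted(pipelines)):
        widgets.append({
            "type": "metric",
            "x": 0,
            "y": row * 6,      # stack the widgets vertically
            "width": 24,       # the dashboard grid is 24 units wide
            "height": 6,
            "properties": {
                "title": pipeline,
                "region": region,
                "stat": "Sum",
                "period": 86400,  # one data point per day
                "metrics": [
                    [namespace, "SuccessCount", "PipelineName", pipeline],
                    [namespace, "FailureCount", "PipelineName", pipeline],
                ],
            },
        })
    return json.dumps({"widgets": widgets})


body = build_dashboard_body({"web-app", "api"})
print(len(json.loads(body)["widgets"]))
```

The resulting string is in the format that the CloudWatch `put_dashboard` call takes as `DashboardBody`. The pipeline names themselves could be discovered by listing the metrics that already exist, which is why the dashboard has to be rebuilt on a schedule.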
Metric | Description | How to Calculate | How to Interpret |
---|---|---|---|
Cycle Time | How often software is being delivered to production. | The mean interval of time between two consecutive successful pipeline executions. | If this number is less than Lead Time, then many commits are being delivered to the pipeline before a previous commit is complete. If this number is significantly greater than Lead Time, then the pipeline is delivering risky deployments due to the large batch size of the commits. |
Lead Time | How long it takes for a change to go to production. | The mean amount of time from commit to production, including rework. | This is the number the business cares about most, as it represents how long it takes for a feature to get into the hands of the customer. If this number is too large, look at improving the availability of the pipeline (MTBF / (MTBF + MTTR)). |
MTBF | How often the pipeline fails. | The mean interval of time between the start of a successful pipeline execution and the start of a failed pipeline execution. | This number should be high in comparison to MTTR. If this number is low, then consider improving the reliability of the pipeline by first researching whether the root cause is the quality of new code being committed or the repeatability of the infrastructure and test automation. |
MTTR | How long it takes to fix the pipeline. | The mean interval of time between the start of a failed pipeline execution and the start of a successful pipeline execution. | This number should be low, as it is a measure of a team's ability to "stop the line" when a build fails and swarm on resolving it. If the Feedback Time is high, then consider addressing that; otherwise the issue is with the team's responsiveness to failures. |
Feedback Time | How quickly failures can be identified. | The mean amount of time from commit to failure of a pipeline execution. | This number should be low, as it affects MTTR. Ideally, failures would be detected as quickly as possible in the pipeline, rather than farther along in the pipeline. |
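
The definitions in the table can be made concrete with a small calculation. The sketch below is one possible reading of them in Python, not the code this project runs. It assumes that Cycle Time is measured between completion times, that MTBF and MTTR are measured from the first execution of each green or red streak, and that Lead Time "including rework" counts from the commit that first broke the pipeline. The `Execution` record and its fields are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Execution:
    commit: float     # time the change was committed (minutes)
    start: float      # time the pipeline execution started
    end: float        # time the pipeline execution finished
    succeeded: bool


def _avg(values):
    return mean(values) if values else None


def pipeline_metrics(executions):
    """Compute the five dashboard metrics for one pipeline."""
    cycle, lead, feedback, tbf, ttr = [], [], [], [], []
    last_success_end = None  # end of the previous successful execution
    green_since = None       # start of the success that opened the green streak
    red_since = None         # start of the failure that opened the red streak
    rework_commit = None     # commit that first broke the pipeline

    for run in sorted(executions, key=lambda e: e.start):
        if run.succeeded:
            if last_success_end is not None:
                cycle.append(run.end - last_success_end)
            last_success_end = run.end
            # Count rework: measure from the commit that started the red streak.
            origin = run.commit if rework_commit is None else rework_commit
            lead.append(run.end - origin)
            if red_since is not None:
                ttr.append(run.start - red_since)
            if green_since is None:
                green_since = run.start
            red_since = rework_commit = None
        else:
            feedback.append(run.end - run.commit)
            if green_since is not None:
                tbf.append(run.start - green_since)
            if red_since is None:
                red_since, rework_commit = run.start, run.commit
            green_since = None

    return {
        "cycle_time": _avg(cycle),
        "lead_time": _avg(lead),
        "mtbf": _avg(tbf),
        "mttr": _avg(ttr),
        "feedback_time": _avg(feedback),
    }


history = [
    Execution(commit=0, start=0, end=10, succeeded=True),
    Execution(commit=20, start=20, end=30, succeeded=True),
    Execution(commit=40, start=40, end=45, succeeded=False),
    Execution(commit=50, start=50, end=60, succeeded=True),
    Execution(commit=70, start=70, end=80, succeeded=True),
]
metrics = pipeline_metrics(history)
availability = metrics["mtbf"] / (metrics["mtbf"] + metrics["mttr"])
print(metrics, availability)
```

In this made-up history the single failure is detected five minutes after the commit and repaired ten minutes after the failed run started, so the availability term from the Lead Time row works out to 40 / (40 + 10).
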
Cycle Time and Lead Time are frequently confused. For a good explanation, please see Continuous Delivery: lead time and cycle time. To compare the two metrics, consider the following scenarios. Notice that Lead Time is the same for the pipelines in both scenarios; however, the Cycle Time is much smaller in the second scenario because the pipelines are running in parallel (higher WIP). This agrees with the formula Lead Time = WIP x Cycle Time.
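To see the formula at work, here is a small worked example in Python. The numbers are hypothetical and are not taken from the scenarios in the original diagrams: every change takes 60 minutes from commit to production, and the only difference between the two cases is how often a new change enters the pipeline. WIP is derived by rearranging Lead Time = WIP x Cycle Time.

```python
LEAD_TIME = 60  # minutes from commit to production (hypothetical)


def mean_gap(times):
    """Mean interval between consecutive timestamps."""
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return sum(gaps) / len(gaps)


def summarize(commit_times):
    """Return (cycle time, WIP) for changes that each take LEAD_TIME minutes."""
    finishes = [t + LEAD_TIME for t in commit_times]
    cycle_time = mean_gap(finishes)
    wip = LEAD_TIME / cycle_time  # rearranged: WIP = Lead Time / Cycle Time
    return cycle_time, wip


scenarios = {
    "one at a time": [0, 60, 120, 180],   # next commit waits for the previous one
    "three in flight": [0, 20, 40, 60],   # a new commit every 20 minutes
}
for name, commits in scenarios.items():
    cycle_time, wip = summarize(commits)
    print(f"{name}: cycle time {cycle_time:.0f} min, WIP {wip:.0f}")
```

Both cases share the same 60-minute lead time, but the second delivers three times as often because three changes are in progress at once.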
To run the unit tests: npm test
To deploy the CodeBuild project for staging the templates: npm run create-codebuild or npm run update-codebuild
To deploy to your account: npm run deploy
You can change the bucket via npm config set pipeline-dashboard:staging_bucket my-bucket-name
To launch a CloudFormation stack that creates a deployment pipeline, which runs TaskCat tests that launch the other CloudFormation stacks in this repo, run the command below. You will need to change the --template-body value to point to the location of pipeline-taskcat.yml on your machine.
aws cloudformation create-stack --stack-name pipeline-dashboard-taskcat --capabilities CAPABILITY_NAMED_IAM --disable-rollback --template-body file:///home/ec2-user/environment/pipeline-dashboard/pipeline-taskcat.yml
- Go to AWS SAR Console in the production account and click on pipeline-dashboard.
- Click on Publish new version.
- Enter value for Semantic version.
- Enter https://github.com/stelligent/pipeline-dashboard for Source code URL.
- For the SAM template, browse for template-sar.yml from this repo and click the Publish Version button.
- Go to the AWS Lambda Console on a separate AWS account and, when creating a function, click on the Serverless Application Repository radio button and find pipeline-dashboard.
- Deploy the application.
- Once it is complete, go to the Amazon CloudWatch Console and choose Dashboards to verify it is working.