AutoMATES CI Pipeline Details
Created by Paul Hein on: 09-21-2021
This page details the CI pipeline currently used for the AutoMATES repository. First, the technologies used and specific tests that are run are discussed. Second, the AutoMATES CI pipeline is described in detail. Third, any other important details are included. Finally, a few future enhancements are described.
Our CI Pipeline currently uses the following technologies:
- GitHub Actions: allows us to specify which actions run on the repo code when a branch push, PR, or merge event occurs. We write this specification in YAML files, and these YAML files are the recipes for our CI pipeline.
- Docker: We use Docker images that contain all of the pre-configured dependencies for our repository, so that we do not need to rebuild our dependencies during each new run of our CI pipeline.
- DockerHub: This is a container hosting service used to host the Docker images that we need to access during our CI pipeline. The lab has a DockerHub account named ml4ailab, and the AutoMATES images are hosted at hub.docker.com/r/ml4ailab/automates.
- Vanga: We use vanga.sista.arizona.edu to store large files (such as word embeddings) that the CI pipeline needs to access during testing.
- CodeCov: We use this to measure the coverage of the tests run during the CI pipeline. It has a simple YAML configuration file located at automates/.codecov.yml.
- CodeFactor: We use this to measure the code quality of the source code in our repository. It can be accessed from open PRs or from this link.
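The contents of automates/.codecov.yml are not reproduced on this page. For orientation only, a minimal Codecov configuration of the same kind might look like the sketch below; every value in it is an illustrative assumption, not the repository's actual settings.

```yaml
# Hypothetical .codecov.yml sketch; the real automates/.codecov.yml may differ.
coverage:
  status:
    project:
      default:
        target: auto     # compare against the coverage of the PR's base commit
        threshold: 1%    # tolerate a coverage drop of up to 1% before the status check fails
    patch: off           # do not report a separate status for the lines changed in a PR
comment:
  layout: "diff, files"  # PR comment shows the coverage diff and the per-file changes
```

Keeping thresholds in this file (rather than in the Codecov web UI) means any change to the coverage policy goes through the same PR review as code.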
The YAML file that defines our CI pipeline is located at automates/.github/workflows/ci.yaml. If other Actions pipelines are needed in the future, you can create another YAML file with a different name in the same directory. What follows is a section-by-section description of the contents of the YAML file, explaining what each part does, why it is there, and why it is needed. Note that the original ordering and indentation of the YAML file's contents have been preserved.
name: Continuous Integration

on:
  push:
    branches: ["*"]
  pull_request:
    branches: ["*"]

jobs:
  continuous_integration:
    runs-on: ubuntu-20.04
    steps:
      - uses: actions/checkout@v2
      - name: Run unit tests
        env:
          DEBIAN_FRONTEND: noninteractive
        run: |
Note: the Dockerfiles for the two Docker images used below are located at:
- ml4ailab/automates:tr-pipeline: automates/tests/text_reading/Dockerfile
- ml4ailab/automates:assembly-pipeline: automates/tests/model_assembly/Dockerfile
          # Get CI environment variables (as -e flags) from Codecov's env script so they can be passed into the container
          CI_ENV=`bash <(curl -s https://codecov.io/env)`
          TR_IMAGE=ml4ailab/automates:tr-pipeline
          ASSEMBLY_IMAGE=ml4ailab/automates:assembly-pipeline
          sudo apt-get -y --no-install-recommends install wget

          # Text reading (TR) tests: fetch the word embeddings from vanga, then run the sbt tests inside the TR image
          docker pull $TR_IMAGE
          wget http://vanga.sista.arizona.edu/automates_data/vectors.txt -O $GITHUB_WORKSPACE/automates/text_reading/src/test/resources/vectors.txt
          docker run -itd --rm -v $GITHUB_WORKSPACE:/automates --name test-con $TR_IMAGE
          docker exec -w /automates/automates/text_reading test-con sbt -Dapps.projectDir=/automates test
          # Stop the container (--rm removes it) and delete the TR image to free disk space on the runner
          docker stop test-con
          docker image rm -f $TR_IMAGE

          # All remaining tests: install the package in editable mode, run the test suite, then upload coverage to Codecov
          docker pull $ASSEMBLY_IMAGE
          docker run $CI_ENV -itd --rm -v $GITHUB_WORKSPACE:/automates --name test-con $ASSEMBLY_IMAGE
          docker exec test-con pip install -e .
          docker exec test-con make test
          docker exec -e CODECOV_TOKEN=$CODECOV_TOKEN test-con bash -c 'bash <(curl -s https://codecov.io/bash)'
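GitHub Actions runs every workflow file it finds in .github/workflows, so an additional pipeline only needs its own file next to ci.yaml. The sketch below assumes a hypothetical .github/workflows/lint.yaml that runs flake8 on pull requests to master; the file name, the Python version, the flake8 rule selection, and the assumption that the Python sources live in the automates/ package directory are all illustrative choices, not part of the current setup.

```yaml
# Hypothetical .github/workflows/lint.yaml, placed next to ci.yaml
name: Lint
on:
  pull_request:
    branches: [master]
jobs:
  flake8:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.x"
      - name: Check for syntax errors and undefined names
        run: |
          pip install flake8
          # Assumes the Python sources live in the automates/ package directory
          flake8 automates --count --select=E9,F63,F7,F82 --show-source --statistics
```

Because it is a separate workflow, it shows up as its own check on PRs and runs in parallel with the Continuous Integration workflow.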
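Other important details:
- Sometimes a Docker image must be rebuilt to add new dependencies before the CI tests can run. To do so:
  - Change into the directory that contains the appropriate Dockerfile (listed above).
  - Make the necessary changes to that Dockerfile.
  - Build the image and push it to DockerHub (pushing requires being logged in, via docker login, to an account with push access to ml4ailab):
    docker build -t ml4ailab/automates:<tag-name> .
    docker push ml4ailab/automates:<tag-name>
  - Use the tag that ci.yaml pulls (tr-pipeline or assembly-pipeline) so that the next CI run picks up the rebuilt image.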
- We keep two separate Docker images (one for the TR pipeline and one for everything else) because of disk space constraints on the free-tier GitHub Actions runner. As a result, the CI process pulls one image, runs its tests, deletes it, and then pulls the other image and runs its tests.
- As long as Actions remain enabled, no additional setup or configuration is needed on the AutoMATES GitHub repository settings page.
- Branch protections are currently enabled on master and will prevent non-admin users from pushing to master directly or merging a PR to master without passing tests and an official code review.
Future enhancements:
- Use the GitHub Actions Docker cache to cache the layers of our Docker images. Images could then be rebuilt whenever requirements change, with only the affected layers rebuilt, and the rebuilt images could be pushed to the lab DockerHub account automatically instead of being rebuilt locally after every change. This would also avoid pulling the Docker images during every test run.
- Set up unit testing with a Makefile that only runs the tests for the portions of code that have changed. This is a more complex change that would require significant effort, but it is feasible.
- Reorganize the AutoMATES repo into separate repos by sub-function, each with its own CI pipeline.
- Rework the docs CI pipeline so that it once again builds useful documentation generated from the source code using Sphinx.
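The Docker-cache enhancement listed above could be sketched as a workflow that rebuilds an image with Docker Buildx and stores its layers in the GitHub Actions cache, so that only the layers affected by a Dockerfile change are rebuilt before the image is pushed to DockerHub. This sketch assumes the TR Dockerfile sits at tests/text_reading/Dockerfile relative to the repository root (listed on this page as automates/tests/text_reading/Dockerfile) and that DockerHub credentials are stored as repository secrets with the hypothetical names DOCKERHUB_USERNAME and DOCKERHUB_TOKEN; the file name and triggers are illustrative.

```yaml
# Hypothetical .github/workflows/build-tr-image.yaml
name: Build TR image
on:
  push:
    branches: [master]
    paths:
      - "tests/text_reading/Dockerfile"   # rebuild only when the TR Dockerfile changes
  workflow_dispatch:                      # also allow manual rebuilds from the Actions tab
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          context: tests/text_reading          # Dockerfile defaults to <context>/Dockerfile
          push: true
          tags: ml4ailab/automates:tr-pipeline # the tag ci.yaml pulls
          cache-from: type=gha                 # reuse layers cached by earlier runs
          cache-to: type=gha,mode=max          # cache all intermediate layers for the next run
```

The assembly image would need an equivalent job with context tests/model_assembly and the tag ml4ailab/automates:assembly-pipeline.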
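A lighter-weight step toward running only the tests affected by a change, different from the Makefile approach proposed above, is the GitHub Actions `paths` filter, which starts a workflow only when files matching the given patterns change. The sketch below assumes the TR tests were moved into their own hypothetical workflow file; the watched directories are assumptions, and the run script is the TR portion of the current ci.yaml.

```yaml
# Hypothetical .github/workflows/tr-tests.yaml: runs only when text reading files change
name: Text Reading Tests
on:
  push:
    paths:
      - "automates/text_reading/**"
      - "tests/text_reading/**"
  pull_request:
    paths:
      - "automates/text_reading/**"
      - "tests/text_reading/**"
jobs:
  tr_tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run text reading tests
        run: |
          TR_IMAGE=ml4ailab/automates:tr-pipeline
          sudo apt-get -y --no-install-recommends install wget
          docker pull $TR_IMAGE
          wget http://vanga.sista.arizona.edu/automates_data/vectors.txt -O $GITHUB_WORKSPACE/automates/text_reading/src/test/resources/vectors.txt
          docker run -itd --rm -v $GITHUB_WORKSPACE:/automates --name test-con $TR_IMAGE
          docker exec -w /automates/automates/text_reading test-con sbt -Dapps.projectDir=/automates test
          docker stop test-con
```

One caveat given the branch protections on master: if a path-filtered workflow is marked as a required check and is skipped because no matching files changed, GitHub leaves that check pending and the PR cannot be merged. Such workflows therefore work best as non-required checks.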
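For the documentation enhancement, a job could generate API pages from docstrings with sphinx-apidoc and build HTML with sphinx-build. This sketch reuses the assembly image as the job container, since it already holds the repository's Python dependencies; the docs/source layout (with a conf.py that enables sphinx.ext.autodoc), the output path, and the artifact name are assumptions.

```yaml
# Hypothetical .github/workflows/docs.yaml
name: Documentation
on:
  push:
    branches: [master]
jobs:
  docs:
    runs-on: ubuntu-latest
    # Run the steps inside the assembly image, which has the repository's dependencies preinstalled
    container: ml4ailab/automates:assembly-pipeline
    steps:
      - uses: actions/checkout@v4
      - name: Build HTML documentation with Sphinx
        run: |
          pip install sphinx
          pip install -e .   # autodoc imports the package, so it must be importable
          # Generate .rst stubs for every module in the automates package (assumed docs/source layout)
          sphinx-apidoc -o docs/source automates
          sphinx-build -b html docs/source docs/build/html
      - uses: actions/upload-artifact@v4
        with:
          name: html-docs
          path: docs/build/html
```

Uploading the HTML as an artifact keeps the sketch self-contained; publishing it (for example to GitHub Pages) would be a separate step.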