AutoMATES CI Pipeline Details

Paul Hein edited this page Sep 21, 2021 · 3 revisions

Created by Paul Hein on: 09-21-2021

This page describes the CI pipeline currently used for the AutoMATES repository. It first covers the technologies used and the specific tests that are run, then walks through the AutoMATES CI pipeline in detail, then lists other important details, and finally outlines a few future enhancements.

Tech stack

Our CI pipeline currently uses the following technologies:

  • GitHub Actions: allows us to specify which actions are run on the repo code for a branch/PR/merge event. We write this specification in YAML files, and these YAML files are the recipes for our CI pipeline.
  • Docker: We utilize docker images that contain all of the pre-configured dependencies for our repository so that we need not rebuild our dependencies during each new run of our CI pipeline.
  • DockerHub: This is a container hosting service used to host the docker images that we need to access during our CI pipeline. The lab has a DockerHub account named ml4ailab and the AutoMATES images are hosted at hub.docker.com/r/ml4ailab/automates.
  • Vanga: We utilize vanga.sista.arizona.edu to store large files (such as word embeddings) that need to be accessed by the CI pipeline for testing.
  • CodeCov: We use this to measure the coverage of our tests run during the CI pipeline. It has a simple YAML file located at automates/.codecov.yml.
  • CodeFactor: We use this to measure the code quality of the source code in our repository. It can be accessed from open PRs or from this link.

Pipeline

The YAML file that defines our CI pipeline is located at automates/.github/workflows/ci.yaml. If any other Actions pipelines are desired in the future, you can create another YAML file with a different name in the same directory. What follows is a section-by-section walk through the contents of the YAML file, covering what each part is and why it is needed. Note that the original ordering and indentation of the contents of the YAML file have been preserved.

Header portion

name: Continuous Integration

on:
  push:
    branches: ["*"]
  pull_request:
    branches: ["*"]

Job configuration

jobs:
  continuous_integration:
    runs-on: ubuntu-20.04
    steps:
      - uses: actions/checkout@v2
      - name: Run unit tests
        env:
          DEBIAN_FRONTEND: noninteractive
        run: |

Environment variable definition

Note: the Dockerfiles for the two docker images used below are located at:

  • ml4ailab/automates:tr-pipeline: automates/tests/text_reading/Dockerfile
  • ml4ailab/automates:assembly-pipeline: automates/tests/model_assembly/Dockerfile

          CI_ENV=`bash <(curl -s https://codecov.io/env)`
          TR_IMAGE=ml4ailab/automates:tr-pipeline
          ASSEMBLY_IMAGE=ml4ailab/automates:assembly-pipeline
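
The CI_ENV assignment above captures whatever Codecov's env script prints. As far as we understand it, that output is a series of `-e NAME` flags that `docker run` later uses to forward CI variables into the container. The sketch below imitates this offline with a hypothetical `env_flags` helper; the function name, the `CI_ENV_SKETCH` variable, and the variable names passed in are illustrative assumptions and are not part of the workflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ci.yaml): build a string of `-e NAME`
# flags for `docker run`. When `-e NAME` is given without `=value`, Docker
# copies the current value of NAME from the calling shell into the container.
env_flags() {
  local flags=""
  local name
  for name in "$@"; do
    flags="${flags}-e ${name} "
  done
  # Trim the trailing space before printing.
  printf '%s\n' "${flags% }"
}

# Example: forward two variables (names chosen for illustration only).
CI_ENV_SKETCH=$(env_flags CODECOV_TOKEN GITHUB_SHA)
echo "docker run ${CI_ENV_SKETCH} ..."
```

Leaving `$CI_ENV` unquoted in the `docker run` line is what lets the shell split the string into separate `-e` arguments.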

Initialization and testing of the TR pipeline

          sudo apt-get -y --no-install-recommends install wget
          docker pull $TR_IMAGE
          wget http://vanga.sista.arizona.edu/automates_data/vectors.txt -O $GITHUB_WORKSPACE/automates/text_reading/src/test/resources/vectors.txt
          docker run -itd --rm -v $GITHUB_WORKSPACE:/automates --name test-con $TR_IMAGE
          docker exec -w /automates/automates/text_reading test-con sbt -Dapps.projectDir=/automates test
          docker stop test-con
          docker image rm -f $TR_IMAGE
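
The commands above follow a pull, run, exec, stop, remove lifecycle. As a minimal sketch (an assumption about how this could be factored, not what ci.yaml does), the same sequence can be wrapped in a function that records the test exit status so the cleanup commands still run when the tests fail. The `run` and `test_in_image` names and the `DRY_RUN` switch are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the pull / run / exec / stop / remove sequence.
# Set DRY_RUN=1 to print the docker commands instead of executing them.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: test_in_image IMAGE WORKDIR COMMAND...
test_in_image() {
  local image="$1" workdir="$2" status=0
  shift 2
  run docker pull "$image"
  run docker run -itd --rm -v "${GITHUB_WORKSPACE:-$PWD}:/automates" --name test-con "$image"
  # Record the test status instead of exiting, so cleanup always runs.
  run docker exec -w "$workdir" test-con "$@" || status=$?
  run docker stop test-con
  run docker image rm -f "$image"
  return "$status"
}

# Dry-run example using the TR image and test command from the workflow.
DRY_RUN=1 test_in_image ml4ailab/automates:tr-pipeline \
  /automates/automates/text_reading sbt -Dapps.projectDir=/automates test
```

Removing the image at the end is what frees space on the runner before the next image is pulled.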

Initialization and testing of the assembly pipeline

          docker pull $ASSEMBLY_IMAGE
          docker run $CI_ENV -itd --rm -v $GITHUB_WORKSPACE:/automates --name test-con $ASSEMBLY_IMAGE
          docker exec test-con pip install -e .
          docker exec test-con make test

Reporting of code coverage results to CodeCov

          docker exec -e CODECOV_TOKEN=$CODECOV_TOKEN test-con bash -c 'bash <(curl -s https://codecov.io/bash)'

Other details

  • Sometimes it is necessary to rebuild one of the docker images in order to add new dependencies to the image before running CI tests. This can be done using the following steps:
    • Change to the directory that contains the appropriate Dockerfile (see the locations listed under "Environment variable definition")
    • Make any necessary changes to the Dockerfile
    • docker build -t ml4ailab/automates:<tag-name> .
    • docker push ml4ailab/automates:<tag-name>
  • We have two separate docker images (one for the TR pipeline and one for everything else) because of space constraints on the GH Actions (free tier) runner. Currently, we have to pull one image, run its tests, delete it, pull the other image, and run its tests in order to complete the CI process.
  • As long as Actions remain enabled, no additional setup steps or configuration is needed in the AutoMATES GitHub repo settings page.
  • Branch protections are currently enabled on master and will prevent non-admin users from pushing to master directly or merging a PR to master without passing tests and an official code review.
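
The rebuild steps listed above can be sketched as a small helper. This is a minimal sketch under stated assumptions: the `run` and `rebuild_image` names and the `DRY_RUN` switch are hypothetical, the Dockerfile directory is passed as the build context instead of changing into it, and the example path assumes the command is run from the repository root. Pushing also assumes you have already run `docker login` with an account that can write to ml4ailab.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the rebuild steps; DRY_RUN=1 prints the commands
# instead of executing them.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: rebuild_image TAG DOCKERFILE_DIR
rebuild_image() {
  local tag="$1" dir="$2"
  local image="ml4ailab/automates:${tag}"
  # Build with the Dockerfile directory as the build context, then publish.
  run docker build -t "$image" "$dir"
  run docker push "$image"
}

# Dry-run example for the assembly image.
DRY_RUN=1 rebuild_image assembly-pipeline tests/model_assembly
```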

Future enhancements

  • Utilize GH Actions docker cache to cache the layers of our docker images. Images can then be rebuilt when requirements change, and only the affected layers will be rebuilt. The rebuilt images can then be pushed to our lab DockerHub account automatically instead of needing to be rebuilt locally whenever a change occurs. This will also prevent pulls of the docker images from occurring during each testing session.
  • Set up unit testing with a Makefile that only runs tests for the portions of code where changes are present. This is a more complex change that would require serious investment, but it is possible.
  • Re-organize the AutoMATES repo into separate repos according to sub-function and create separate CI pipelines for each repo.
  • Re-organize the docs CI pipeline so that it once again builds useful documentation generated from the source code using Sphinx.
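
One possible starting point for the change-based testing idea above is to map changed file paths to test suites. The sketch below is only an illustration: the `select_suites` function, the suite names `tr` and `assembly`, and the rule that everything outside automates/text_reading/ belongs to the assembly suite are all assumptions, and the example file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical mapping from changed file paths to test suites.
# Reads one path per line on stdin and prints the suites to run.
select_suites() {
  local file run_tr=0 run_assembly=0
  while IFS= read -r file; do
    case "$file" in
      "") ;;                                  # ignore blank lines
      automates/text_reading/*) run_tr=1 ;;   # text reading code
      *) run_assembly=1 ;;                    # everything else
    esac
  done
  [ "$run_tr" = "1" ] && echo "tr"
  [ "$run_assembly" = "1" ] && echo "assembly"
  return 0
}

# Example with a fixed list; in CI the input could come from
# `git diff --name-only <base>...HEAD`.
printf '%s\n' automates/text_reading/src/Example.scala README.md | select_suites
```

A Makefile target could then run only the suites that this function prints.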