Automation

Due the nature of the dataset being used, it makes sense to automate the retraining of the model as new data arrives to the data source.

We can use a cron schedule in a GitHub actions workflow and combine it with dvc exp run --set-param feature in order to update parameters on the fly.

Create and fill `.github/workflows/daily.yml`

name: Daily DVC & CML Workflow

on:
  schedule:
    - cron: "0 0 * * *"

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest
    container: docker://ghcr.io/iterative/cml:latest

    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0

      - name: Setup
        run: |
          pip install -r requirements.txt

      - name: Run DVC pipeline
        env:
          GITHUB_TOKEN: ${{ secrets.PERSONAL_GITHUB_TOKEN }}
          GDRIVE_CREDENTIALS_DATA: ${{ secrets.GDRIVE_CREDENTIALS_DATA }}
        run: |
          dvc exp run --set-param data.until=$(date +'%Y/%m/%d') --pull

      - name: Share changes
        env:
          GDRIVE_CREDENTIALS_DATA: ${{ secrets.GDRIVE_CREDENTIALS_DATA }}
        run: |
          dvc push

      - name: Create a P.R. with CML 
        env:
          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          cml pr "dvc.lock" "outs/*.json" "outs/eval"  "outs/train_metrics" "params.yaml"

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

5-automation.md

5-automation.md

Automation

Files

5-automation.md

Latest commit

History

5-automation.md

File metadata and controls

Automation