NOTE: This repository is based on https://github.com/databricks/mlops-stacks
NOTE: This feature is in public preview.
This repo provides a customizable stack for starting new ML projects on Databricks that follow production best practices out of the box.
Using Databricks MLOps Stacks, data scientists can quickly get started iterating on ML code for new projects while ops engineers set up CI/CD and ML asset management, with an easy transition to production. You can also use MLOps Stacks as a building block in automation for creating new data science projects with production-grade CI/CD pre-configured.
An ML solution comprises data, code, and models. These assets need to be developed, validated (staging), and deployed (production). In this repository, we use the notion of dev, staging, and prod to represent the execution environments of each stage.
An instantiated project from MLOps Stacks contains an ML pipeline with CI/CD workflows to test and deploy automated model training and batch inference jobs across your dev, staging, and prod Databricks workspaces.
Data scientists can iterate on ML code and file pull requests (PRs), which trigger unit and integration tests in an isolated staging Databricks workspace. Model training and batch inference jobs in staging are updated to run the latest code as soon as a PR is merged into main. After merging a PR into main, you can cut a new release branch as part of your regularly scheduled release process to promote ML code changes to production (see the sketch after the workflow steps below).
- Modify code in the `dev` branch
- Commit changes to the remote repository
- Open a PR from `dev` into `main`
  - Assets are deployed to the TEST environment
  - Unit and integration tests are executed
- Wait for tests to complete and approve the PR
  - Assets are deployed to the STAGING environment
- Open a PR from `main` into `release`
- Approve the PR
  - Assets are deployed to the PROD environment
- Wait for assets to be deployed
- Execute jobs in PROD
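For example, cutting a release branch from the latest `main` typically looks like the following (a minimal sketch; the branch name `release` matches the workflow above):

```sh
# Cut a release branch from the latest main to promote ML code to production
git checkout main
git pull origin main
git checkout -b release
git push origin release
```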
- Install Python from https://www.anaconda.com (3.8+ / tested on 3.9.12)
- Set up the Databricks CLI (v0.211.0+ / tested on v0.212.0)
  - Install with Homebrew:
    ```sh
    brew tap databricks/tap
    brew install databricks
    ```
- For VS Code:
  - Install from https://code.visualstudio.com/download
  - Install the Python extension from https://marketplace.visualstudio.com/items?itemName=ms-python.python
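Once the Databricks CLI is installed, a quick sanity check and workspace authentication might look like this (a minimal sketch; `databricks configure` prompts for a workspace URL and a personal access token):

```sh
# Verify the installed CLI version (should be v0.211.0 or newer)
databricks --version

# Configure authentication against your workspace (prompts for host and PAT)
databricks configure
```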
- Create a directory for your project
- Create a new Pipenv environment from the project directory:
  ```sh
  pipenv --python <version>
  ```
- Select the project's Python interpreter
  - For VS Code: open the Command Palette (Ctrl/Cmd+Shift+P) and run `Python: Select Interpreter`
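A concrete example of the Pipenv step above, using Python 3.9 (the tested version):

```sh
# Create a Pipenv environment pinned to Python 3.9, then open a shell inside it
pipenv --python 3.9
pipenv shell
```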
- Set up the MLOps Stacks project
  - Initialize the project:
    ```sh
    databricks bundle init mlops-stacks
    ```
  - Follow the on-screen instructions
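After initialization, you can sanity-check the generated bundle against your dev workspace. A minimal sketch, assuming the project directory name chosen during init and the default `dev` target in the generated bundle configuration:

```sh
# Validate the bundle configuration and deploy its assets to the dev target
cd <project_name>
databricks bundle validate -t dev
databricks bundle deploy -t dev
```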
- Set up the GitHub repository
  - Create a new remote repository
  - Install Git from https://git-scm.com/book/en/v2/Getting-Started-Installing-Git
  - Initialize your local repository from the project directory:
    ```sh
    git init
    git remote add origin <url>
    git config user.name <user.name>
    git config user.email <user.email>
    git add *
    git add .github/*
    git commit -m init
    git push origin main
    git checkout -b dev
    ```
  - Generate Databricks PATs for the STAGING and PROD environments
  - In the GitHub repository, navigate to Settings > Secrets and variables > Actions and set up the following secrets:
    - `STAGING_WORKSPACE_TOKEN`
    - `PROD_WORKSPACE_TOKEN`
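If you prefer the command line over the GitHub UI, the same secrets can be set with the GitHub CLI (a sketch, assuming `gh` is installed and authenticated; each command prompts for the PAT value):

```sh
# Store the Databricks PATs as GitHub Actions secrets
gh secret set STAGING_WORKSPACE_TOKEN
gh secret set PROD_WORKSPACE_TOKEN
```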
- Set up the inference input table
  - Follow the steps in `./deployment/batch_inference/README.md`
Customizations made on top of the default MLOps Stacks template:

- Compute definitions (all-purpose cluster, cluster policy)
- Schedules set to paused
- Catalog and schema variables
- Disabled comments on `databricks-mlops-stacks-bundle-ci.yml`
- Added trigger conditions to the CI pipeline