The goal of this project is to create an ecosystem in which to run Data Pipelines and monitor Machine Learning Experiments.
From the Airflow documentation:
Apache Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows.
From the MLflow documentation:
MLflow is an open source platform for managing the end-to-end machine learning lifecycle.
From the Docker documentation:
Docker Compose is a tool for defining and running multi-container Docker applications.
The first step in structuring this project is connecting Airflow and MLflow together with Docker Compose.
Create docker-compose.yaml, which contains the configuration of the Docker containers responsible for running the Airflow and MLflow services.
Each of these services runs in a separate container:
- airflow-webserver
- airflow-scheduler
- airflow-worker
- airflow-triggerer
- mlflow
To create and start all the containers, run the following command from the terminal:
docker compose up -d
In order to access the Airflow server, visit the page: localhost:8080
And take a step into the Airflow world!
To start creating DAGs, initialize an empty folder named dags and populate it with as many scripts as you need, for example:
└── dags
└── example_dag.py
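As a reference, here is a minimal sketch of what example_dag.py could contain (assuming a recent Airflow 2.x release with the TaskFlow API; the DAG id, schedule, and task are illustrative only):

```python
# dags/example_dag.py - minimal illustrative DAG (id, schedule, and task are assumptions)
from datetime import datetime

from airflow.decorators import dag, task


@dag(
    dag_id="example_dag",
    start_date=datetime(2024, 1, 1),
    schedule=None,  # run only when triggered manually from the UI
    catchup=False,
)
def example_dag():
    @task
    def say_hello() -> str:
        # Placeholder task: replace with your own pipeline logic
        return "Hello from Airflow!"

    say_hello()


example_dag()
```

Once the scheduler picks up the file, the DAG appears in the Airflow UI and can be triggered from there.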
In order to monitor MLflow experiments through its server, visit the page: localhost:600
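If you prefer to query the server from code instead of the browser, here is a small sketch using the MLflow client (assuming MLflow 2.x, where search_experiments is available; the port matches the one above):

```python
from mlflow.tracking import MlflowClient

# Point the client at the tracking server exposed by docker compose
client = MlflowClient(tracking_uri="http://localhost:600")

# List the experiments currently registered on the server
for experiment in client.search_experiments():
    print(experiment.experiment_id, experiment.name)
```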
To establish a connection between Airflow and MLflow, define the URI of the MLflow server:
mlflow.set_tracking_uri('http://mlflow:600')
After that, create a new connection in Airflow that points to that port.
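To give an idea of how the pieces fit together, here is a sketch of a DAG task that logs a run to the MLflow server (assuming the mlflow library is installed in the Airflow images; the DAG id, experiment name, parameters, and metric are placeholders):

```python
# dags/train_with_mlflow.py - illustrative DAG that logs to the MLflow server
# (DAG id, experiment name, parameters, and metric values are placeholders)
from datetime import datetime

import mlflow
from airflow.decorators import dag, task


@dag(
    dag_id="train_with_mlflow",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
)
def train_with_mlflow():
    @task
    def train():
        # From inside the Airflow containers the MLflow service is reachable
        # by its compose service name and port, as set above
        mlflow.set_tracking_uri("http://mlflow:600")
        mlflow.set_experiment("example_experiment")  # placeholder experiment name

        with mlflow.start_run():
            # Replace with real training code; these values are dummies
            mlflow.log_param("learning_rate", 0.01)
            mlflow.log_metric("accuracy", 0.9)

    train()


train_with_mlflow()
```

Rather than hard-coding the URI in each DAG, the same endpoint could also be stored in the Airflow connection mentioned above (Admin > Connections in the web UI) and read from there.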