In this tutorial, we’ll create a DAG in Airflow that schedules and runs a data pipeline, all from the Mage UI.
🤔 Note
This tutorial requires that you already have Airflow set up and running locally.
- Add mage-ai as a dependency in Airflow
- Install Mage tool
- Initialize Mage project
- Create one-time DAG for pipelines
- Create pipeline
- Run DAG in Airflow for pipeline
Open the requirements.txt file in the root directory of your Airflow project, and add the mage-ai library:
mage-ai
You can install and run Mage using Docker or using pip.
If you’re using Docker:
docker pull mageai/mageai:latest
If you’re using pip:
pip install mage-ai
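Either way, you can sanity-check that the CLI is available by printing its help text (with Docker, prefix the command with docker run mageai/mageai):
mage --help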
Change directory into your Airflow’s DAGs folder. This is typically the dags/ folder.
cd dags
Then, initialize a new Mage project in the dags/ folder.
If you’re using Docker, run the following command in the dags/ folder:
docker run -it -p 6789:6789 -v $(pwd):/home/src \
mageai/mageai mage init demo_project
If you used pip to install Mage, run the following command in the dags/ folder:
mage init demo_project
Once finished, you should have a folder named demo_project inside your dags/ folder.
Your current folder structure should look like this:
airflow_root_directory/
| -- dags/
| -- | -- demo_project/
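Inside demo_project, Mage creates its standard project scaffolding. The exact contents vary by Mage version, but it should look roughly like this:
demo_project/
| -- data_loaders/
| -- transformers/
| -- data_exporters/
| -- pipelines/
| -- io_config.yaml
| -- metadata.yaml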
In the dags/ folder, create a new file named create_mage_pipelines.py. Then, add the following code:
from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # On Airflow 2+, this also lives at airflow.operators.python
from datetime import datetime
from mage_ai.orchestration.airflow import create_dags
import os

# Resolve the absolute path to the Mage project inside the dags/ folder.
ABSOLUTE_PATH = os.path.abspath(os.path.dirname(__file__))
project_path = os.path.join(ABSOLUTE_PATH, 'demo_project')

# Register one Airflow DAG per pipeline in the Mage project.
create_dags(
    project_path,
    DAG,
    PythonOperator,
    dag_settings=dict(
        start_date=datetime(2022, 8, 5),  # Change this to any start date you want
    ),
    globals_dict=globals(),
)
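A quick note on what this script does: create_dags scans the demo_project folder for pipelines and registers one Airflow DAG per pipeline into this module’s globals(), which is how Airflow’s DAG discovery finds them. Once the scheduler has parsed the file, you can list the generated DAGs from the command line; this assumes Airflow 2’s CLI, and the grep filter is just a convenience:
airflow dags list | grep mage_pipeline_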
In the dags/ folder, start the Mage tool.
If you’re using Docker, run the following command in the dags/ folder:
docker run -it -p 6789:6789 -v $(pwd):/home/src \
mageai/mageai mage start demo_project
If you used pip to install Mage, run the following command in the dags/ folder:
mage start demo_project
Open http://localhost:6789 in your browser.
Follow steps 1, 2, and 4 in this tutorial to create a new pipeline, add 1 data loader block, and add 1 transformer block.
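For orientation, here is a rough sketch of what those two blocks might look like once generated. Mage scaffolds each block into its own file (under demo_project/data_loaders/ and demo_project/transformers/); they are shown together here only for brevity, and the URL and transformation logic are placeholder assumptions rather than part of the tutorial:
import io
import pandas as pd
import requests

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader
if 'transformer' not in globals():
    from mage_ai.data_preparation.decorators import transformer

@data_loader
def load_data(*args, **kwargs):
    # Placeholder source: any URL that returns CSV data works here.
    url = 'https://example.com/data.csv'  # hypothetical URL
    response = requests.get(url)
    return pd.read_csv(io.StringIO(response.text))

@transformer
def transform(df, *args, **kwargs):
    # Placeholder transformation: drop rows with missing values.
    return df.dropna()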
- Open the Airflow webserver UI at http://localhost:8080 in your browser.
- If you named your pipeline etl demo based on the tutorial from the previous step, then find a DAG named mage_pipeline_etl_demo. If you named it something else, find a DAG with the prefix mage_pipeline_.
- Click on the DAG to view its detail page. The URL will typically look like this: http://localhost:8080/admin/airflow/tree?dag_id=mage_pipeline_etl_demo.
- Turn that DAG on if it’s currently off.
- Trigger a new DAG run.
- Watch the DAG as it runs each task according to the pipeline you created in Mage.
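If you prefer the command line to the web UI, the same two actions (unpausing the DAG and triggering a run) can be done with the Airflow CLI; this again assumes Airflow 2’s CLI and the DAG id from the steps above:
airflow dags unpause mage_pipeline_etl_demo
airflow dags trigger mage_pipeline_etl_demo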