.. Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
.. http://www.apache.org/licenses/LICENSE-2.0
.. Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
.. raw:: html
<div align="center">
<img src="images/AirflowBreeze_logo.png"
alt="Airflow Breeze - Development and Test Environment for Apache Airflow">
</div>
.. contents:: :local:
Airflow Breeze CI environment
=============================
Airflow Breeze is an easy-to-use development and test environment using
`Docker Compose <https://docs.docker.com/compose/>`_.
The environment is available for local use and is also used in Airflow's CI tests.
We call it *Airflow Breeze* as **It's a Breeze to contribute to Airflow**.
The advantages and disadvantages of using the Breeze environment vs. other ways of testing Airflow
are described in `CONTRIBUTING.rst <CONTRIBUTING.rst#integration-test-development-environment>`_.
Prerequisites
=============
Docker Desktop
--------------
- **Version**: Install the latest stable `Docker Desktop <https://docs.docker.com/get-docker/>`_
and make sure it is in your PATH. ``Breeze`` detects if you are using a version that is too
old and warns you to upgrade.
- **Permissions**: Configure Docker so that you can run ``docker`` commands directly, not only as the root user.
Your user should be in the ``docker`` group.
See `Docker installation guide <https://docs.docker.com/install/>`_ for details.
- **Disk space**: On macOS, increase your available disk space before starting to work with
the environment. At least 20 GB of free disk space is recommended. You can get by with
less, but make sure to clean up the Docker disk space periodically.
See also `Docker for Mac - Space <https://docs.docker.com/docker-for-mac/space>`_ for details
on increasing disk space available for Docker on Mac.
- **Docker problems**: Sometimes it is not obvious that space is an issue when you run into
a problem with Docker. If you see weird behaviour, try the ``breeze cleanup`` command.
Also see `pruning <https://docs.docker.com/config/pruning/>`_ instructions from Docker.
- **Docker context**: Recent versions of Docker Desktop are configured by default to use the ``desktop-linux``
Docker context, which uses a Docker socket created in the user's home directory. Older versions (and plain Docker)
use the ``/var/run/docker.sock`` socket and the ``default`` context. Breeze attempts to detect whether you have
the ``desktop-linux`` context configured and uses it if it is available, but you can force the
context by adding the ``--builder`` flag to the commands that build the image or run the container and forward
the socket into the image.
Here is an example configuration with more than 200GB disk space for Docker:
.. raw:: html
<div align="center">
<img src="images/disk_space_osx.png" width="640"
alt="Disk space MacOS">
</div>
- **Docker is not running** - even if it is running with Docker Desktop. This is an issue
specific to Docker Desktop 4.13.0 (released in late October 2022). Please upgrade Docker
Desktop to 4.13.1 or later to resolve the issue. For technical details, see also
`docker/for-mac#6529 <https://github.com/docker/for-mac/issues/6529>`_.
**Docker errors that may occur while running Breeze**
- If Docker is not running in a Python virtual environment
- **Solution**
- 1. Create the docker group if it does not exist
- ``sudo groupadd docker``
- 2. Add your user to the docker group.
- ``sudo usermod -aG docker $USER``
- 3. Log in to the new docker group
- ``newgrp docker``
- 4. Check if docker can be run without root
- ``docker run hello-world``
- 5. In some cases you might need to make sure that the "Allow the default Docker socket to
be used" option in the "Advanced" tab of the "Docker Desktop" settings is checked
.. raw:: html
<div align="center">
<img src="images/docker_socket.png" width="640"
alt="Docker socket used">
</div>
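Before retrying Breeze, it can help to confirm that the group change from the steps above is visible
in your current shell. The snippet below is a minimal sketch of such a check. The helper name
``current_shell_in_group`` is made up for this example and is not part of Breeze or Docker.

.. code-block:: bash

    # Hypothetical helper (not part of Breeze or Docker): succeeds when the
    # current shell process belongs to the given group.
    current_shell_in_group() {
        local group="$1"
        # `id -nG` without a user argument prints the group names of the
        # current process, separated by spaces. Put one name per line and
        # match the whole line exactly.
        id -nG | tr ' ' '\n' | grep -qx -- "${group}"
    }

    if current_shell_in_group docker; then
        echo "This shell is in the docker group"
    else
        echo "This shell is NOT in the docker group; repeat the steps above or log in again"
    fi

Calling ``id -nG`` without a user name reports the groups of the running shell, so the check only
succeeds after ``newgrp docker`` or a fresh login has picked up the new membership.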
Note: If you use Colima, please follow the instructions in the `Contributors Quick Start Guide <https://github.com/apache/airflow/blob/main/CONTRIBUTORS_QUICK_START.rst>`__
Docker Compose
--------------
- **Version**: Install the latest stable `Docker Compose <https://docs.docker.com/compose/install/>`_
and add it to the PATH. ``Breeze`` detects if you are using a version that is too old and warns you to upgrade.
- **Permissions**: Configure permissions so that your user can run the ``docker-compose`` command.
Docker in WSL 2
---------------
- **WSL 2 installation** :
Install WSL 2 and a Linux distro (e.g. Ubuntu). See the
`WSL 2 Installation Guide <https://docs.microsoft.com/en-us/windows/wsl/install-win10>`_ for details.
- **Docker Desktop installation** :
Install Docker Desktop for Windows. For Windows Home follow the
`Docker Windows Home Installation Guide <https://docs.docker.com/docker-for-windows/install-windows-home>`_.
For Windows Pro, Enterprise, or Education follow the
`Docker Windows Installation Guide <https://docs.docker.com/docker-for-windows/install/>`_.
- **Docker setting** :
WSL integration needs to be enabled
.. raw:: html
<div align="center">
<img src="images/docker_wsl_integration.png" width="640"
alt="Airflow Breeze - Docker WSL2 integration">
</div>
- **WSL 2 Filesystem Performance** :
Accessing the host Windows filesystem incurs a performance penalty,
so it is recommended to do development on the Linux filesystem.
For example, run ``cd ~``, create a development folder in your Linux distro home
and clone the Airflow repo there.
- **WSL 2 Docker mount errors**:
Another reason to use the Linux filesystem is that sometimes, depending on the length of
your path, you might get strange errors when you try to start ``Breeze``, such as
``caused: mount through procfd: not a directory: unknown:``. Therefore checking out
Airflow in a Windows-mounted filesystem is strongly discouraged.
- **WSL 2 Docker volume remount errors**:
If you're experiencing errors such as ``ERROR: for docker-compose_airflow_run
Cannot create container for service airflow: not a directory`` when starting Breeze
after the first time or an error like ``docker: Error response from daemon: not a directory.
See 'docker run --help'.`` when running the pre-commit tests, you may need to consider
`installing Docker directly in WSL 2 <https://dev.to/bowmanjd/install-docker-on-windows-wsl-without-docker-desktop-34m9>`_
instead of using Docker Desktop for Windows.
- **WSL 2 Memory Usage** :
WSL 2 can consume a lot of memory under the process name "Vmmem". To reclaim the memory after
development you can:
* On the Linux distro clear cached memory: ``sudo sysctl -w vm.drop_caches=3``
* If no longer using Docker you can quit Docker Desktop
(right-click the system tray icon and select "Quit Docker Desktop")
* If no longer using WSL you can shut it down on the Windows Host
with the following command: ``wsl --shutdown``
- **Developing in WSL 2**:
You can use all the standard Linux command line utilities to develop on WSL 2.
Furthermore, VS Code supports developing in Windows while executing remotely in WSL.
If VS Code is installed on the Windows host system, then in the WSL Linux distro
you can run ``code .`` in the root directory of your Airflow repo to launch VS Code.
The pipx tool
--------------
We use the ``pipx`` tool to install and manage Breeze. The ``pipx`` tool is created by the creators
of ``pip`` from the `Python Packaging Authority <https://www.pypa.io/en/latest/>`_.
Install pipx:
.. code-block:: bash
pip install --user pipx
Breeze is not globally accessible until your PATH is updated. Add ``<USER FOLDER>\.local\bin`` to your PATH
environment variable. This can be done automatically with the following command (follow the instructions it prints).
.. code-block:: bash
pipx ensurepath
On macOS:
.. code-block:: bash
python -m pipx ensurepath
Resources required
==================
Memory
------
Minimum 4GB RAM for Docker Engine is required to run the full Breeze environment.
On macOS, 2GB of RAM are available for your Docker containers by default, but more memory is recommended
(4GB should be comfortable). For details see
`Docker for Mac - Advanced tab <https://docs.docker.com/v17.12/docker-for-mac/#advanced-tab>`_.
On Windows WSL 2 expect the Linux Distro and Docker containers to use 7 - 8 GB of RAM.
Disk
----
Minimum 40GB free disk space is required for your Docker Containers.
On macOS the available space might shrink over time, so you might need to increase it or run ``breeze cleanup``
periodically. For details see
`Docker for Mac - Advanced tab <https://docs.docker.com/v17.12/docker-for-mac/#advanced-tab>`_.
On WSL2 you might want to increase your Virtual Hard Disk by following:
`Expanding the size of your WSL 2 Virtual Hard Disk <https://docs.microsoft.com/en-us/windows/wsl/compare-versions#expanding-the-size-of-your-wsl-2-virtual-hard-disk>`_
There is a command ``breeze ci resource-check`` that you can run to check available resources. See below
for details.
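For a quick manual look at the disk figure above, the following sketch compares the free space of a
directory's filesystem with a threshold. It is not how ``breeze ci resource-check`` works internally.
The helper names ``free_disk_gb`` and ``meets_minimum`` are assumptions made for this illustration.
On macOS and WSL 2, Docker keeps its data inside a virtual disk, so the host figure is only an approximation.

.. code-block:: bash

    # Hypothetical helpers for illustration only.

    # Free space, in whole GB, of the filesystem that holds the given path.
    free_disk_gb() {
        local path="${1:-.}"
        # -P forces the portable one-line-per-filesystem format and -k reports
        # 1024-byte blocks; column 4 of the second line is the available space.
        df -Pk "${path}" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
    }

    # Succeeds when the available amount meets the required amount (both in GB).
    meets_minimum() {
        local available_gb="$1" required_gb="$2"
        [ "${available_gb}" -ge "${required_gb}" ]
    }

    required_gb=40   # disk minimum quoted in this section
    available_gb="$(free_disk_gb .)"
    if meets_minimum "${available_gb}" "${required_gb}"; then
        echo "Disk OK: ${available_gb} GB free"
    else
        echo "Disk low: ${available_gb} GB free, ${required_gb} GB recommended"
    fi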
Cleaning the environment
------------------------
You may need to clean up your Docker environment occasionally. The images are quite big
(1.5GB for both images needed for static code analysis and CI tests) and, if you often rebuild/update
them, you may end up with some unused image data.
To clean up the Docker environment:
1. If Breeze is running, stop it with ``breeze down``.
2. Run the ``breeze cleanup`` command.
3. Run ``docker images --all`` and ``docker ps --all`` to verify that your Docker is clean.
Both commands should return an empty list of images and containers respectively.
If you run into disk space errors, consider pruning your Docker images with the ``docker system prune --all``
command. You may need to restart the Docker Engine before running this command.
In case of disk space errors on macOS, increase the disk space available for Docker. See
`Prerequisites <#prerequisites>`_ for details.
Installation
============
Set your working directory to the root of this cloned repository.
Run this command to install Breeze (make sure to use the ``-e`` flag):
.. code-block:: bash
pipx install -e ./dev/breeze
.. note:: Note for Windows users
The ``./dev/breeze`` in the command above is a path to the sub-folder where the Breeze source packages are.
If you are on Windows, you should use the Windows way to point to the ``dev/breeze`` sub-folder
of Airflow, either as an absolute or a relative path. For example:
.. code-block:: bash
pipx install -e dev\breeze
Once this is complete, you should have ``breeze`` binary on your PATH and available to run by ``breeze``
command.
Those are all available commands for Breeze and details about the commands are described below:
.. image:: ./images/breeze/output-commands.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output-commands.svg
:width: 100%
:alt: Breeze commands
Breeze installed this way is linked to your checked out sources of Airflow, so Breeze will
automatically use the latest version of sources from ``./dev/breeze``. Sometimes, when dependencies are
updated, ``breeze`` commands will offer you to run a self-upgrade.
You can run such a self-upgrade at any time:
.. code-block:: bash
breeze setup self-upgrade
If you have several checked out Airflow sources, Breeze will warn you if you are using it from a different
source tree and will offer you to re-install from those sources - to make sure that you are using the right
version.
You can skip Breeze's upgrade check by setting the ``SKIP_BREEZE_UPGRADE_CHECK`` variable to a non-empty value.
By default Breeze works on the version of Airflow that you run it in - in case you are outside of the
sources of Airflow and you installed Breeze from a directory - Breeze will be run on Airflow sources from
where it was installed.
You can run the ``breeze setup version`` command to see where Breeze was installed from and which sources
Breeze currently works on.
.. warning:: Upgrading from earlier Python version
If you used Breeze with Python 3.7, it will complain when you run it that it needs Python 3.8. In this
case you should force-reinstall Breeze with ``pipx``:
.. code-block:: bash
pipx install --force -e ./dev/breeze
.. note:: Note for Windows users
The ``./dev/breeze`` in the command above is a path to the sub-folder where the Breeze source packages are.
If you are on Windows, you should use the Windows way to point to the ``dev/breeze`` sub-folder
of Airflow, either as an absolute or a relative path. For example:
.. code-block:: bash
pipx install --force -e dev\breeze
Running Breeze for the first time
---------------------------------
The first time you run Breeze, it pulls and builds a local version of Docker images.
It pulls the latest Airflow CI images from the
`GitHub Container Registry <https://github.com/orgs/apache/packages?repo_name=airflow>`_
and uses them to build your local Docker images. Note that the first run (per Python version) might take up to 10
minutes on a fast connection to start. Subsequent runs should be much faster.
Once you enter the environment, you are dropped into bash shell of the Airflow container and you can
run tests immediately.
To use the full potential of breeze you should set up autocomplete. The ``breeze`` command comes
with a built-in bash/zsh/fish autocomplete setup command. After installing,
when you start typing the command, you can use <TAB> to show all the available switches and get
auto-completion on typical values of parameters that you can use.
You can set up autocomplete automatically by running:
.. code-block:: bash
breeze setup autocomplete
Automating breeze installation
------------------------------
Breeze on POSIX-compliant systems (Linux, MacOS) can be automatically installed by running the
``scripts/tools/setup_breeze`` bash script. This includes checking and installing ``pipx``, setting up
``breeze`` with it and setting up autocomplete.
Customizing your environment
----------------------------
When you enter the Breeze environment, an environment file is automatically sourced from
``files/airflow-breeze-config/variables.env``.
You can also add ``files/airflow-breeze-config/init.sh`` and the script will be sourced every time
you enter Breeze. For example, you can add ``pip install`` commands if you want to install
custom dependencies, but you are free to add any other customizations.
You can override the name of the init script by setting ``INIT_SCRIPT_FILE`` environment variable before
running the breeze environment.
You can also customize your environment by setting ``BREEZE_INIT_COMMAND`` environment variable. This variable
will be evaluated at entering the environment.
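As an illustration, the following sketch creates both customization files from the root of the Airflow
sources without overwriting files that already exist. The helper ``write_if_missing``, the variable
``MY_CUSTOM_VARIABLE`` and the package ``some-package`` are placeholders chosen for this example.
Breeze does not require any of them.

.. code-block:: bash

    # Run from the root of your Airflow sources.
    config_dir="files/airflow-breeze-config"
    mkdir -p "${config_dir}"

    # Hypothetical helper: write stdin to the target only when it does not exist yet.
    write_if_missing() {
        local target="$1"
        if [ -e "${target}" ]; then
            echo "Keeping existing ${target}"
            cat > /dev/null   # discard the proposed content
        else
            cat > "${target}"
        fi
    }

    # variables.env is sourced when you enter Breeze.
    # MY_CUSTOM_VARIABLE is a placeholder name used only in this example.
    write_if_missing "${config_dir}/variables.env" <<'EOF'
    export MY_CUSTOM_VARIABLE="some-value"
    EOF

    # init.sh is sourced when you enter Breeze; "some-package" is a placeholder.
    write_if_missing "${config_dir}/init.sh" <<'EOF'
    echo "Running custom Breeze init script"
    # pip install some-package
    EOF

Because both files are sourced, they use ordinary shell syntax, which is why the variable is set with ``export``.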
The ``files`` folder from your local sources is automatically mounted to the container under
``/files`` path and you can put there any files you want to make available for the Breeze container.
You can also copy any wheel (``.whl``) or sdist packages to the ``dist`` folder. When you pass the
``--use-packages-from-dist`` flag, with ``wheel`` or ``sdist`` selected as the package format, Breeze
automatically installs the packages found there when you enter Breeze.
You can also add your local tmux configuration in ``files/airflow-breeze-config/.tmux.conf`` and
these configurations will be available for your tmux environment.
There is a symlink between ``files/airflow-breeze-config/.tmux.conf`` and ``~/.tmux.conf`` in the container,
so you can change it in either place and run
.. code-block:: bash
tmux source ~/.tmux.conf
inside the container to apply the modified tmux configuration.
Regular development tasks
=========================
The regular Breeze development tasks are available as top-level commands. These tasks are the ones most often
used during development, which is why they are available without any sub-command. More advanced
commands are grouped into sub-commands.
Entering Breeze shell
---------------------
This is the most often used feature of Breeze. It simply lets you enter the shell inside the Breeze
development environment (inside the Breeze container).
You can use additional ``breeze`` flags to choose your environment. You can specify a Python
version to use, and backend (the meta-data database). Thanks to that, with Breeze, you can recreate the same
environments as we have in matrix builds in the CI.
For example, you can choose to run Python 3.8 tests with MySQL as backend and with mysql version 8
as follows:
.. code-block:: bash
breeze --python 3.8 --backend mysql --mysql-version 8
.. note:: Note for Windows WSL2 users
You may see error messages such as:
.. code-block:: bash
Current context is now "..."
protocol not available
Error 1 returned
Try adding ``--builder=default`` to your command. For example:
.. code-block:: bash
breeze --builder=default --python 3.8 --backend mysql --mysql-version 8
The choices you make are persisted in the ``./.build/`` cache directory so that the next time you use the
``breeze`` script, it can reuse the values that were used previously. This way you do not have to specify
them when you run the script. You can delete the ``.build/`` directory in case you want to restore the
default settings.
Parameter values that can be stored persistently in the cache are marked with ``>VALUE<``
in the help of the commands.
Building the documentation
--------------------------
To build documentation in Breeze, use the ``build-docs`` command:
.. code-block:: bash
breeze build-docs
Results of the build can be found in the ``docs/_build`` folder.
The documentation build consists of three steps:
* verifying consistency of indexes
* building documentation
* spell checking
You can run only one of the last two stages by providing the ``--spellcheck-only`` or ``--docs-only``
flag.
.. code-block:: bash
breeze build-docs --spellcheck-only
This process can take some time, so in order to make it shorter you can filter by package, using the flag
``--package-filter <PACKAGE-NAME>``. The package name has to be one of the providers or ``apache-airflow``. For
instance, to build only the Amazon provider documentation, the command would be:
.. code-block:: bash
breeze build-docs --package-filter apache-airflow-providers-amazon
You can also use shorthand names as arguments instead of using the full names
for Airflow providers. To find the shorthand names, follow the instructions in :ref:`generating_short_form_names`.
Often errors during documentation generation come from the docstrings of auto-api generated classes.
During the docs building auto-api generated files are stored in the ``docs/_api`` folder. This helps you
easily identify the location the problems with documentation originated from.
These are all available flags of ``build-docs`` command:
.. image:: ./images/breeze/output_build-docs.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_build-docs.svg
:width: 100%
:alt: Breeze build documentation
.. _generating_short_form_names:
Generating short form names for Providers
-----------------------------------------
Drop the ``apache-airflow-providers-`` prefix from the full provider name.
Then, in the remaining part, replace every dash (``-``) with a dot (``.``).
Example:
If the provider name is ``apache-airflow-providers-cncf-kubernetes``, it will be ``cncf.kubernetes``.
Note: For building docs for the apache-airflow-providers index, use ``providers-index`` as the shorthand name.
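The two rules above can be expressed as a small shell function. This is a sketch for illustration only.
The function name ``provider_short_name`` is made up here and is not a Breeze command, and the function
does not handle the special ``providers-index`` name.

.. code-block:: bash

    # Hypothetical helper: derive the short-form name from a full provider
    # package name by following the two rules above.
    provider_short_name() {
        local full_name="$1"
        # Rule 1: drop the "apache-airflow-providers-" prefix.
        local short="${full_name#apache-airflow-providers-}"
        # Rule 2: replace every dash with a dot.
        echo "${short//-/.}"
    }

    provider_short_name "apache-airflow-providers-cncf-kubernetes"   # prints: cncf.kubernetes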
Running static checks
---------------------
You can run static checks via Breeze. You can also run them via the ``pre-commit`` command, but with auto-completion
Breeze makes it easier to run selective static checks. If you press <TAB> after ``static-checks`` and
you have auto-complete set up, you should see an auto-completable list of all available checks.
For example, the following command:
.. code-block:: bash
breeze static-checks --type mypy-core
will run mypy check for currently staged files inside ``airflow/`` excluding providers.
Selecting files to run static checks on
........................................
By default, pre-commit checks run on staged changes only. They run on all the
files you have added with ``git add`` and ignore any changes that you have made but not staged.
If you want to run them on all your modified files, add them with the ``git add`` command first.
With ``--all-files`` you can run static checks on all files in the repository. This is useful when you
want to be sure they will not fail in CI, or when you just rebased your changes and want to
re-run the latest pre-commits on your changes, but it can take a long time (a few minutes) to wait for the result.
.. code-block:: bash
breeze static-checks --type mypy-core --all-files
The above will run mypy check for all files.
You can limit that by selecting specific files you want to run static checks on. You can do that by
specifying (can be multiple times) ``--file`` flag.
.. code-block:: bash
breeze static-checks --type mypy-core --file airflow/utils/code_utils.py --file airflow/utils/timeout.py
The above will run mypy check for those two files (note: autocomplete should work for the file selection).
However, often you do not remember files you modified and you want to run checks for files that belong
to specific commits you already have in your branch. You can use ``breeze static-checks`` to run the checks
only on changed files you have already committed to your branch - either for specific commit, for last
commit, for all changes in your branch since you branched off from main or for specific range
of commits you choose.
.. code-block:: bash
breeze static-checks --type mypy-core --last-commit
The above will run mypy check for all files in the last commit in your branch.
.. code-block:: bash
breeze static-checks --type mypy-core --only-my-changes
The above will run mypy check for all files changed in the commits added to your branch since you branched off from main.
.. code-block:: bash
breeze static-checks --type mypy-core --commit-ref 639483d998ecac64d0fef7c5aa4634414065f690
The above will run mypy check for all files in the 639483d998ecac64d0fef7c5aa4634414065f690 commit.
Any ``commit-ish`` reference from Git will work here (branch, tag, short/long hash etc.)
.. code-block:: bash
breeze static-checks --type identity --verbose --from-ref HEAD^^^^ --to-ref HEAD
The above will run the check for the last 4 commits in your branch. You can use any ``commit-ish`` references
in ``--from-ref`` and ``--to-ref`` flags.
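To preview which files a given range covers before running the checks, plain Git can list them.
The sketch below uses only standard Git commands. It is an assumption that this list matches what
Breeze passes to the checks, and ``files_in_range`` is a name made up for this example.

.. code-block:: bash

    # Hypothetical helper: list files changed between two commit-ish references.
    files_in_range() {
        local from_ref="$1" to_ref="$2"
        git diff --name-only "${from_ref}" "${to_ref}"
    }

    # HEAD^^^^ and HEAD~4 name the same commit: the fourth first-parent
    # ancestor of HEAD. Example, run inside a Git checkout with at least
    # five commits on the current branch:
    #   files_in_range HEAD~4 HEAD

Writing ``HEAD~4`` instead of ``HEAD^^^^`` is easier to read for longer ranges and selects the same commit.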
These are all available flags of ``static-checks`` command:
.. image:: ./images/breeze/output_static-checks.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_static-checks.svg
:width: 100%
:alt: Breeze static checks
.. note::
When you run static checks, some of the artifacts (mypy_cache) are stored in a docker-compose volume
so that it can speed up static checks execution significantly. However, sometimes, the cache might
get broken, in which case you should run ``breeze down`` to clean up the cache.
.. note::
You cannot change Python version for static checks that are run within Breeze containers.
The ``--python`` flag has no effect for them. They are always run with lowest supported Python version.
The main reason is to keep consistency in the results of static checks and to make sure that
our code is fine when running the lowest supported version.
Starting Airflow
----------------
For testing Airflow you often want to start multiple components (in multiple terminals). Breeze has
a built-in ``start-airflow`` command that starts the Breeze container, launches multiple terminals using tmux
and launches all necessary Airflow components in those terminals.
When you are starting Airflow from local sources, www asset compilation is automatically executed beforehand.
.. code-block:: bash
breeze --python 3.8 --backend mysql start-airflow
You can also use it to start a different executor.
.. code-block:: bash
breeze start-airflow --executor CeleryExecutor
You can also use it to start any released version of Airflow from ``PyPI`` with the
``--use-airflow-version`` flag - useful for testing and looking at issues raised for specific version.
.. code-block:: bash
breeze start-airflow --python 3.8 --backend mysql --use-airflow-version 2.7.0
When you are installing version from PyPI, it's also possible to specify extras that should be used
when installing Airflow - you can provide several extras separated by commas - for example to install
providers together with Airflow that you are installing. For example when you are using celery executor
in Airflow 2.7.0+ you need to add ``celery`` extra.
.. code-block:: bash
breeze start-airflow --use-airflow-version 2.7.0 --executor CeleryExecutor --airflow-extras celery
These are all available flags of ``start-airflow`` command:
.. image:: ./images/breeze/output_start-airflow.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_start-airflow.svg
:width: 100%
:alt: Breeze start-airflow
Launching multiple terminals in the same environment
----------------------------------------------------
Often, if you want to run full Airflow in the Breeze environment, you need to launch multiple terminals and
run ``airflow webserver``, ``airflow scheduler``, ``airflow celery worker`` in separate terminals.
This can be achieved either via ``tmux`` or via exec-ing into the running container from the host. Tmux
is installed inside the container and you can launch it with ``tmux`` command. Tmux provides you with the
capability of creating multiple virtual terminals and multiplex between them. More about ``tmux`` can be
found at `tmux GitHub wiki page <https://github.com/tmux/tmux/wiki>`_ . Tmux has several useful shortcuts
that allow you to split the terminals, open new tabs etc - it's pretty useful to learn it.
Another way is to exec into Breeze terminal from the host's terminal. Often you can
have multiple terminals in the host (Linux/MacOS/WSL2 on Windows) and you can simply use those terminals
to enter the running container. It's as easy as launching ``breeze exec`` once you have started the
Breeze environment. You will be dropped into bash and environment variables will be read in the same
way as when you enter the environment. You can do it multiple times and open as many terminals as you need.
These are all available flags of ``exec`` command:
.. image:: ./images/breeze/output_exec.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_exec.svg
:width: 100%
:alt: Breeze exec
Compiling www assets
--------------------
Airflow webserver needs to prepare www assets - compiled with node and yarn. The ``compile-www-assets``
command takes care of it. This is needed when you want to run the webserver inside Breeze.
.. image:: ./images/breeze/output_compile-www-assets.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_compile-www-assets.svg
:width: 100%
:alt: Breeze compile-www-assets
Breeze cleanup
--------------
Sometimes you need to clean up your Docker environment (and it is recommended you do that regularly). There
are several reasons why you might want to do that.
Breeze uses Docker images heavily and those images are rebuilt periodically and might leave dangling, unused
images in docker cache. This might cause extra disk usage. Also running various docker compose commands
(for example running tests with ``breeze testing tests``) might create additional docker networks that might
prevent new networks from being created. Those networks are not removed automatically by docker-compose.
Also Breeze uses its own cache to keep information about all images.
All those unused images, networks and cache entries can be removed by running the ``breeze cleanup`` command. By default
it will not remove the most recent images that you might need to run breeze commands, but you
can also remove those breeze images to clean up everything by adding the ``--all`` flag (note that you will
need to build the images again from scratch - pulling from the registry might take a while).
Breeze will ask you to confirm each step, unless you specify the ``--answer yes`` flag.
These are all available flags of ``cleanup`` command:

.. image:: ./images/breeze/output_cleanup.svg
  :target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_cleanup.svg
  :width: 100%
  :alt: Breeze cleanup

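
As a rough sketch, a cleanup run can be bracketed with ``docker system df`` (a standard Docker CLI
command that reports disk usage) to see how much space was reclaimed. The helper name
``breeze_cleanup_report`` is hypothetical and not part of Breeze; the ``--all`` and ``--answer yes``
flags are the ones described above.

.. code-block:: bash

    # Hypothetical helper: show Docker disk usage before and after a cleanup.
    breeze_cleanup_report() {
        echo "== Docker disk usage before cleanup =="
        docker system df
        # Extra arguments (for example --all or --answer yes) are passed through.
        breeze cleanup "$@"
        echo "== Docker disk usage after cleanup =="
        docker system df
    }

    # Usage:
    #   breeze_cleanup_report                     # keep the most recent images
    #   breeze_cleanup_report --all --answer yes  # remove everything, no prompts
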
Running arbitrary commands in container
---------------------------------------
For more sophisticated use cases, use the ``breeze shell`` command. It has more parameters,
and you can also use it to execute arbitrary commands inside the container.

.. code-block:: bash

    breeze shell "ls -la"

These are all available flags of ``shell`` command:

.. image:: ./images/breeze/output_shell.svg
  :target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_shell.svg
  :width: 100%
  :alt: Breeze shell

Running Breeze with Metrics
---------------------------
Running Breeze with a StatsD Metrics Stack
..........................................
You can launch an instance of Breeze pre-configured to emit StatsD metrics using
``breeze start-airflow --integration statsd``. This will launch an Airflow webserver
within the Breeze environment as well as containers running StatsD, Prometheus, and
Grafana. The integration configures the "Targets" in Prometheus, the "Datasources" in
Grafana, and includes a default dashboard in Grafana.
When you run Airflow Breeze with this integration, in addition to the standard ports
(See "Port Forwarding" below), the following are also automatically forwarded:
* 29102 -> forwarded to StatsD Exporter -> breeze-statsd-exporter:9102
* 29090 -> forwarded to Prometheus -> breeze-prometheus:9090
* 23000 -> forwarded to Grafana -> breeze-grafana:3000
You can connect to these ports/databases using:
* StatsD Metrics: http://127.0.0.1:29102/metrics
* Prometheus Targets: http://127.0.0.1:29090/targets
* Grafana Dashboards: http://127.0.0.1:23000/dashboards
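
Once the stack is up, a quick way to confirm that the three forwarded endpoints respond is a small
``curl`` loop run on the host. This is only a sketch: the helper name ``check_statsd_stack`` is
hypothetical, and it assumes ``curl`` is installed and the default ports listed above are in use.

.. code-block:: bash

    # Hypothetical helper: report which of the forwarded endpoints respond.
    check_statsd_stack() {
        local url status=0
        for url in \
            "http://127.0.0.1:29102/metrics" \
            "http://127.0.0.1:29090/targets" \
            "http://127.0.0.1:23000/dashboards"; do
            # -f: treat HTTP errors as failures, -sS: quiet but still print errors
            if curl -fsS -o /dev/null --max-time 5 "$url"; then
                echo "OK   $url"
            else
                echo "FAIL $url"
                status=1
            fi
        done
        return "$status"
    }

    # Usage (after ``breeze start-airflow --integration statsd`` is running):
    #   check_statsd_stack
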
Running Breeze with an OpenTelemetry Metrics Stack
..................................................
----
[Work in Progress]
NOTE: This will launch the stack as described below but Airflow integration is
still a Work in Progress. This should be considered experimental and likely to
change by the time Airflow fully supports emitting metrics via OpenTelemetry.
----
You can launch an instance of Breeze pre-configured to emit OpenTelemetry metrics
using ``breeze start-airflow --integration otel``. This will launch Airflow within
the Breeze environment as well as containers running OpenTelemetry-Collector,
Prometheus, and Grafana. The integration handles all configuration of the
"Targets" in Prometheus and the "Datasources" in Grafana, so it is ready to use.
When you run Airflow Breeze with this integration, in addition to the standard ports
(See "Port Forwarding" below), the following are also automatically forwarded:
* 28889 -> forwarded to OpenTelemetry Collector -> breeze-otel-collector:8889
* 29090 -> forwarded to Prometheus -> breeze-prometheus:9090
* 23000 -> forwarded to Grafana -> breeze-grafana:3000
You can connect to these ports using:
* OpenTelemetry Collector: http://127.0.0.1:28889/metrics
* Prometheus Targets: http://127.0.0.1:29090/targets
* Grafana Dashboards: http://127.0.0.1:23000/dashboards
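
The containers may take a moment to start, so it can help to poll the collector endpoint before
inspecting it. The sketch below assumes ``curl`` and ``seq`` are available on the host; the helper
name ``wait_for_url`` and its default attempt count and delay are arbitrary choices, not part of Breeze.

.. code-block:: bash

    # Hypothetical helper: poll a URL until it responds or the attempts run out.
    # Arguments: URL, number of attempts (default 30), delay in seconds (default 2).
    wait_for_url() {
        local url="$1" attempts="${2:-30}" delay="${3:-2}" i
        for i in $(seq 1 "$attempts"); do
            if curl -fsS -o /dev/null --max-time 5 "$url"; then
                echo "ready after $i attempt(s): $url"
                return 0
            fi
            sleep "$delay"
        done
        echo "gave up after $attempts attempt(s): $url" >&2
        return 1
    }

    # Usage:
    #   wait_for_url "http://127.0.0.1:28889/metrics" && \
    #       curl -s "http://127.0.0.1:28889/metrics" | head -n 20
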
Running Breeze with OpenLineage
..........................................
You can launch an instance of Breeze pre-configured to emit OpenLineage events using
``breeze start-airflow --integration openlineage``. This will launch an Airflow webserver
within the Breeze environment as well as containers running a `Marquez <https://marquezproject.ai/>`_
webserver and API server.
When you run Airflow Breeze with this integration, in addition to the standard ports
(See "Port Forwarding" below), the following are also automatically forwarded:
* MARQUEZ_API_HOST_PORT (default 25000) -> forwarded to Marquez API -> marquez:5000
* MARQUEZ_API_ADMIN_HOST_PORT (default 25001) -> forwarded to Marquez Admin API -> marquez:5001
* MARQUEZ_HOST_PORT (default 23100) -> forwarded to Marquez -> marquez_web:3000
You can connect to these services using:
* Marquez Webserver: http://127.0.0.1:23100
* Marquez API: http://127.0.0.1:25000/api/v1
* Marquez Admin API: http://127.0.0.1:25001
Make sure to substitute the port numbers if you have customized them via the above env vars.
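
One way to avoid hard-coding the ports is to build the URLs from the same environment variables,
falling back to the defaults listed above. This is a minimal sketch; the helper name
``marquez_urls`` is hypothetical.

.. code-block:: bash

    # Hypothetical helper: print the Marquez URLs, honouring customized ports.
    marquez_urls() {
        echo "Marquez Webserver: http://127.0.0.1:${MARQUEZ_HOST_PORT:-23100}"
        echo "Marquez API: http://127.0.0.1:${MARQUEZ_API_HOST_PORT:-25000}/api/v1"
        echo "Marquez Admin API: http://127.0.0.1:${MARQUEZ_API_ADMIN_HOST_PORT:-25001}"
    }

    # Usage:
    #   marquez_urls
    #   # Assumption: the Marquez API lists namespaces under /api/v1/namespaces
    #   curl -s "http://127.0.0.1:${MARQUEZ_API_HOST_PORT:-25000}/api/v1/namespaces"
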
Stopping the environment
------------------------
After starting up, the environment runs in the background and uses a significant amount of memory, which you might
want to free for other things you are running on your host.

You can always stop it via:

.. code-block:: bash

    breeze down

These are all available flags of ``down`` command:

.. image:: ./images/breeze/output_down.svg
  :target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_down.svg
  :width: 100%
  :alt: Breeze down

Troubleshooting
===============
If you are having problems with the Breeze environment, try the steps below. After each step you
can check whether your problem is fixed.
1. If you are on macOS, check if you have enough disk space for Docker (Breeze will warn you if not).
2. Stop Breeze with ``breeze down``.
3. Fetch the origin with git and rebase the current branch on top of the main branch.
4. Delete the ``.build`` directory and run ``breeze ci-image build``.
5. Clean up Docker images via ``breeze cleanup`` command.
6. Restart your Docker Engine and try again.
7. Restart your machine and try again.
8. Re-install Docker Desktop and try again.
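
Steps 2 to 5 translate roughly into the commands sketched below. The helper takes a step number so
that you can still check after each step whether the problem is gone. It is hypothetical and makes
some assumptions: it is run from the root of the Airflow checkout, the remote is named ``origin``,
and its main branch is ``main``.

.. code-block:: bash

    # Hypothetical helper: run one of the troubleshooting steps 2-5 by number.
    breeze_fix_step() {
        case "$1" in
            2) breeze down ;;
            3) git fetch origin && git rebase origin/main ;;
            4) rm -rf .build && breeze ci-image build ;;
            5) breeze cleanup ;;
            *) echo "usage: breeze_fix_step 2|3|4|5" >&2; return 2 ;;
        esac
    }

    # Usage:
    #   breeze_fix_step 2   # then retry the failing command
    #   breeze_fix_step 3   # and so on, one step at a time
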
.. note::
    If pip is taking a long time and your internet connection is too slow for pip to download
    the libraries within the default timeout, increase the default timeout as follows and
    run Breeze again.

    .. code-block::

        export PIP_DEFAULT_TIMEOUT=1000

If the problems persist, set the ``VERBOSE_COMMANDS`` variable to "true":

.. code-block::

    export VERBOSE_COMMANDS="true"

Then run the failed command, copy-and-paste the output from your terminal to the
`Airflow Slack <https://s.apache.org/airflow-slack>`_ #airflow-breeze channel and
describe your problem.
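
To make the output easier to share, it can be saved to a file while it is still shown in the
terminal. The sketch below uses the standard ``tee`` utility; the helper name ``run_verbose`` and
the log file name ``breeze-debug.log`` are arbitrary choices, not something Breeze defines.

.. code-block:: bash

    # Hypothetical helper: run a command with verbose output and keep a copy in a log file.
    run_verbose() {
        local log="breeze-debug.log"
        # 2>&1 merges stderr into stdout so that tee records both streams.
        VERBOSE_COMMANDS="true" "$@" 2>&1 | tee "$log"
    }

    # Usage:
    #   run_verbose breeze start-airflow
    #   # then attach or paste breeze-debug.log

Note that the exit status of the pipeline is that of ``tee``, so check the log rather than the
return code to see whether the command itself failed.
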
.. warning::

    Some operating systems (Fedora, ArchLinux, RHEL, Rocky) have recently introduced kernel changes that result in
    Airflow in Breeze consuming 100% memory when run inside the community Docker implementation maintained
    by the OS teams.

    This is an issue with a backwards-incompatible containerd configuration that some of Airflow's dependencies
    have problems with. It is tracked in a few issues:

    * `Moby issue <https://github.com/moby/moby/issues/43361>`_
    * `Containerd issue <https://github.com/containerd/containerd/pull/7566>`_

    There is no solution yet from the containerd team, but it seems that installing
    `Docker Desktop on Linux <https://docs.docker.com/desktop/install/linux-install/>`_ solves the problem, as
    stated in `this comment <https://github.com/moby/moby/issues/43361#issuecomment-1227617516>`_, and allows you to
    run Breeze with no problems.

ETIMEDOUT Error
---------------
When running ``breeze start-airflow``, the following output might be observed:

.. code-block:: bash

    Skip fixing ownership of generated files as Host OS is darwin
    Waiting for asset compilation to complete in the background.
    Still waiting .....
    Still waiting .....
    Still waiting .....
    Still waiting .....
    Still waiting .....
    Still waiting .....
    The asset compilation is taking too long.
    If it does not complete soon, you might want to stop it and remove file lock:
    * press Ctrl-C
    * run 'rm /opt/airflow/.build/www/.asset_compile.lock'
    Still waiting .....
    Still waiting .....
    Still waiting .....
    Still waiting .....
    Still waiting .....
    Still waiting .....
    Still waiting .....
    The asset compilation failed. Exiting.
    [INFO] Locking pre-commit directory
    Error 1 returned

This timeout can be increased by setting ``ASSET_COMPILATION_WAIT_MULTIPLIER``; a reasonable value
is 3 or 4.

.. code-block:: bash

    export ASSET_COMPILATION_WAIT_MULTIPLIER=3

The timeout is only a symptom. The underlying cause is the following ``ETIMEDOUT`` error, raised during
asset compilation when the ``npm`` command tries to install the required packages:

.. code-block:: bash

    npm ERR! code ETIMEDOUT
    npm ERR! syscall connect
    npm ERR! errno ETIMEDOUT
    npm ERR! network request to https://registry.npmjs.org/yarn failed, reason: connect ETIMEDOUT 2606:4700::6810:1723:443
    npm ERR! network This is a problem related to network connectivity.
    npm ERR! network In most cases you are behind a proxy or have bad network settings.
    npm ERR! network
    npm ERR! network If you are behind a proxy, please make sure that the
    npm ERR! network 'proxy' config is set properly. See: 'npm help config'

In this situation, notice that the address in ``2606:4700::6810:1723:443`` is an IPv6 address (the trailing
``443`` is the port). The connection failed because the router did not support IPv6 addresses in its DNS lookup.
In this case, disabling IPv6 on the host machine and using IPv4 instead resolved the issue.

A similar issue can occur if you are behind an HTTP/HTTPS proxy that blocks access to the required websites,
or if your proxy settings are not configured properly.
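
To narrow down which of these cases applies, it can help to compare IPv4 and IPv6 connectivity to the
npm registry and to look at the proxy variables in the environment. This is a rough sketch that assumes
``curl`` is installed; the helper name ``check_npm_registry`` is hypothetical.

.. code-block:: bash

    # Hypothetical helper: compare IPv4 and IPv6 connectivity to the npm registry.
    check_npm_registry() {
        local url="https://registry.npmjs.org/yarn" family
        for family in 4 6; do
            # curl -4 / -6 restrict the connection to IPv4 / IPv6 addresses only.
            if curl "-$family" -fsS -o /dev/null --max-time 10 "$url"; then
                echo "IPv$family: reachable"
            else
                echo "IPv$family: NOT reachable"
            fi
        done
        # Show proxy-related variables, if any, that tools such as npm may pick up.
        env | grep -i '_proxy=' || echo "no *_proxy variables set"
    }

    # Usage:
    #   check_npm_registry

If only the IPv6 line reports a failure, the situation matches the one described above.
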
Advanced commands
=================
Airflow Breeze is a Python script serving as a "swiss-army-knife" of Airflow testing. Under the
hood it uses other scripts that you can also run manually if you have problems running the Breeze
environment. The Breeze script allows you to perform the following tasks:
Running tests
-------------
You can run tests with ``breeze``. There are various test types, and Breeze makes it easy to run each
of them. You can run unit tests in different ways: either interactively with the default
``shell`` command or via the ``testing`` commands. The latter makes it easy to run more kinds of tests.
Here is the detailed set of options for the ``breeze testing`` command.

.. image:: ./images/breeze/output_testing.svg
  :target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_testing.svg
  :width: 100%
  :alt: Breeze testing

Iterate on tests interactively via ``shell`` command
....................................................
You can simply enter the ``breeze`` container and run the ``pytest`` command there. You can enter the
container with just the ``breeze`` command or with the ``breeze shell`` command (the latter has more options
that are useful when you run integration or system tests). This is the best way to interactively
run selected tests and iterate on them. Once you enter the ``breeze`` environment, it is ready
out-of-the-box to run your tests with the right ``pytest`` command (autocomplete should help
you complete the test name if you start typing ``pytest tests<TAB>``).

Here are a few examples:

To run a single test:

.. code-block:: bash

    pytest tests/core/test_core.py::TestCore::test_check_operators

To run the whole test class:

.. code-block:: bash

    pytest tests/core/test_core.py::TestCore

You can re-run the tests interactively, add extra parameters to pytest and modify the files before
re-running the test to iterate over the tests. You can also add more flags when starting the
``breeze shell`` command when you run integration tests or system tests. Read more details about it
in the `testing doc <TESTING.rst>`_ where all the test types and information on how to run them are explained.

This applies to all kinds of tests - all our tests can be run using pytest.
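
As an illustration of adding extra parameters, a small shell function can bundle the pytest flags you
use most often while iterating. The function name ``pt`` is hypothetical; ``-x``, ``-vv``, ``-k`` and
``--lf`` are standard pytest options.

.. code-block:: bash

    # Hypothetical shorthand for quick iterations inside the Breeze shell:
    #   -x   stop at the first failure
    #   -vv  more verbose output
    pt() {
        pytest -x -vv "$@"
    }

    # Usage:
    #   pt tests/core/test_core.py::TestCore
    #   pt tests/core/test_core.py -k "check_operators"   # select tests by name
    #   pt --lf tests/core/test_core.py                   # re-run only the last failures
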
Running unit tests
..................
Another option is to run tests via the built-in ``breeze testing tests`` command.
The interactive ``pytest`` command lets you run tests individually, by class, or in any other way
that pytest supports. The ``breeze testing tests`` command, in contrast, lets you
run the tests grouped by the same test "types" that are used in CI: for example Core, Always,
API, Providers. This is how our CI runs them, running each group in parallel with the other groups, and you can
replicate this behaviour.
Another interesting use of the ``breeze testing tests`` command is that you can easily specify a subset of the
tests for Providers.
For example this will only run provider tests for airbyte and http providers: