Forces uninstalling providers in editable mode. (apache#13439)
We cannot skip installing providers, but this causes
problems when installing airflow in editable mode, because the providers
end up in two places: in the airflow sources and in the installed
provider packages.

This change removes installed provider packages when airflow
is installed in editable mode to mitigate the problem.

This way, there is no need to use INSTALL_PROVIDERS_FROM_SOURCES
variable when installing in editable mode.

We still need to keep INSTALL_PROVIDERS_FROM_SOURCES for cases when
non-editable mode is used. In this way one can easily install the current
version of the provider packages locally with pip install and have
the latest sources of both airflow and providers installed.

Also INSTALL_PROVIDERS_FROM_SOURCES is particularly useful if you
develop a new provider and reinstall airflow, because otherwise pip
will try to install the provider from a non-existent package.

This is why all regular CI jobs and Breeze have
INSTALL_PROVIDERS_FROM_SOURCES set by default.
potiuk authored Jan 8, 2021
1 parent 0d8536c commit f969e69
Showing 11 changed files with 389 additions and 105 deletions.
4 changes: 4 additions & 0 deletions .github/workflows/ci.yml
@@ -409,6 +409,8 @@ jobs:
run: ./scripts/ci/provider_packages/ci_prepare_provider_packages.sh
- name: "Install and test provider packages and airflow via ${{ matrix.package-format }} files"
run: ./scripts/ci/provider_packages/ci_install_and_test_provider_packages.sh
env:
INSTALL_PROVIDERS_FROM_SOURCES: "false"
- name: "Upload package artifacts"
uses: actions/upload-artifact@v2
if: always()
@@ -460,6 +462,8 @@ jobs:
run: ./scripts/ci/build_airflow/ci_build_airflow_package.sh
- name: "Install and test provider packages and airflow via ${{ matrix.package-format }} files"
run: ./scripts/ci/provider_packages/ci_install_and_test_provider_packages.sh
env:
INSTALL_PROVIDERS_FROM_SOURCES: "false"
- name: "Upload package artifacts"
uses: actions/upload-artifact@v2
if: always()
91 changes: 90 additions & 1 deletion CONTRIBUTING.rst
@@ -648,7 +648,7 @@ Airflow 2.0 is split into core and providers. They are delivered as separate pac

In Airflow 1.10 all those providers were installed together within one single package and when you installed
airflow locally, from sources, they were also installed. In Airflow 2.0, providers are separated out,
and not installed together with the core, unless you set ``INSTALL_PROVIDERS_FROM_SOURCES`` environment
and not packaged together with the core, unless you set ``INSTALL_PROVIDERS_FROM_SOURCES`` environment
variable to ``true``.

In Breeze, which is a development environment, the ``INSTALL_PROVIDERS_FROM_SOURCES`` variable is set to true,
@@ -712,6 +712,95 @@ snowflake slack

.. END PACKAGE DEPENDENCIES HERE
Developing community managed provider packages
----------------------------------------------

While you can develop your own providers, Apache Airflow has 60+ providers that are managed by the community.
They are part of the same repository as Apache Airflow (we use a ``monorepo`` approach, where different
parts of the system are developed in the same repository but packaged and released separately).
All the community-managed providers are in the 'airflow/providers' folder and they are all sub-packages of
the 'airflow.providers' package. All the providers are available as ``apache-airflow-providers-<PROVIDER_ID>``
packages.

The capabilities of the community-managed providers are the same as those of the third-party ones. When
the providers are installed from PyPI, they provide the entry-point containing the metadata as described
in the previous chapter. However, when they are developed locally together with Airflow, the provider
discovery mechanism is based on the ``provider.yaml`` file placed in the top-level folder of
the provider. Similarly to the entry-point metadata, the ``provider.yaml`` file is compliant with the
`json-schema specification <https://github.com/apache/airflow/blob/master/airflow/provider.yaml.schema.json>`_.
Thanks to that mechanism, you can develop community-managed providers in a seamless way directly from
the Airflow sources, without preparing and releasing them as packages. This works as follows:

* When Airflow is installed locally in editable mode (``pip install -e``), the provider packages installed
  from PyPI are uninstalled and the provider discovery mechanism finds the providers in the Airflow
  sources by searching for ``provider.yaml`` files.

* When you want to install Airflow from sources, you can set the ``INSTALL_PROVIDERS_FROM_SOURCES`` variable
  to ``true``; then the providers will not be installed from PyPI packages but from the local sources, as
  part of the ``apache-airflow`` package. Additionally, the ``provider.yaml`` files are copied together with
  the sources, so that the capabilities and names of the providers can be discovered. This mode is especially
  useful when you are developing a new provider that cannot be installed from PyPI and you want to check
  whether it installs cleanly. Both modes are shown below.
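
For reference, this is what the two modes look like; the constraint flag used elsewhere in this document
can be appended to either command:

.. code-block:: bash

    # Editable mode: provider packages installed from PyPI are uninstalled and
    # the providers are discovered from the Airflow sources via provider.yaml files
    pip install -e .

    # Non-editable mode, with the providers installed from the local sources
    INSTALL_PROVIDERS_FROM_SOURCES="true" pip install .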

Regardless of whether you plan to contribute your provider, when you are developing your own custom
providers you can use the above functionality to make your development easier. You can add your provider
as a sub-folder of the ``airflow.providers`` package, add the ``provider.yaml`` file and install airflow
in development mode; then the capabilities of your provider will be discovered by airflow and you will see
it among the other providers in the output of the ``airflow providers`` command.
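
As an illustration, here is a minimal sketch of that workflow. The ``myorg`` provider name is hypothetical
and the ``provider.yaml`` fields shown are only an illustrative fragment (see
``airflow/provider.yaml.schema.json`` for the full schema):

.. code-block:: bash

    # Create a new provider sub-package next to the community-managed ones
    mkdir -p airflow/providers/myorg
    touch airflow/providers/myorg/__init__.py

    # Describe the provider so that it can be discovered (fields are illustrative)
    cat > airflow/providers/myorg/provider.yaml <<'EOF'
    package-name: apache-airflow-providers-myorg
    name: MyOrg
    description: |
      In-house provider with custom hooks and operators.
    versions:
      - 1.0.0
    EOF

    # Re-install airflow in editable mode and check that the provider is discovered
    pip install -e .
    airflow providers list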

Documentation for the community managed providers
-------------------------------------------------

When you are developing a community-managed provider, you are expected to make sure it is well tested
and documented. Part of the documentation is the ``integration`` and ``version`` information in the
``provider.yaml`` file. This information is stripped out from the provider info available at runtime;
however, it is used to automatically generate documentation for the provider.

If you have pre-commits installed, pre-commit will warn you and let you know what changes need to be
made in the ``provider.yaml`` file when you add a new Operator, Hook, Sensor or Transfer. You can
also take a look at the other ``provider.yaml`` files as examples.
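
For example, you can run the configured hooks manually against the files you changed (the provider path
below is hypothetical):

.. code-block:: bash

    # Install the git hooks once
    pre-commit install

    # Run all configured checks against specific files
    pre-commit run --files airflow/providers/myorg/provider.yaml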

A well-documented provider contains:

* index.rst with references to packages, the API used and example DAGs
* configuration reference
* class documentation generated from PyDoc in the code
* example DAGs
* how-to guides

You can see, for example, the ``google`` provider, which has very comprehensive documentation:

* `Documentation <docs/apache-airflow-providers-google>`_
* `Example DAGs <airflow/providers/google/cloud/example_dags>`_

Example DAGs are an important part of the documentation. We use them for various purposes in
providers:

* showing real examples of how your provider classes (Operators/Sensors/Transfers) can be used
* snippets of the examples are embedded in the documentation via the ``exampleinclude::`` directive
  (see the sketch below)
* examples are executable as system tests
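
For instance, a fragment of an example DAG delimited by ``[START ...]``/``[END ...]`` comment markers
can be embedded in the provider documentation roughly like this (the path and marker names below are
hypothetical):

.. code-block:: rst

    .. exampleinclude:: /../../airflow/providers/google/cloud/example_dags/example_gcs.py
        :language: python
        :start-after: [START howto_operator_gcs_create_bucket]
        :end-before: [END howto_operator_gcs_create_bucket]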

Testing the community managed providers
---------------------------------------

We have high requirements when it comes to testing the community-managed providers. We have to be sure
that we have enough coverage and ways to test for regressions before the community accepts such
providers.

* Unit tests have to be comprehensive and they should test for possible regressions and edge cases,
  not only the "green path"

* Integration tests where 'local' integration with a component is possible (for example, tests with
  MySQL/Postgres DB/Presto/Kerberos all have integration tests which run against real, dockerised components)

* System Tests which provide end-to-end testing, usually testing several operators, sensors and
  transfers together, connecting to a real external system

You can read more about our approach to tests in `TESTING.rst <TESTING.rst>`_ but here
are some highlights.
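
For example, the unit tests for a provider live under ``tests/providers`` and can be run with plain
pytest. This is a sketch; the exact invocation depends on your environment (local virtualenv or Breeze):

.. code-block:: bash

    # Run all unit tests for a single provider
    pytest tests/providers/google/

    # Run a single test module
    pytest tests/providers/postgres/hooks/test_postgres.py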


Backport providers
------------------

34 changes: 27 additions & 7 deletions INSTALL
@@ -51,24 +51,44 @@ pip install . \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-master/constraints-3.6.txt"


By default `pip install` in Airflow 2.0 installs only the provider packages that are needed by the extras,
however if you want to install all providers (which was default behaviour in 1.10.*)
you can do it by setting environment variable INSTALL_PROVIDERS_FROM_SOURCES to `true`.
By default `pip install` in Airflow 2.0 installs only the provider packages that are needed by the extras and
installs them as packages from PyPI rather than from local sources:


INSTALL_PROVIDERS_FROM_SOURCES="true" pip install . \
pip install . \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-master/constraints-3.6.txt"


You can also install airflow in "editable mode" (with -e) flag and then provider packages will be
available, because they are used directly from the airflow sources:
You can also install airflow in "editable mode" (with -e) flag and then provider packages are
available directly from the sources (and the provider packages installed from PyPI are UNINSTALLED in
order to avoid having providers in two places. And `provider.yaml` files are used to discover capabilities
of the providers which are part of the airflow source code.

You can read more about `provider.yaml` and community-managed providers at
https://airflow.apache.org/docs/apache-airflow-providers/index.html for developing custom providers,
and in ``CONTRIBUTING.rst`` for developing community-maintained providers.

This is useful if you want to develop providers:

pip install -e . \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-master/constraints-3.6.txt"

You can also skip installing provider packages from PyPI by setting INSTALL_PROVIDERS_FROM_SOURCES to "true".
In this case Airflow will be installed in non-editable mode with all providers installed from the sources.
Additionally, the `provider.yaml` files will be copied to the provider folders, which makes the providers
discoverable by Airflow even though they are not installed from packages in this case.

INSTALL_PROVIDERS_FROM_SOURCES="true" pip install . \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-master/constraints-3.6.txt"

Airflow can be installed with extras to install some additional features (for example 'async' or 'doc'), or
to automatically install providers and all the dependencies needed by those providers:

pip install .[async,google,amazon] \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-master/constraints-3.6.txt"

The list of available extras:

# You can also install Airflow with extras specified. The list of available extras:
# START EXTRAS HERE

all, all_dbs, amazon, apache.atlas, apache.beam, apache.cassandra, apache.druid, apache.hdfs,
28 changes: 25 additions & 3 deletions LOCAL_VIRTUALENV.rst
@@ -70,7 +70,6 @@ Extra Packages
``--use-deprecated legacy-resolver`` to your pip install command.



You can also install extra packages (like ``[ssh]``, etc) via
``pip install -e [EXTRA1,EXTRA2 ...]``. However, some of them may
have additional install and setup requirements for your local system.
@@ -135,17 +134,40 @@ To create and initialize the local virtualenv:

.. code-block:: bash
pip install -U -e ".[devel,<OTHER EXTRAS>]" # for example: pip install -U -e ".[devel,google,postgres]"
pip install --upgrade -e ".[devel,<OTHER EXTRAS>]" # for example: pip install --upgrade -e ".[devel,google,postgres]"
In case you have problems installing airflow because some requirements are not installable, you can
try to install it with the set of working constraints (note that there are different constraint files
for different Python versions):

.. code-block:: bash
pip install -U -e ".[devel,<OTHER EXTRAS>]" \
pip install -e ".[devel,<OTHER EXTRAS>]" \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-master/constraints-3.6.txt"
This will install Airflow in 'editable' mode, where the sources of Airflow are used directly from the
source tree rather than copied to the installation directory. During the installation, airflow will
install, but then automatically remove, all provider packages installed from PyPI; instead, it will
automatically use the provider packages available in your local sources.
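
A quick, informal way to verify this behaviour after the installation (a sanity check, not an official
procedure):

.. code-block:: bash

    # No provider packages should remain installed from PyPI
    pip freeze | grep apache-airflow-providers- || echo "no PyPI provider packages"

    # The providers are still discovered from the local sources
    airflow providers list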

You can also install Airflow in non-editable mode:

.. code-block:: bash
pip install ".[devel,<OTHER EXTRAS>]" \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-master/constraints-3.6.txt"
This will copy the sources to the directory where Python packages are usually installed. You can see the
list of those directories via the ``python -m site`` command. In this case the providers are installed from
PyPI, not from sources, unless you set the ``INSTALL_PROVIDERS_FROM_SOURCES`` environment variable to ``true``:

.. code-block:: bash
INSTALL_PROVIDERS_FROM_SOURCES="true" pip install ".[devel,<OTHER EXTRAS>]" \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-master/constraints-3.6.txt"
Note: when you first initialize the database (the next step), you may encounter some problems.
This is because airflow by default will try to load the example DAGs, and some of them require the ``google`` and ``postgres`` dependencies.
You can solve the problem by:
