📝 Documentation refactoring for readability and up-to-dateness
Update doc

up doc

fix tests

update doc

huge doc refactoring

doc refactoring

doc refactoring

finish doc

update doc
Galileo-Galilei committed Jan 28, 2025
1 parent 3555900 commit d0fd002
Showing 53 changed files with 445 additions and 258 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -14,7 +14,7 @@

### Fixed

- :bug: :ambulance: Ensure `MlflowArtifactDataset` logs in the same run that parameters to when using `mlflow>=2.18` in combination with `ThreadRunner` [#613](https://github.com/Galileo-Galilei/kedro-mlflow/issues/613))
- :bug: :ambulance: Ensure `MlflowArtifactDataset` logs in the same run that parameters to when using `mlflow>=2.18` in combination with `ThreadRunner` ([#613](https://github.com/Galileo-Galilei/kedro-mlflow/issues/613))

## [0.13.3] - 2024-10-29

23 changes: 10 additions & 13 deletions README.md
@@ -30,27 +30,24 @@

**Important: ``kedro-mlflow`` is only compatible with ``kedro>=0.16.0`` and ``mlflow>=1.0.0``. If you have a project created with an older version of ``Kedro``, see this [migration guide](https://github.com/quantumblacklabs/kedro/blob/master/RELEASE.md#migration-guide-from-kedro-015-to-016).**
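The compatibility constraint above can be checked programmatically before installing the plugin. The following is a minimal sketch using only the standard library; the helper names (`parse_release`, `meets_minimum`, `check_requirements`) are hypothetical and not part of ``kedro-mlflow``, and the parsing deliberately ignores pre-release suffixes, so ``0.16.0rc1`` is treated as ``0.16.0``.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions quoted in the note above.
MINIMUMS = {"kedro": "0.16.0", "mlflow": "1.0.0"}


def parse_release(version_string: str) -> tuple:
    """Keep only the leading numeric release segment of a version string."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version_string!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two versions after padding them to the same length with zeros."""
    left, right = parse_release(installed), parse_release(minimum)
    width = max(len(left), len(right))
    left += (0,) * (width - len(left))
    right += (0,) * (width - len(right))
    return left >= right


def check_requirements(minimums: dict) -> dict:
    """Map each package to True/False, or None when it is not installed."""
    report = {}
    for package, minimum in minimums.items():
        try:
            report[package] = meets_minimum(version(package), minimum)
        except PackageNotFoundError:
            report[package] = None
    return report


if __name__ == "__main__":
    for package, status in check_requirements(MINIMUMS).items():
        print(f"{package}: {status}")
```

For anything beyond a quick sanity check, a dedicated version parser such as the one in the ``packaging`` library is likely a better choice, since it handles pre-releases and other edge cases that this sketch ignores.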

``kedro-mlflow`` is available on PyPI, so you can install it with ``pip``:

```console
pip install kedro-mlflow
```
You can install ``kedro-mlflow`` with several tools and packaging platforms:

If you want to use the most up to date version of the package which is under development and not released yet, you can install the package from github:
| **Logo** | **Platform** |**Command**|
|:-----------------------------------------------------------------:|:------------:|:----------------------------------------------------:|
| ![PyPI logo](https://simpleicons.org/icons/pypi.svg) | PyPI | ``pip install kedro-mlflow`` |
| ![Conda Forge logo](https://simpleicons.org/icons/condaforge.svg) | Conda Forge | ``conda install kedro-mlflow --channel conda-forge`` |
| ![GitHub logo](https://simpleicons.org/icons/github.svg) | GitHub | ``pip install --upgrade git+https://github.com/Galileo-Galilei/kedro-mlflow.git`` |

```console
pip install --upgrade git+https://github.com/Galileo-Galilei/kedro-mlflow.git
```

I strongly recommend to use ``conda`` (a package manager) to create an environment and to read [``kedro`` installation guide](https://kedro.readthedocs.io/en/latest/get_started/install.html).
I strongly recommend using ``conda`` (a package manager) to create a virtual environment and reading the [``kedro`` installation guide](https://kedro.readthedocs.io/en/latest/get_started/install.html).

# Getting started

The documentation contains:

- [A "hello world" example](https://kedro-mlflow.readthedocs.io/en/latest/source/03_getting_started/index.html) which demonstrates how you to **setup your project**, **version parameters** and **datasets**, and browse your runs in the UI.
- A section for [advanced machine learning versioning](https://kedro-mlflow.readthedocs.io/en/latest/source/04_experimentation_tracking/index.html) to show more advanced features (mlflow configuration through the plugin, package and serve a kedro ``Pipeline``...)
- A section to demonstrate how to use `kedro-mlflow` as a [machine learning framework](https://kedro-mlflow.readthedocs.io/en/latest/source/05_framework_ml/index.html) to deliver production ready pipelines and serve them. This section comes with [an example repo](https://github.com/Galileo-Galilei/kedro-mlflow-tutorial) you can clone and try out.
- [A quickstart in 1 mn example](https://kedro-mlflow.readthedocs.io/en/latest/source/03_quickstart/index.html) which demonstrates how to **set up your project**, **version parameters** and **datasets**, and browse your runs in the UI.
- A section for [advanced machine learning versioning](https://kedro-mlflow.readthedocs.io/en/latest/source/10_experiment_tracking/index.html) to show more advanced features (mlflow configuration through the plugin, package and serve a kedro ``Pipeline``...)
- A section to demonstrate how to use `kedro-mlflow` as a [machine learning framework](https://kedro-mlflow.readthedocs.io/en/latest/source/21_pipeline_serving/index.html) to deliver production ready pipelines and serve them. This section comes with [an example repo](https://github.com/Galileo-Galilei/kedro-mlflow-tutorial) you can clone and try out.

Some frequently asked questions on more advanced features:

30 changes: 21 additions & 9 deletions docs/index.rst
@@ -7,21 +7,33 @@ Welcome to kedro-mlflow's documentation!
========================================

.. toctree::
:maxdepth: 6
:maxdepth: -1
:caption: Getting started

Introduction <source/01_introduction/index.rst>
Installation <source/02_installation/index.rst>
Getting Started <source/03_getting_started/index.rst>
Experimentation tracking <source/04_experimentation_tracking/index.rst>
Pipeline serving <source/05_pipeline_serving/index.rst>
A mlops framework for continuous model serving <source/05_framework_ml/index.rst>
Interactive use <source/06_interactive_use/index.rst>
Python objects <source/07_python_objects/index.rst>
Quickstart in 1 mn <source/03_quickstart/index.rst>

.. toctree::
:maxdepth: 6
:maxdepth: -1
:caption: Experiment tracking

API documentation <source/08_API/kedro_mlflow.rst>
In a kedro project <source/10_experiment_tracking/index.rst>
In a notebook <source/11_interactive_use/index.rst>

.. toctree::
:maxdepth: -1
:caption: Pipeline serving

Custom mlflow model for kedro pipelines <source/21_pipeline_serving/index.rst>
A mlops framework for continuous model serving <source/22_framework_ml/index.rst>

.. toctree::
:maxdepth: -1
:caption: Technical documentation

Python objects <source/30_python_objects/index.rst>
API documentation <source/31_API/kedro_mlflow.rst>

Indices and tables
==================
28 changes: 13 additions & 15 deletions docs/source/01_introduction/01_introduction.md
@@ -27,7 +27,7 @@ While ``Kedro`` and ``Mlflow`` do not compete in the same field, they provide so
| I/O configuration files | - ``catalog.yml`` <br> - ``parameters.yml`` | ``MLproject`` |
| Compute abstraction | - ``Pipeline`` <br> - ``Node`` | N/A |
| Compute configuration files | - ``hooks.py`` <br> - ``run.py`` | `MLproject` |
| Parameters and data versioning | - ``Journal`` <br> - ``AbstractVersionedDataset`` | - ``log_metric``<br> - ``log_artifact``<br> - ``log_param`` |
| Parameters and data versioning | - ``Journal`` (deprecated) <br> - Experiment tracking (deprecated) <br> - ``AbstractVersionedDataset`` | - ``log_metric``<br> - ``log_artifact``<br> - ``log_param``|
| Cli execution | command ``kedro run`` | command ``mlflow run`` |
| Code packaging | command ``kedro package`` | N/A |
| Model packaging | N/A | - ``Mlflow Models`` (``mlflow.XXX.log_model`` functions) <br> - ``Mlflow Flavours`` |
@@ -39,23 +39,17 @@ We discuss hereafter how the two libraries compete on the different functionalit

``Mlflow`` and ``Kedro`` essentially overlap in the way they offer dedicated configuration files for running the pipeline from the CLI. However:

- ``Mlflow`` provides a single configuration file (the ``MLProject``) where all elements are declared (data, parameters and pipelines). Its goal is mainly to enable CLI execution of the project, but it is not very flexible. In my opinion, this file is **production oriented** and is not really intended to use for exploration.
- ``Mlflow`` provides a single configuration file (the ``MLProject``) where all elements are declared (data, parameters and pipelines). Its goal is mainly to enable CLI execution of the project, but it is not very flexible. This file is **production oriented** and is not really intended for exploration and development.
- ``Kedro`` offers a bunch of files (``catalog.yml``, ``parameters.yml``, ``pipeline.py``) and their associated abstraction (``AbstractDataset``, ``DataCatalog``, ``Pipeline`` and ``node`` objects). ``Kedro`` is much more opinionated: each object has a dedicated place (and only one!) in the template. This makes the framework both **exploration and production oriented**. The downside is that it could make the learning curve a bit sharper since a newcomer has to learn all ``Kedro`` specifications. It also provides a ``kedro-viz`` plugin to visualize the DAG interactively, which is particularly handy in medium-to-big projects.


> **``Kedro`` is a clear winner here, since it provides more functionnalities than ``Mlflow``. It handles very well _by design_ the exploration phase of data science projects when Mlflow is less flexible.**
```{note}
**``Kedro`` is a clear winner here, since it provides more functionalities than ``Mlflow``. It handles the exploration phase of data science projects very well _by design_, whereas ``Mlflow`` is less flexible.**
```

### Versioning: Kedro 1 - 1 Mlflow

** This section will be updated soon with the brand new experiment tracking functionality of kedro**

The ``Kedro`` ``Journal`` aimed at reproducibility (it was removed in ``kedro==0.18``), but is not focused on machine learning. The `Journal` keeps track of two elements:

- the CLI arguments, including *on the fly* parameters. This makes the command used to run the pipeline fully reproducible.
- the ``AbstractVersionedDataset`` for which versioning is activated. It consists in copying the data whom ``versioned`` argument is ``True`` when the ``save`` method of the ``AbstractVersionedDataset`` is called.
This approach suffers from two main drawbacks:
- the configuration is assumed immutable (including parameters), which is not realistic ni machine learning projects where they are very volatile. To fix this, the ``git sha`` has been recently added to the ``Journal``, but it has still some bugs in my experience (including the fact that the current ``git sha`` is logged even if the pipeline is ran with uncommitted change, which prevents reproducibility). This is still recent and will likely evolve in the future.
- there is no support for browsing old runs, which prevents [cleaning the database with old and unused datasets](https://github.com/quantumblacklabs/kedro/issues/406), compare runs between each other...
Kedro has made several attempts at experiment tracking: first with the ``Journal`` in the early days (``kedro<0.18``), then with an [experiment tracking functionality](https://docs.kedro.org/projects/kedro-viz/en/v9.2.0/experiment_tracking.html) which kept track of the parameters but will be removed in ``kedro>=0.20`` due to lack of traction (https://github.com/kedro-org/kedro-viz/issues/2202).

On the other hand, ``Mlflow``:

@@ -64,7 +58,9 @@ On the other hand, ``Mlflow``:
- [comes with a *User Interface* (UI)](https://mlflow.org/docs/latest/tracking.html#id7) which enables you to browse / filter / sort the runs, display graphs of the metrics, render plots... This makes run management much easier than in ``Kedro``.
- has a command to reproduce exactly the run from a given ``git sha``, [which is not possible in ``Kedro``](https://github.com/quantumblacklabs/kedro/issues/297).

> **``Mlflow`` is a clear winner here, because _UI_ and _run querying_ are must-have for machine learning projects. It is more mature than ``Kedro`` for versioning and more focused on machine learning.**
```{note}
**``Mlflow`` is a clear winner here, because _UI_ and _run querying_ are must-have for machine learning projects. It is more mature than ``Kedro`` for versioning and more focused on machine learning.**
```

### Model packaging and service: Kedro 1 - 2 Mlflow

@@ -79,8 +75,10 @@ On the other hand, ``Mlflow``:

When a stored model meets these requirements, ``Mlflow`` provides built-in tools to serve the model (as an API or for batch prediction) on many machine learning tools (Microsoft Azure ML, Amazon Sagemaker, Apache SparkUDF) and locally.

> **``Mlflow`` is currently the only tool which adresses model serving. This is currently not the top priority for ``Kedro``, but may come in the future ([through Kedro Server maybe?](https://github.com/quantumblacklabs/kedro/issues/143))**
```{note}
``Mlflow`` is currently the only tool which addresses model serving. Some [plugins address model deployment and serving](https://docs.kedro.org/en/stable/extend_kedro/plugins.html#community-developed-plugins) in the Kedro ecosystem, but they are not as well maintained as the core framework.
```

### Conclusion: Use Kedro and add Mlflow for machine learning projects

In my opinion, ``Kedro``'s will to enforce software engineering best practice makes it really useful for machine learning teams. It is extremely well documented and the support is excellent, which makes it very user friendly even for people with no computer science background. However, it lacks some machine learning-specific functionalities (better versioning, model service), and it is where ``Mlflow`` fills the gap.
``Kedro``'s will to enforce software engineering best practices makes it really useful for machine learning teams. It is extremely well documented and the support is excellent, which makes it very user friendly even for people with no computer science background. However, it lacks some machine learning-specific functionalities (better versioning, model serving), and this is where ``Mlflow`` fills the gap.
2 changes: 1 addition & 1 deletion docs/source/01_introduction/02_motivation.md
@@ -4,7 +4,7 @@

Basically, you should use `kedro-mlflow` in **any `Kedro` project which involves machine learning** / deep learning. As stated in the [introduction](./01_introduction.md), `Kedro`'s current versioning (as of version `0.16.6`) is not sufficient for machine learning projects: it lacks a UI and a ``run`` management system. Besides, the `KedroPipelineModel` ability to serve a kedro pipeline as an API or a batch in one line of code is a great addition for collaboration and transition to production.

If you do not use ``Kedro`` or if you do pure data processing which do not involve *machine learning*, this plugin is not what you are seeking for ;)
If you do not use ``Kedro`` or if you do pure data processing which does not involve *machine learning*, this plugin is not what you are looking for ;-)

## Why should I use kedro-mlflow?

9 changes: 2 additions & 7 deletions docs/source/02_installation/01_installation.md
@@ -42,7 +42,7 @@ Requires: pip-tools, cachetools, fsspec, toposort, anyconfig, PyYAML, click, plu

## Install the plugin

The current version of the plugin is compatible with ``kedro>=0.16.0``. Since Kedro tries to enforce backward compatibility, it will very likely remain compatible with further versions.
There are versions of the plugin compatible with ``kedro>=0.16.0`` and ``mlflow>=0.8.0``. ``kedro-mlflow`` stops adding features to a minor version 2 to 6 months after a new kedro release.

### Install from PyPI

@@ -70,7 +70,7 @@ Type ``kedro info`` in a terminal to check the installation. If it has succeede
| |/ / _ \/ _` | '__/ _ \
| < __/ (_| | | | (_) |
|_|\_\___|\__,_|_| \___/
v0.16.<x>
v0.<minor>.<patch>

kedro allows teams to create analytics
projects. It is developed as part of
@@ -95,9 +95,4 @@ Usage: kedro mlflow [OPTIONS] COMMAND [ARGS]...

Options:
-h, --help Show this message and exit.

Commands:
new Create a new kedro project with updated template.
```

*Note: For now, the `kedro mlflow new` command is not implemented. You must use `kedro new` to create a project, and then call `kedro mlflow init` inside this new project.*
2 changes: 1 addition & 1 deletion docs/source/02_installation/02_setup.md
@@ -15,7 +15,7 @@ In order to use the ``kedro-mlflow`` plugin, you need to setup its configuration
### Setting up the ``kedro-mlflow`` configuration file


``kedro-mlflow`` is [configured](../07_python_objects/05_Configuration.md) through an ``mlflow.yml`` file. The recommended way to initialize the `mlflow.yml` is by using [the ``kedro-mlflow`` CLI](../07_python_objects/04_CLI.md), but you can create it manually.
``kedro-mlflow`` is [configured](../30_python_objects/05_Configuration.md) through an ``mlflow.yml`` file. The recommended way to initialize the `mlflow.yml` is by using [the ``kedro-mlflow`` CLI](../30_python_objects/04_CLI.md), but you can create it manually.

```{note}
Since ``kedro-mlflow>=0.11.2``, the configuration file is optional. However, the plugin will use default ``mlflow`` configuration. Specifically, the runs will be stored in a ``mlruns`` folder at the root of the kedro project since no ``mlflow_tracking_uri`` is configured.
6 changes: 3 additions & 3 deletions docs/source/02_installation/03_migration_guide.md
@@ -117,9 +117,9 @@ Be aware that if you have saved a pipeline as a mlflow model with `pipeline_ml_f

```json
{
predictions:
"predictions":
{
<your model-predictions>
"<your model-predictions>"
}
}
```
@@ -128,7 +128,7 @@ to:

```json
{
<your model-predictions>
"<your model-predictions>"
}
```
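Client code that consumes the served model may need to accept both response shapes during the migration. The following is a minimal sketch of one way to do so; `extract_predictions` is a hypothetical helper, not part of ``kedro-mlflow``, and the sample payloads are invented for illustration.

```python
from typing import Any


def extract_predictions(payload: Any) -> Any:
    """Return the model predictions from either response shape.

    Hypothetical helper: it unwraps the ``{"predictions": ...}`` envelope
    when that is the only key, and passes any other payload through as is.
    """
    if isinstance(payload, dict) and set(payload) == {"predictions"}:
        return payload["predictions"]
    return payload


# Illustrative payloads only; real responses depend on your model.
wrapped = {"predictions": {"label": [0, 1, 1]}}
unwrapped = {"label": [0, 1, 1]}

assert extract_predictions(wrapped) == extract_predictions(unwrapped)
```

Note that this heuristic is ambiguous if the model's own output is a dictionary whose single key is named ``predictions``; in that case the client has to know which format the server uses.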

1 change: 1 addition & 0 deletions docs/source/02_installation/index.rst
@@ -4,6 +4,7 @@ Introduction
.. toctree::
:maxdepth: 4


Install the plugin <01_installation.md>
Setup your kedro project <02_setup.md>
Migration guide between versions <03_migration_guide.md>
@@ -5,9 +5,9 @@
Create a conda environment and install ``kedro-mlflow`` (this will automatically install ``kedro>=0.16.0``).

```console
conda create -n km_example python=3.9 --yes
conda create -n km_example python=3.10 --yes
conda activate km_example
pip install kedro-mlflow==0.13.4
pip install kedro-mlflow
```

## Install the toy project
@@ -2,10 +2,14 @@

## Initialize kedro-mlflow

First, you need to initialize your project and add the plugin-specific configuration file with this command:
```{note}
This step is optional if you use ``kedro-mlflow>=0.11.2``. If you do not create a ``mlflow.yml`` configuration file, ``kedro-mlflow`` will use the defaults. However, creating it is strongly recommended because professional setups often need some specific enterprise configuration.
```

You can initialize your project with the plugin-specific configuration file with this command:

```console
kedro mlflow init
kedro mlflow init --env=local
```

You will see the following message:
@@ -18,6 +22,7 @@ The ``conf/local`` folder is updated and you can see the `mlflow.yml` file:

![initialized_project](../imgs/initialized_project.png)


*Optional: If you have configured your own mlflow server, you can specify the tracking uri in the ``mlflow.yml`` (replace the highlighted line below):*

![mlflow_yml](../imgs/mlflow_yml.png)
@@ -109,9 +114,6 @@ You should see the following graph:

which indicates clearly which parameters are logged (in the red boxes with the "parameter" icon).

### Journal information

The informations provided by the ``Kedro``'s ``Journal`` are also recorded as ``tags`` in the mlflow ui in order to make reproducible. In particular, the exact command used for running the pipeline and the kedro version used are stored.

### Artifacts

Expand Down Expand Up @@ -159,4 +161,4 @@ This works for any type of file (including images with ``MatplotlibWriter``) and
The vanilla example above is just the beginning of your experience with ``kedro-mlflow``. Check out the next sections to see how `kedro-mlflow`:

- offers advanced capabilities for machine learning versioning
- can help to create standardize pipelines for deployment in production
- offers a way to create a custom mlflow model from your kedro pipelines for effortless deployment to production
File renamed without changes.