📝 Documentation refactoring for readability and up-to-dateness

Update doc up doc fix tests update doc huge doc refactoring doc refactoring doc refactoring finish doc update doc
Galileo-Galilei · Jan 28, 2025 · d0fd002
1 parent 3555900
commit d0fd002
Showing 53 changed files with 445 additions and 258 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -14,7 +14,7 @@
 
 ### Fixed
 
--   :bug: :ambulance: Ensure `MlflowArtifactDataset` logs in the same run that parameters to when using `mlflow>=2.18` in combination with `ThreadRunner` [#613](https://github.com/Galileo-Galilei/kedro-mlflow/issues/613))
+-   :bug: :ambulance: Ensure `MlflowArtifactDataset` logs in the same run as the parameters when using `mlflow>=2.18` in combination with `ThreadRunner` ([#613](https://github.com/Galileo-Galilei/kedro-mlflow/issues/613))
 
 ## [0.13.3] - 2024-10-29
 

diff --git a/README.md b/README.md
@@ -30,27 +30,24 @@
 
 **Important: ``kedro-mlflow`` is only compatible with ``kedro>=0.16.0`` and ``mlflow>=1.0.0``. If you have a project created with an older version of ``Kedro``, see this [migration guide](https://github.com/quantumblacklabs/kedro/blob/master/RELEASE.md#migration-guide-from-kedro-015-to-016).**
 
-``kedro-mlflow`` is available on PyPI, so you can install it with ``pip``:
 
-```console
-pip install kedro-mlflow
-```
+You can install ``kedro-mlflow`` with several tools and packaging platforms:
 
-If you want to use the most up to date version of the package which is under development and not released yet, you can install the package from github:
+|                             **Logo**                              | **Platform** |**Command**|
+|:-----------------------------------------------------------------:|:------------:|:----------------------------------------------------:|
+|       ![PyPI logo](https://simpleicons.org/icons/pypi.svg)        |     PyPI     | ``pip install kedro-mlflow``                         |
+| ![Conda Forge logo](https://simpleicons.org/icons/condaforge.svg) | Conda Forge  | ``conda install kedro-mlflow --channel conda-forge`` |
+|     ![GitHub logo](https://simpleicons.org/icons/github.svg)      |    GitHub    | ``pip install --upgrade git+https://github.com/Galileo-Galilei/kedro-mlflow.git`` |
 
-```console
-pip install --upgrade git+https://github.com/Galileo-Galilei/kedro-mlflow.git
-```
-
-I strongly recommend to use ``conda`` (a package manager) to create an environment and to read [``kedro`` installation guide](https://kedro.readthedocs.io/en/latest/get_started/install.html).
+I strongly recommend using ``conda`` (a package manager) to create a virtual environment and reading the [``kedro`` installation guide](https://kedro.readthedocs.io/en/latest/get_started/install.html).
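
After installing with any of the commands above, one way to confirm what ended up in the environment is to query the package metadata. This is a minimal sketch that assumes only the standard library's `importlib.metadata`. The helper name `installed_version` is hypothetical and is not part of ``kedro-mlflow``.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    # Hypothetical usage: report the three packages this README talks about.
    for name in ("kedro", "mlflow", "kedro-mlflow"):
        found = installed_version(name)
        print(f"{name}: {found if found is not None else 'not installed'}")
```

Returning `None` instead of raising keeps the check usable in scripts that only need to report what is missing.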
 
 # Getting started
 
 The documentation contains:
 
-- [A  "hello world" example](https://kedro-mlflow.readthedocs.io/en/latest/source/03_getting_started/index.html) which demonstrates how you to **setup your project**, **version parameters** and **datasets**, and browse your runs in the UI.
-- A section for [advanced machine learning versioning](https://kedro-mlflow.readthedocs.io/en/latest/source/04_experimentation_tracking/index.html) to show more advanced features (mlflow configuration through the plugin, package and serve a kedro ``Pipeline``...)
-- A section to demonstrate how to use `kedro-mlflow` as a [machine learning framework](https://kedro-mlflow.readthedocs.io/en/latest/source/05_framework_ml/index.html) to deliver production ready pipelines and serve them. This section comes with [an example repo](https://github.com/Galileo-Galilei/kedro-mlflow-tutorial) you can clone and try out.
+- [A 1-minute quickstart example](https://kedro-mlflow.readthedocs.io/en/latest/source/03_quickstart/index.html) which demonstrates how to **set up your project**, **version parameters** and **datasets**, and browse your runs in the UI.
+- A section for [advanced machine learning versioning](https://kedro-mlflow.readthedocs.io/en/latest/source/10_experiment_tracking/index.html) to show more advanced features (mlflow configuration through the plugin, package and serve a kedro ``Pipeline``...)
+- A section to demonstrate how to use `kedro-mlflow` as a [machine learning framework](https://kedro-mlflow.readthedocs.io/en/latest/source/21_pipeline_serving/index.html) to deliver production ready pipelines and serve them. This section comes with [an example repo](https://github.com/Galileo-Galilei/kedro-mlflow-tutorial) you can clone and try out.
 
 Some frequently asked questions on more advanced features:
 

diff --git a/docs/index.rst b/docs/index.rst
@@ -7,21 +7,33 @@ Welcome to kedro-mlflow's documentation!
 ========================================
 
 .. toctree::
-   :maxdepth: 6
+   :maxdepth: -1
+   :caption: Getting started
 
    Introduction <source/01_introduction/index.rst>
    Installation <source/02_installation/index.rst>
-   Getting Started <source/03_getting_started/index.rst>
-   Experimentation tracking <source/04_experimentation_tracking/index.rst>
-   Pipeline serving <source/05_pipeline_serving/index.rst>
-   A mlops framework for continuous model serving <source/05_framework_ml/index.rst>
-   Interactive use <source/06_interactive_use/index.rst>
-   Python objects <source/07_python_objects/index.rst>
+   Quickstart in 1 mn <source/03_quickstart/index.rst>
 
 .. toctree::
-   :maxdepth: 6
+   :maxdepth: -1
+   :caption: Experiment tracking
 
-   API documentation <source/08_API/kedro_mlflow.rst>
+   In a kedro project <source/10_experiment_tracking/index.rst>
+   In a notebook <source/11_interactive_use/index.rst>
+
+.. toctree::
+   :maxdepth: -1
+   :caption: Pipeline serving
+
+   Custom mlflow model for kedro pipelines <source/21_pipeline_serving/index.rst>
+   A mlops framework for continuous model serving <source/22_framework_ml/index.rst>
+
+.. toctree::
+   :maxdepth: -1
+   :caption: Technical documentation
+
+   Python objects <source/30_python_objects/index.rst>
+   API documentation <source/31_API/kedro_mlflow.rst>
 
 Indices and tables
 ==================

diff --git a/docs/source/01_introduction/01_introduction.md b/docs/source/01_introduction/01_introduction.md
@@ -27,7 +27,7 @@ While ``Kedro`` and ``Mlflow`` do not compete in the same field, they provide so
 | I/O configuration files        | - ``catalog.yml`` <br> - ``parameters.yml``       | ``MLproject``                                                                       |
 | Compute abstraction            | - ``Pipeline`` <br> - ``Node``                    | N/A                                                                                 |
 | Compute configuration files    | - ``hooks.py`` <br> - ``run.py``                  | `MLproject`                                                                         |
-| Parameters and data versioning | - ``Journal`` <br> - ``AbstractVersionedDataset`` | - ``log_metric``<br> - ``log_artifact``<br> - ``log_param``                         |
+| Parameters and data versioning | - ``Journal`` (deprecated) <br> - Experiment tracking (deprecated) <br> - ``AbstractVersionedDataset`` | - ``log_metric``<br> - ``log_artifact``<br> - ``log_param``|
 | Cli execution                  | command ``kedro run``                             | command ``mlflow run``                                                              |
 | Code packaging                 | command ``kedro package``                         | N/A                                                                                 |
 | Model packaging                | N/A                                               | - ``Mlflow Models`` (``mlflow.XXX.log_model`` functions) <br> - ``Mlflow Flavours`` |
@@ -39,23 +39,17 @@ We discuss hereafter how the two libraries compete on the different functionalit
 
 ``Mlflow`` and ``Kedro`` essentially overlap in the way they offer dedicated configuration files for running the pipeline from the CLI. However:
 
-- ``Mlflow`` provides a single configuration file (the ``MLProject``) where all elements are declared (data, parameters and pipelines). Its goal is mainly to enable CLI execution of the project, but it is not very flexible. In my opinion, this file is **production oriented** and is not really intended to use for exploration.
+- ``Mlflow`` provides a single configuration file (the ``MLProject``) where all elements are declared (data, parameters and pipelines). Its goal is mainly to enable CLI execution of the project, but it is not very flexible. This file is **production oriented** and is not really intended to be used for exploration and development.
 - ``Kedro`` offers a bunch of files (``catalog.yml``, ``parameters.yml``, ``pipeline.py``) and their associated abstractions (``AbstractDataset``, ``DataCatalog``, ``Pipeline`` and ``node`` objects). ``Kedro`` is much more opinionated: each object has a dedicated place (and only one!) in the template. This makes the framework both **exploration and production oriented**. The downside is that it could make the learning curve a bit steeper since a newcomer has to learn all ``Kedro`` specifications. It also provides a ``kedro-viz`` plugin to visualize the DAG interactively, which is particularly handy in medium-to-big projects.
 
 
-> **``Kedro`` is a clear winner here, since it provides more functionnalities than ``Mlflow``. It handles very well _by design_ the exploration phase of data science projects when Mlflow is less flexible.**
+```{note}
+**``Kedro`` is a clear winner here, since it provides more functionalities than ``Mlflow``. By design, it handles the exploration phase of data science projects very well, whereas ``Mlflow`` is less flexible.**
+```
 
 ### Versioning: Kedro 1 - 1 Mlflow
 
-** This section will be updated soon with the brand new experiment tracking functionality of kedro**
-
-The ``Kedro`` ``Journal`` aimed at reproducibility (it was removed in ``kedro==0.18``), but is not focused on machine learning. The `Journal` keeps track of two elements:
-
-- the CLI arguments, including *on the fly* parameters. This makes the command used to run the pipeline fully reproducible.
-- the ``AbstractVersionedDataset`` for which versioning is activated. It consists in copying the data whom ``versioned`` argument is ``True`` when the ``save`` method of the ``AbstractVersionedDataset`` is called.
-This approach suffers from two main drawbacks:
-  - the configuration is assumed immutable (including parameters), which is not realistic ni machine learning projects where they are very volatile. To fix this, the ``git sha`` has been recently added to the ``Journal``, but it has still some bugs in my experience (including the fact that the current ``git sha`` is logged even if the pipeline is ran with uncommitted change, which prevents reproducibility). This is still recent and will likely evolve in the future.
-  - there is no support for browsing old runs, which prevents [cleaning the database with old and unused datasets](https://github.com/quantumblacklabs/kedro/issues/406), compare runs between each other...
+Kedro has made several attempts at experiment tracking: first with the ``Journal`` in the early days (``kedro<0.18``), then with an [experiment tracking functionality](https://docs.kedro.org/projects/kedro-viz/en/v9.2.0/experiment_tracking.html) which kept track of the parameters but will be removed in ``kedro>=0.20`` due to lack of traction (https://github.com/kedro-org/kedro-viz/issues/2202).
 
 On the other hand, ``Mlflow``:
 
@@ -64,7 +58,9 @@ On the other hand, ``Mlflow``:
 - [comes with a *User Interface* (UI)](https://mlflow.org/docs/latest/tracking.html#id7) which makes it possible to browse / filter / sort the runs, display graphs of the metrics, render plots... This makes run management much easier than in ``Kedro``.
 - has a command to reproduce exactly the run from a given ``git sha``, [which is not possible in ``Kedro``](https://github.com/quantumblacklabs/kedro/issues/297).
 
-> **``Mlflow`` is a clear winner here, because _UI_ and _run querying_ are must-have for machine learning projects. It is more mature than ``Kedro`` for versioning and more focused on machine learning.**
+```{note}
+**``Mlflow`` is a clear winner here, because _UI_ and _run querying_ are must-haves for machine learning projects. It is more mature than ``Kedro`` for versioning and more focused on machine learning.**
+```
 
 ### Model packaging and service: Kedro 1 - 2 Mlflow
 
@@ -79,8 +75,10 @@ On the other hand, ``Mlflow``:
 
 When a stored model meets these requirements, ``Mlflow`` provides built-in tools to serve the model (as an API or for batch prediction) on many machine learning tools (Microsoft Azure ML, Amazon Sagemaker, Apache SparkUDF) and locally.
 
-> **``Mlflow`` is currently the only tool which adresses model serving. This is currently not the top priority for ``Kedro``, but may come in the future ([through Kedro Server maybe?](https://github.com/quantumblacklabs/kedro/issues/143))**
+```{note}
+``Mlflow`` is currently the only tool which addresses model serving. Some [plugins address model deployment and serving](https://docs.kedro.org/en/stable/extend_kedro/plugins.html#community-developed-plugins) in the Kedro ecosystem, but they are not as well maintained as the core framework.
+```
 
 ### Conclusion: Use Kedro and add Mlflow for machine learning projects
 
-In my opinion, ``Kedro``'s will to enforce software engineering best practice makes it really useful for machine learning teams. It is extremely well documented and the support is excellent, which makes it very user friendly even for people with no computer science background. However, it lacks some machine learning-specific functionalities (better versioning, model service), and it is where ``Mlflow`` fills the gap.
+``Kedro``'s will to enforce software engineering best practices makes it really useful for machine learning teams. It is extremely well documented and the support is excellent, which makes it very user friendly even for people with no computer science background. However, it lacks some machine learning-specific functionalities (better versioning, model serving), and this is where ``Mlflow`` fills the gap.
diff --git a/docs/source/01_introduction/02_motivation.md b/docs/source/01_introduction/02_motivation.md
@@ -4,7 +4,7 @@
 
 Basically, you should use `kedro-mlflow` in **any `Kedro` project which involves machine learning** / deep learning. As stated in the [introduction](./01_introduction.md), `Kedro`'s current versioning (as of version `0.16.6`) is not sufficient for machine learning projects: it lacks a UI and a ``run`` management system. Besides, the `KedroPipelineModel` ability to serve a kedro pipeline as an API or a batch in one line of code is a great addition for collaboration and transition to production.
 
-If you do not use ``Kedro`` or if you do pure data processing which do not involve *machine learning*, this plugin is not what you are seeking for ;)
+If you do not use ``Kedro`` or if you do pure data processing which does not involve *machine learning*, this plugin is not what you are looking for ;-)
 
 ## Why should I use kedro-mlflow?
 

diff --git a/docs/source/02_installation/01_installation.md b/docs/source/02_installation/01_installation.md
@@ -42,7 +42,7 @@ Requires: pip-tools, cachetools, fsspec, toposort, anyconfig, PyYAML, click, plu
 
 ## Install the plugin
 
-The current version of the plugin is compatible with ``kedro>=0.16.0``. Since Kedro tries to enforce backward compatibility, it will very likely remain compatible with further versions.
+There are versions of the plugin compatible with ``kedro>=0.16.0`` and ``mlflow>=0.8.0``. ``kedro-mlflow`` stops adding features to a minor version 2 to 6 months after a new kedro release.
 
 ### Install from PyPI
 
@@ -70,7 +70,7 @@ Type  ``kedro info`` in a terminal to check the installation. If it has succeede
 | |/ / _ \/ _` | '__/ _ \
 |   <  __/ (_| | | | (_) |
 |_|\_\___|\__,_|_|  \___/
-v0.16.<x>
+v0.<minor>.<patch>
 
 kedro allows teams to create analytics
 projects. It is developed as part of
@@ -95,9 +95,4 @@ Usage: kedro mlflow [OPTIONS] COMMAND [ARGS]...
 
 Options:
   -h, --help  Show this message and exit.
-
-Commands:
-  new  Create a new kedro project with updated template.
 ```
-
-*Note: For now, the `kedro mlflow new` command is not implemented. You must use `kedro new` to create a project, and then call `kedro mlflow init` inside this new project.*
diff --git a/docs/source/02_installation/02_setup.md b/docs/source/02_installation/02_setup.md
@@ -15,7 +15,7 @@ In order to use the ``kedro-mlflow`` plugin, you need to setup its configuration
 ### Setting up the ``kedro-mlflow`` configuration file
 
 
-``kedro-mlflow`` is [configured](../07_python_objects/05_Configuration.md) through an ``mlflow.yml`` file. The recommended way to initialize the `mlflow.yml` is by using [the ``kedro-mlflow`` CLI](../07_python_objects/04_CLI.md), but you can create it manually.
+``kedro-mlflow`` is [configured](../30_python_objects/05_Configuration.md) through an ``mlflow.yml`` file. The recommended way to initialize the `mlflow.yml` is by using [the ``kedro-mlflow`` CLI](../30_python_objects/04_CLI.md), but you can create it manually.
 
 ```{note}
 Since ``kedro-mlflow>=0.11.2``, the configuration file is optional. Without it, however, the plugin uses the default ``mlflow`` configuration. Specifically, the runs will be stored in a ``mlruns`` folder at the root of the kedro project since no ``mlflow_tracking_uri`` is configured.

diff --git a/docs/source/02_installation/03_migration_guide.md b/docs/source/02_installation/03_migration_guide.md
@@ -117,9 +117,9 @@ Be aware that if you have saved a pipeline as a mlflow model with `pipeline_ml_f
 
 ```json
 {
-    predictions:
+    "predictions":
         {
-            <your model-predictions>
+            "<your model-predictions>"
         }
 }
 ```
@@ -128,7 +128,7 @@ to:
 
 ```json
 {
-    <your model-predictions>
+    "<your model-predictions>"
 }
 ```
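
A client that has to cope with both response shapes shown above could normalize them before use. This is a minimal sketch, assuming the response body has already been parsed from JSON into Python objects. `unwrap_predictions` is a hypothetical helper and is not part of ``kedro-mlflow`` or ``mlflow``.

```python
import json
from typing import Any


def unwrap_predictions(payload: Any) -> Any:
    """Return the bare predictions whether or not they are wrapped."""
    # Wrapped shape: {"predictions": <your model-predictions>}
    if isinstance(payload, dict) and set(payload) == {"predictions"}:
        return payload["predictions"]
    # Bare shape: the predictions are returned as-is.
    return payload


# Illustrative payloads only; the real content depends on your model.
wrapped = json.loads('{"predictions": {"label": "a"}}')
bare = json.loads('{"label": "a"}')
assert unwrap_predictions(wrapped) == unwrap_predictions(bare)
```

The check is deliberately strict (a dict whose only key is ``predictions``). A model whose own output is exactly such a dict would be ambiguous and would need an explicit version switch instead.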
 

diff --git a/docs/source/02_installation/index.rst b/docs/source/02_installation/index.rst
@@ -4,6 +4,7 @@ Introduction
 .. toctree::
    :maxdepth: 4
 
+
    Install the plugin <01_installation.md>
    Setup your kedro project <02_setup.md>
    Migration guide between versions <03_migration_guide.md>
diff --git a/...e/03_getting_started/00_intro_tutorial.md → ...source/03_quickstart/00_intro_tutorial.md b/...e/03_getting_started/00_intro_tutorial.md → ...source/03_quickstart/00_intro_tutorial.md
diff --git a/.../03_getting_started/01_example_project.md → ...ource/03_quickstart/01_example_project.md b/.../03_getting_started/01_example_project.md → ...ource/03_quickstart/01_example_project.md
@@ -5,9 +5,9 @@
 Create a conda environment and install ``kedro-mlflow`` (this will automatically install ``kedro>=0.16.0``).
 
 ```console
-conda create -n km_example python=3.9 --yes
+conda create -n km_example python=3.10 --yes
 conda activate km_example
-pip install kedro-mlflow==0.13.4
+pip install kedro-mlflow
 ```
 
 ## Install the toy project

diff --git a/...urce/03_getting_started/02_first_steps.md → docs/source/03_quickstart/02_first_steps.md b/...urce/03_getting_started/02_first_steps.md → docs/source/03_quickstart/02_first_steps.md
@@ -2,10 +2,14 @@
 
 ## Initialize kedro-mlflow
 
-First, you need to initialize your project and add the plugin-specific configuration file with this command:
+```{note}
+This step is optional if you use ``kedro-mlflow>=0.11.2``. If you do not create a ``mlflow.yml`` configuration file, ``kedro-mlflow`` will use the defaults. However, creating it is strongly recommended because professional setups often need enterprise-specific configuration.
+```
+
+You can initialize your project with the plugin-specific configuration file with this command:
 
 ```console
-kedro mlflow init
+kedro mlflow init --env=local
 ```
 
 You will see the following message:
@@ -18,6 +22,7 @@ The ``conf/local`` folder is updated and you can see the `mlflow.yml` file:
 
 ![initialized_project](../imgs/initialized_project.png)
 
+
 *Optional: If you have configured your own mlflow server, you can specify the tracking uri in the ``mlflow.yml`` (replace the highlighted line below):*
 
 ![mlflow_yml](../imgs/mlflow_yml.png)
@@ -109,9 +114,6 @@ You should see the following graph:
 
 which indicates clearly which parameters are logged (in the red boxes with the "parameter" icon).
 
-### Journal information
-
-The informations provided by the ``Kedro``'s ``Journal`` are also recorded as ``tags`` in the mlflow ui in order to make reproducible. In particular, the exact command used for running the pipeline and the kedro version used are stored.
 
 ### Artifacts
 
@@ -159,4 +161,4 @@ This works for any type of file (including images with ``MatplotlibWriter``) and
 The vanilla example above is just the beginning of your experience with ``kedro-mlflow``. Check out the next sections to see how `kedro-mlflow`:
 
 - offers advanced capabilities for machine learning versioning
-- can help to create standardize pipelines for deployment in production
+- offers a way to create a custom mlflow model from your kedro pipelines and deploy it effortlessly in production
diff --git a/docs/source/03_getting_started/index.rst → docs/source/03_quickstart/index.rst b/docs/source/03_getting_started/index.rst → docs/source/03_quickstart/index.rst