FIX #16 - Document how to use kedro-mlflow as a ml framework with pipeline_ml_factory
Galileo-Galilei committed Feb 21, 2021
1 parent 7864f49 commit 9723836
Showing 28 changed files with 634 additions and 17 deletions.
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -7,6 +7,7 @@
- A new `long_parameters_strategy` key is added in the `mlflow.yml` (under the hook/node section). You can specify different strategies (`fail`, `truncate` or `tag`) to handle parameters over 250 characters, which cause crashes with some mlflow backends. ([#69](https://github.com/Galileo-Galilei/kedro-mlflow/issues/69))
- Add an `env` parameter to `kedro mlflow init` command to specify under which `conf/` subfolder the `mlflow.yml` should be created. ([#159](https://github.com/Galileo-Galilei/kedro-mlflow/issues/159))
- The input parameters of the `inference` pipeline of a `PipelineML` object are now automatically pickle-ised and converted as artifacts. ([#158](https://github.com/Galileo-Galilei/kedro-mlflow/issues/158))
- [Detailed documentation on how to use `pipeline_ml_factory`](https://kedro-mlflow.readthedocs.io/en/latest/source/05_framework_ml/index.html) function, and more generally on how to use ``kedro-mlflow`` as mlops framework. This comes from [an example repo ``kedro-mlflow-tutorial``](https://github.com/Galileo-Galilei/kedro-mlflow-tutorial). ([#16](https://github.com/Galileo-Galilei/kedro-mlflow/issues/16))

### Fixed

@@ -15,7 +16,7 @@

### Changed

- The `KedroPipelineModel.load_context()` method now loads all the `DataSets` in memory in the `DataCatalog`. It is also now possible to specify the `runner` to execute the model as well as the `copy_mode` when executing the inference pipeline (instead of deepcopying the datasets between each nodes which is kedro's default). This makes the API serving with `mlflow serve` command considerably faster (~20 times faster) for models which needs compiling (i.e. keras, tensorflow ...) ([#133](https://github.com/Galileo-Galilei/kedro-mlflow/issues/133))
- The `KedroPipelineModel.load_context()` method now loads all the `DataSets` in memory in the `DataCatalog`. It is also now possible to specify the `runner` to execute the model as well as the `copy_mode` when executing the inference pipeline (instead of deepcopying the datasets between each nodes which is kedro's default). This makes the API serving with `mlflow serve` command considerably faster (~20 times faster) for models which need compiling (e.g. keras, tensorflow ...) ([#133](https://github.com/Galileo-Galilei/kedro-mlflow/issues/133))
- The CLI projects commands are now always accessible even if you have not called `kedro mlflow init` yet to create a `mlflow.yml` configuration file ([#159](https://github.com/Galileo-Galilei/kedro-mlflow/issues/159))

## [0.4.1] - 2020-12-03
11 changes: 6 additions & 5 deletions README.md
@@ -25,7 +25,7 @@

# How do I install kedro-mlflow?

**Important: kedro-mlflow is only compatible with ``kedro>=0.16.0``. If you have a project created with an older version of ``Kedro``, see this [migration guide](https://github.com/quantumblacklabs/kedro/blob/master/RELEASE.md#migration-guide-from-kedro-015-to-016).**
**Important: ``kedro-mlflow`` is only compatible with ``kedro>=0.16.0``. If you have a project created with an older version of ``Kedro``, see this [migration guide](https://github.com/quantumblacklabs/kedro/blob/master/RELEASE.md#migration-guide-from-kedro-015-to-016).**

``kedro-mlflow`` is available on PyPI, so you can install it with ``pip``:

@@ -45,8 +45,9 @@ I strongly recommend to use ``conda`` (a package manager) to create an environme

The documentation contains:

- [A "hello world" example](https://kedro-mlflow.readthedocs.io/en/stable/source/03_getting_started/index.html) which demonstrates how you to **setup your project**, **version parameters** and **datasets**, and browse your runs in the UI.
- A more [detailed tutorial](https://kedro-mlflow.readthedocs.io/en/stable/source/04_experimentation_tracking/index.html) to show more advanced features (mlflow configuration through the plugin, package and serve a kedro ``Pipeline``...)
- [A "hello world" example](https://kedro-mlflow.readthedocs.io/en/latest/source/03_getting_started/index.html) which demonstrates how you to **setup your project**, **version parameters** and **datasets**, and browse your runs in the UI.
- A section for [advanced machine learning versioning](https://kedro-mlflow.readthedocs.io/en/latest/source/04_experimentation_tracking/index.html) to show more advanced features (mlflow configuration through the plugin, package and serve a kedro ``Pipeline``...)
- A section to demonstrate how to use `kedro-mlflow` as a [machine learning framework](https://kedro-mlflow.readthedocs.io/en/latest/source/05_framework_ml/index.html) to deliver production ready pipelines and serve them. This section comes with [an example repo](https://github.com/Galileo-Galilei/kedro-mlflow-tutorial) you can clone and try out.

Some frequently asked questions on more advanced features:

@@ -69,9 +70,9 @@ If you want to see how to migrate from one version of `kedro-mlflow` to another,

# Can I contribute?

We'd be happy to receive help to maintain and improve the package. Please check the [contributing guidelines](https://github.com/Galileo-Galilei/kedro-mlflow/blob/develop/CONTRIBUTING.md).
We'd be happy to receive help to maintain and improve the package. Any PR will be considered (from a typo fix in the docs to a core feature add-on). Please check the [contributing guidelines](https://github.com/Galileo-Galilei/kedro-mlflow/blob/develop/CONTRIBUTING.md).

#### Main contributors
# Main contributors

The following people actively maintain, enhance and discuss design to make this package as good as possible. Many thanks to them!

7 changes: 3 additions & 4 deletions docs/index.rst
@@ -1,4 +1,4 @@
.. kedro-mlflow documentation master file, created by
.. ``kedro-mlflow`` documentation master file, created by
sphinx-quickstart on Mon Jul 13 14:21:13 2020.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
@@ -12,9 +12,8 @@ Welcome to kedro-mlflow's documentation!
Introduction <source/01_introduction/index.rst>
Installation <source/02_installation/index.rst>
Getting Started <source/03_getting_started/index.rst>
Advanced versioning of machine learning experimentations <source/04_experimentation_tracking/index.rst>
A comprehensive framework to deliver machine learning pipelines <source/05_framework_ml/index.rst>
Advanced capabilities <source/06_advanced_use/index.rst>
Advanced machine learning versioning <source/04_experimentation_tracking/index.rst>
A mlops framework for efficient deployment <source/05_framework_ml/index.rst>
Python objects <source/07_python_objects/index.rst>


2 changes: 1 addition & 1 deletion docs/source/01_introduction/02_motivation.md
@@ -33,7 +33,7 @@ Above implementations have the advantage of being very straightforward and *mlfl
- it is **hard to modify** (if you want to remove / add / modify an mlflow action you have to find it in the code)
- it **prevents reuse** (re-usable function must not contain mlflow specific code unrelated to their functional specificities, only their execution must be tracked).

``kedro-mlflow`` enforces these best practices while implementing a clear interface for each mlflow action in Kedro template. Below chart maps the mlflow action to perform with the Python API provided by kedro-mlflow and the location in Kedro template where the action should be performed.
``kedro-mlflow`` enforces these best practices while implementing a clear interface for each mlflow action in Kedro template. Below chart maps the mlflow action to perform with the Python API provided by ``kedro-mlflow`` and the location in Kedro template where the action should be performed.

|Mlflow action |Template file |Python API |
|:----------------------------|:-----------------------|:------------------------------------------------------|
6 changes: 3 additions & 3 deletions docs/source/02_installation/02_setup.md
@@ -12,9 +12,9 @@ If you do not have a real-world project, you can use a kedro example and [follow

In order to use the ``kedro-mlflow`` plugin, you need to set up its configuration and declare its hooks. These two actions are detailed in the following paragraphs.

### Setting up the kedro-mlflow configuration file
### Setting up the ``kedro-mlflow`` configuration file

``kedro-mlflow`` is [configured](../07_python_objects/05_Configuration.md) through an ``mlflow.yml`` file. The recommended way to initialize the `mlflow.yml` is by using [the kedro-mlflow CLI](../07_python_objects/04_CLI.md). **It is mandatory for the plugin to work.**
``kedro-mlflow`` is [configured](../07_python_objects/05_Configuration.md) through an ``mlflow.yml`` file. The recommended way to initialize the `mlflow.yml` is by using [the ``kedro-mlflow`` CLI](../07_python_objects/04_CLI.md). **It is mandatory for the plugin to work.**

Set the working directory at the root of your kedro project (i.e. the folder with the ``.kedro.yml`` file)

@@ -40,7 +40,7 @@ you should see the following message:
kedro mlflow init --env=<other-environment>
```

### Declaring kedro-mlflow hooks
### Declaring ``kedro-mlflow`` hooks

``kedro_mlflow`` hooks implementations must be registered with Kedro. There are three ways of registering [hooks](https://kedro.readthedocs.io/en/latest/07_extend_kedro/02_hooks.html).

2 changes: 1 addition & 1 deletion docs/source/03_getting_started/index.rst
@@ -6,4 +6,4 @@ Introduction

Goal of the tutorial <00_intro_tutorial.md>
Create an example project <01_example_project.md>
First steps with kedro-mlflow <02_first_steps.md>
First steps with ``kedro-mlflow`` <02_first_steps.md>
@@ -89,7 +89,7 @@ hooks:
flatten_dict_params: False # if True, parameters which are dictionaries will be split into multiple parameters when logged in mlflow, one for each key.
recursive: True # Should the dictionary flattening be applied recursively (i.e. for nested dictionaries)? Not used if `flatten_dict_params` is False.
sep: "." # In case of recursive flattening, what separator should be used between the keys? E.g. {hyperaparam1: {p1:1, p2:2}} will be logged as hyperaparam1.p1 and hyperaparam1.p2 in mlflow.
long_parameters_strategy: fail # One of ["fail", "tag", "truncate" ] If a parameter is above mlflow limit (currently 250), what should kedro-mlflow do? -> fail, set as a tag instead of a parameter, or truncate it to its 250 first letters?
long_parameters_strategy: fail # One of ["fail", "tag", "truncate" ] If a parameter is above mlflow limit (currently 250), what should ``kedro-mlflow`` do? -> fail, set as a tag instead of a parameter, or truncate it to its 250 first letters?
```
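
The three `long_parameters_strategy` values named in the configuration comment can be illustrated with a small sketch. The helper name `handle_long_parameter` and its return shape are hypothetical and are not taken from the ``kedro-mlflow`` source; only the 250-character limit and the strategy names come from the comment. Raising immediately for `fail` is also an assumption: the plugin may instead let the mlflow backend reject the value.

```python
MAX_PARAM_LENGTH = 250  # mlflow limit quoted in the configuration comment


def handle_long_parameter(name, value, strategy="fail"):
    """Hypothetical helper: decide how a parameter would be recorded.

    Returns a (kind, name, value) tuple, where kind is "param" or "tag",
    instead of calling mlflow, so the sketch stays self-contained.
    """
    str_value = str(value)
    if len(str_value) <= MAX_PARAM_LENGTH:
        # Short enough: record as a regular parameter whatever the strategy.
        return ("param", name, str_value)
    if strategy == "fail":
        # Assumption: fail fast here rather than waiting for the backend.
        raise ValueError(
            f"Parameter '{name}' exceeds {MAX_PARAM_LENGTH} characters"
        )
    if strategy == "truncate":
        # Keep only the first 250 characters.
        return ("param", name, str_value[:MAX_PARAM_LENGTH])
    if strategy == "tag":
        # Record the full value as a tag instead of a parameter.
        return ("tag", name, str_value)
    raise ValueError(f"Unknown strategy: {strategy!r}")
```
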
If you set `flatten_dict_params` to `True`, each key of the dictionary will be logged as a separate mlflow parameter, instead of a single parameter for the whole dictionary. This is recommended because it makes runs easier to compare.
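
A minimal sketch of the flattening behaviour described above, assuming a hypothetical helper `flatten_dict` whose `recursive` and `sep` arguments mirror the configuration keys of the same name. It is an illustration only and not the plugin's actual implementation.

```python
def flatten_dict(d, recursive=True, sep="."):
    """Hypothetical helper: turn nested dictionary keys into flat keys.

    With recursive=False only the first level of nesting is flattened.
    """
    flat = {}
    for key, value in d.items():
        if isinstance(value, dict):
            # Flatten deeper levels first when recursion is requested.
            nested = flatten_dict(value, recursive, sep) if recursive else value
            for sub_key, sub_value in nested.items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


params = {"hyperaparam1": {"p1": 1, "p2": 2}}
flat_params = flatten_dict(params)
# flat_params == {"hyperaparam1.p1": 1, "hyperaparam1.p2": 2}
```

Each entry of `flat_params` would then be logged as its own parameter, which is what makes individual values comparable across runs.
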
2 changes: 1 addition & 1 deletion docs/source/04_experimentation_tracking/06_mlflow_ui.md
@@ -4,7 +4,7 @@

Mlflow offers a user interface (UI) that lets you browse the run history.

## The kedro-mlflow helper
## The ``kedro-mlflow`` helper

When you use a local storage for kedro mlflow, you can call a [mlflow cli command](https://www.mlflow.org/docs/latest/quickstart.html#viewing-the-tracking-ui) to launch the UI if you do not have a [mlflow tracking server configured](https://www.mlflow.org/docs/latest/tracking.html#tracking-ui).
