Detailed Documentation and Examples for `MlflowModelRegistryDataSet` Usage #498

hugocool · 2023-12-13T12:28:38Z

Description

I am looking for clarification and examples on using MlflowModelRegistryDataSet in the Kedro-Mlflow integration to log and manage models in MLflow’s model registry. Specifically, how do I save a model directly to a given version or stage (e.g., staging), and how do I retrieve a specific version of a model, such as 'staging version 6'? The documentation lists the parameters but gives no practical examples, in particular for logging a model directly to a stage such as 'staging'.

Context

This change matters for managing model versions and stages efficiently with the Kedro-Mlflow integration. Being able to save and retrieve specific model versions and stages directly would streamline the workflow and make the integration easier to use. It would help my current projects and give other users a clearer path for model versioning and staging in MLflow.

I started by successfully implementing MlflowModelLoggerDataSet as described in the documentation. The confusion began with MlflowModelRegistryDataSet. My initial setup was:

my_transformer_model:
  type: kedro_mlflow.io.models.MlflowModelRegistryDataSet
  flavor: mlflow.transformers
  model_name: my_transformer_model_name
  stage_or_version: staging

This configuration raised a DatasetError when I tried to save a model, indicating that MlflowModelRegistryDataSet has no 'save' method. The documentation details the parameters but gives no practical examples of saving and registering models.

Workaround

The workaround I found for logging models uses MlflowModelLoggerDataSet, but it does not address staging/versioning through the API or retrieving specific versions:

my_transformer_model:
    type: kedro_mlflow.io.models.MlflowModelLoggerDataSet
    flavor: mlflow.transformers
    save_args:
        registered_model_name: "my_transformer_model_name"

This configuration saves and loads the model in MLflow and registers it under the given name, but that behavior is not documented. It also gives no direct control over versioning/staging and no clear way to retrieve a specific version. Only MlflowModelRegistryDataSet allows one to load such named models.

Specific Concerns and Clarifications Needed

Loading Specific Versions: kedro_mlflow.io.models.MlflowModelRegistryDataSet is needed to load a specific version of a model, but it is unclear how to save a model to a specific version or stage (e.g., logging a model directly to staging).
Retrieving Specific Model Versions: How to retrieve a specific version of a model, such as 'staging version 6', is not clearly documented.
Direct Versioning/Staging Through API: Guidance is needed on how to stage or version a model directly through the API rather than through the MLflow UI.
Viewing Associated Metrics: Instructions are needed on how to view, in the MLflow model UI, the metrics of the training run associated with a model, so that the best model can be promoted to staging.
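
For the API staging and metrics concerns above, a minimal sketch built on MLflow's client methods (`search_model_versions`, `get_model_version`, `get_run`, `transition_model_version_stage`) might look like the following. The helper names `latest_version`, `metrics_for_version`, and `promote` are hypothetical and not part of kedro-mlflow. The `client` argument is assumed to be an `mlflow.tracking.MlflowClient` instance, and the sketch assumes the installed MLflow version still supports stage transitions (recent releases deprecate stages).

```python
def latest_version(client, model_name):
    """Return the highest version number registered under model_name.

    `client` is assumed to behave like mlflow.tracking.MlflowClient.
    """
    versions = client.search_model_versions(f"name='{model_name}'")
    # MLflow reports version numbers as strings, so compare them as integers.
    return max(int(v.version) for v in versions)


def metrics_for_version(client, model_name, version):
    """Return the metrics logged by the run that produced a model version."""
    model_version = client.get_model_version(name=model_name, version=str(version))
    run = client.get_run(model_version.run_id)
    return dict(run.data.metrics)


def promote(client, model_name, version, stage="Staging"):
    """Move one registered version to a stage without using the MLflow UI."""
    return client.transition_model_version_stage(
        name=model_name, version=str(version), stage=stage
    )
```

With a tracking server configured, one could create `client = MlflowClient()` (imported from `mlflow.tracking`), inspect `metrics_for_version(client, "my_transformer_model_name", 6)`, and then call `promote(client, "my_transformer_model_name", 6)`. This runs outside the Kedro catalog, for example in a node or a hook.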

Possible Implementation

Update the documentation to include explicit examples of using MlflowModelRegistryDataSet for saving models directly to a specific version or stage (e.g., staging).
Provide examples for retrieving specific model versions, such as how to fetch 'staging version 6' of a model.
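
As a starting point for such retrieval examples: in MLflow itself, a registered model is addressed either by version number or by stage, not both, through a `models:/<name>/<version>` or `models:/<name>/<stage>` URI. The stage form resolves to the latest version currently in that stage, so 'staging version 6' would simply be version 6. The sketch below builds such URIs. The helper `registry_uri` is hypothetical, and whether kedro-mlflow constructs its URIs the same way internally is an assumption.

```python
# Stage names as MLflow spells them, keyed by a case-insensitive form.
_STAGES = {"staging": "Staging", "production": "Production", "archived": "Archived"}


def registry_uri(model_name, stage_or_version="latest"):
    """Build a models:/ URI from a model name and a stage OR a version."""
    spec = str(stage_or_version).strip()
    if spec.isdigit():
        # A plain number selects exactly that registered version.
        return f"models:/{model_name}/{spec}"
    if spec.lower() == "latest":
        return f"models:/{model_name}/latest"
    if spec.lower() in _STAGES:
        # A stage name selects the latest version currently in that stage.
        return f"models:/{model_name}/{_STAGES[spec.lower()]}"
    raise ValueError(
        f"{stage_or_version!r} is neither a version number nor a known stage"
    )
```

The resulting string can be passed to `mlflow.pyfunc.load_model`. A combined value such as "staging:6" is rejected here because the URI scheme has no form that names a stage and a version together.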

An example implementation might look something like this in the catalog.yml:

my_model:
  type: kedro_mlflow.io.models.MlflowModelRegistryDataSet
  flavor: mlflow.sklearn
  model_name: my_model_name
  stage_or_version: "staging:6"  # How to specify direct logging to this stage?

Galileo-Galilei added the documentation and need-design-decision labels Dec 19, 2023

Galileo-Galilei moved this to 🆕 New in kedro-mlflow roadmap Oct 29, 2024

Galileo-Galilei added this to kedro-mlflow roadmap Oct 29, 2024
