MLFlow conflict when using Databricks #610

diegoliraQB · 2024-11-21T15:04:53Z

Description

When running kedro-mlflow on Databricks, occasionally a new run of the experiment might be triggered when running parallelized code. This is because Databricks enables autologging (at least in recent runtimes), and the new runs might be due to an mlflow bug.

Proposed solution: Add a new hook to disable autolog, or include it in the current hook.

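import mlflow
from kedro.framework.hooks import hook_impl

class DisableMLFlowAutoLogger:
    @hook_impl(tryfirst=True)
    def after_context_created(self, context) -> None:
        # Turn autologging off before any other hook touches mlflow
        mlflow.autolog(disable=True)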

Although I encountered this because of Databricks, I can't imagine a context where you'd like to enable autolog together with the plugin. Could be a parameter of mlflow.yml if you want to be flexible.
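A rough sketch of how the configurable variant could look, assuming the parsed content of mlflow.yml is available as a plain mapping. The disable_autolog key and the maybe_disable_autolog helper are hypothetical names used for illustration only; they are not existing kedro-mlflow options. The autolog callable is meant to be mlflow.autolog and is passed in so that the logic can be tried without an MLflow installation.

```python
from typing import Any, Callable, Mapping


def maybe_disable_autolog(
    config: Mapping[str, Any],
    autolog: Callable[..., Any],
) -> bool:
    """Disable autologging unless the config explicitly opts out.

    `config` stands in for the parsed content of mlflow.yml; the
    `disable_autolog` key is hypothetical, not an existing option.
    `autolog` is expected to be `mlflow.autolog`.
    """
    # Default to True: the issue argues that autologging is rarely
    # wanted alongside the plugin.
    should_disable = bool(config.get("disable_autolog", True))
    if should_disable:
        autolog(disable=True)
    return should_disable
```

Inside the hook proposed above, this would be called as maybe_disable_autolog(cfg, mlflow.autolog), where cfg is however the plugin exposes its configuration.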

Context

See conversation for context:
https://kedro-org.slack.com/archives/C03RKP2LW64/p1732141412790889

Steps to Reproduce

Start a Kedro pipeline using kedro-mlflow in a Databricks interactive notebook.
Use some parallelized code to trigger a new run. Minimal example with Optuna:

    study = optuna.create_study()
    study.optimize(lambda trial: objective(my_data, trial), n_trials=100, n_jobs=-1)

This will trigger maybe 4-6 new runs when using LightGBM in your objective.

Expected Result

Results should be in the run started by kedro-mlflow.

Actual Result

New runs are triggered.
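One way to confirm the stray runs is to compare the run IDs found in the experiment with the ID of the run started by kedro-mlflow. The helper below is a minimal sketch with a hypothetical name (find_stray_runs). It assumes the run IDs have already been collected, for example from the run_id column of the DataFrame returned by mlflow.search_runs, and that the expected ID was read from mlflow.active_run().info.run_id while the pipeline was running.

```python
from typing import Iterable, List


def find_stray_runs(run_ids: Iterable[str], expected_run_id: str) -> List[str]:
    """Return the IDs of runs other than the expected one, in input order."""
    return [run_id for run_id in run_ids if run_id != expected_run_id]
```

An empty result means everything was logged to the kedro-mlflow run, which is the expected behaviour described above.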

Your Environment

Databricks Runtime 15.4 ML
Kedro 19.9
kedro-mlflow 0.13.3

Does the bug also happen with the last version on master?

Yes


Galileo-Galilei · 2024-12-14T20:10:55Z

Hi,

Can you check which mlflow version you used? If it is above 2.18.0, there are some recent modifications in mlflow to make it thread safe and avoid race conditions when running in parallel. These affect kedro-mlflow (#613, #615) and pycaret (pycaret/pycaret#4100); optuna used to have a workaround (optuna/optuna#4088) which may or may not be affected (I didn't look at their code).

It should be fixed in kedro-mlflow. Can you install kedro-mlflow>0.13.4 and confirm whether you are still experiencing the bug before I make the change?

diegoliraQB · 2024-12-16T16:42:47Z

I can confirm that the problem still happens, using mlflow 2.19.0 and kedro-mlflow 0.13.4
Disabling autologging solves it.

Galileo-Galilei · 2024-12-16T20:18:10Z

Thanks for confirming, I'll investigate and make sure to fix this soon!

github-project-automation bot added this to kedro-mlflow roadmap Nov 21, 2024

github-project-automation bot moved this to 🆕 New in kedro-mlflow roadmap Nov 21, 2024

Galileo-Galilei added the bug Something isn't working label Dec 13, 2024
