Using MLflow Skinny instead of MLflow as the required dependency. #486

Open
rxm7706 opened this issue Nov 10, 2023 · 4 comments
Labels
need-design-decision Several ways of implementation are possible and one must be chosen

Comments

@rxm7706
Contributor

rxm7706 commented Nov 10, 2023

Description

A clear and concise description of what you want to achieve. An image or a code example is worth thousand words!

With the introduction of the MLflow AI Gateway, MLflow has become quite large in its number of dependencies.
To manage the growth of the MLflow ecosystem, mlflow-skinny was introduced.

MLflow Skinny is a lightweight MLflow package without SQL storage, server, UI, or data science dependencies. MLflow Skinny supports:

    Tracking operations (logging / loading / searching params, metrics, tags + logging / loading artifacts)
    Model registration, search, artifact loading, and transitions
    Execution of GitHub projects within notebook & against a remote target.

Compared with conda install mlflow-skinny, conda install mlflow pulls in over 100 additional packages.

Context

Why is this change important to you? How would you use it? How can it benefit other users?

There is currently an open CVE on pyarrow (https://nvd.nist.gov/vuln/detail/CVE-2023-47248) that is flagged on kedro-mlflow, because kedro-mlflow depends on mlflow and one of mlflow's additional dependencies uses pyarrow.
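To see whether a given environment is actually exposed, a small check like the one below might help. It is only a sketch: it assumes the affected range is PyArrow 0.14.0 through 14.0.0, which is how I read the NVD entry linked above, and the helper names (`parse_release`, `is_affected`, `installed_pyarrow_status`) are made up for illustration. Pre-release suffixes are ignored, so treat the result as a hint rather than an audit.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumption: affected range taken from the NVD entry for CVE-2023-47248.
FIRST_AFFECTED = (0, 14, 0)
LAST_AFFECTED = (14, 0, 0)


def parse_release(version_string):
    """Turn '14.0.1' or '14.0.1.dev0' into a (major, minor, patch) tuple."""
    numbers = []
    for part in version_string.split(".")[:3]:
        match = re.match(r"\d+", part)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def is_affected(version_string):
    """Return True if the version falls inside the assumed affected range."""
    return FIRST_AFFECTED <= parse_release(version_string) <= LAST_AFFECTED


def installed_pyarrow_status():
    """Describe the pyarrow installed in the current environment, if any."""
    try:
        installed = version("pyarrow")
    except PackageNotFoundError:
        return "pyarrow is not installed"
    state = "affected" if is_affected(installed) else "not affected"
    return f"pyarrow {installed}: {state}"


if __name__ == "__main__":
    print(installed_pyarrow_status())
```

With mlflow-skinny alone, I would expect this to report that pyarrow is not installed, but that depends on what else is in the environment.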

Additional dependencies can be installed to leverage the full feature set of MLflow. For example:

    To use the mlflow.sklearn component of MLflow Models, install scikit-learn, numpy and pandas.
    To use SQL-based metadata storage, install sqlalchemy, alembic, and sqlparse.
    To use serving-based features, install flask and pandas.
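As a rough illustration of what "managing dependencies with more granularity" could look like from the user side, the sketch below reports which of the optional modules listed above are importable. The feature labels and the `missing_modules` helper are hypothetical; the only assumption beyond the list above is that scikit-learn is imported as `sklearn`.

```python
from importlib.util import find_spec

# Feature -> importable module names, following the list above.
# Note: the scikit-learn distribution is imported as "sklearn".
OPTIONAL_FEATURES = {
    "mlflow.sklearn models": ["sklearn", "numpy", "pandas"],
    "SQL-based metadata storage": ["sqlalchemy", "alembic", "sqlparse"],
    "serving": ["flask", "pandas"],
}


def missing_modules(feature_modules=None):
    """Map each feature to the modules that cannot be found in this environment."""
    if feature_modules is None:
        feature_modules = OPTIONAL_FEATURES
    report = {}
    for feature, modules in feature_modules.items():
        report[feature] = [name for name in modules if find_spec(name) is None]
    return report


if __name__ == "__main__":
    for feature, missing in missing_modules().items():
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{feature}: {status}")
```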

Possible Implementation

(Optional) Suggest an idea for implementing the addition or change.

If it is possible for kedro-mlflow to use mlflow-skinny, it might be a good idea to change the dependency from mlflow to mlflow-skinny and let users manage their dependencies with more granularity.

Possible Alternatives

(Optional) Describe any alternative solutions or features you've considered.

Do nothing and leave things the way they are, but kedro-mlflow becomes more bloated as the full mlflow package grows.

@Galileo-Galilei
Owner

This is a duplicate of #344. It is quite old, so I will reassess it to see if we can make it work!

@rxm7706
Contributor Author

rxm7706 commented Nov 11, 2023

Sorry, I didn't see that issue earlier.
I see your response there explaining the effort it would take to make this change.
Thank you & Regards.

Galileo-Galilei added the need-design-decision label Dec 19, 2023
Galileo-Galilei added a commit that referenced this issue Dec 19, 2023
@Galileo-Galilei
Owner

Galileo-Galilei commented Dec 20, 2023

Some good & bad news after testing:

  • kedro-mlflow works 99% correctly with mlflow-skinny. The only issues arise from MlflowModelRegistryDataset, and they can be fixed by installing sqlalchemy and alembic. This is not a real problem: I don't think people use the registry purely locally without a remote mlflow server (running the full mlflow install), so they very likely already have what it needs installed.
  • The biggest issue is that I think many people use kedro-mlflow locally and want to open the UI with kedro mlflow ui. This requires installing flask and waitress (which is straightforward), but it also requires a couple of .js files that are not bundled with mlflow-skinny. These files cannot be obtained just by installing an extra dependency (apart from downloading them directly from the mlflow source, which is hard to make reliable across versions). I don't think people would accept dropping the local UI in favor of faster builds.
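For anyone who wants to reproduce the UI finding, here is a minimal sketch of the check. It assumes the bundled front-end lives under `server/js/build` inside the installed mlflow package and contains an `index.html`; that path is my assumption and may differ between mlflow versions. The `missing_ui_requirements` helper is hypothetical.

```python
from importlib.util import find_spec
from pathlib import Path


def missing_ui_requirements(static_subdir="server/js/build"):
    """List what seems to be missing for serving the mlflow UI locally.

    static_subdir is an assumed location of the bundled front-end files,
    relative to the installed mlflow package directory.
    """
    missing = [name for name in ("flask", "waitress") if find_spec(name) is None]

    spec = find_spec("mlflow")
    if spec is None or not spec.submodule_search_locations:
        # Neither mlflow nor mlflow-skinny provides an importable package.
        missing.append("mlflow")
        return missing

    package_root = Path(next(iter(spec.submodule_search_locations)))
    if not (package_root / static_subdir / "index.html").is_file():
        missing.append("mlflow UI static assets")
    return missing


if __name__ == "__main__":
    problems = missing_ui_requirements()
    print("UI looks usable" if not problems else "missing: " + ", ".join(problems))
```

If the observation above holds, installing flask and waitress on top of mlflow-skinny should clear the first two entries, but the static assets entry would remain.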

@Galileo-Galilei
Owner

Decision: I think the only way to make it work is to publish a separate kedro-mlflow-skinny package on PyPI whose only difference from standard kedro-mlflow would be the replacement of the mlflow dependency with mlflow-skinny.

TODO:

  • Update setup.py to handle packaging the project in two different ways. mlflow's own setup.py can be a good source of inspiration: https://github.com/mlflow/mlflow/blob/5d13d6ec620a02de9a5e31201bf1becdb9722ea5/setup.py#L126
  • Update the GitHub Actions release workflow to release kedro-mlflow-skinny synchronously with kedro-mlflow
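One possible shape for the setup.py part, loosely following the idea of switching the build on an environment variable, is sketched below. Everything specific here is an assumption for illustration: the `KEDRO_MLFLOW_SKINNY` variable name, the `package_metadata` helper, and the example version pins are not taken from the kedro-mlflow repository.

```python
import os
import re

# Hypothetical switch: set this variable in the release job that builds
# the skinny distribution.
SKINNY_ENV_VAR = "KEDRO_MLFLOW_SKINNY"


def is_skinny_build():
    """Return True when the skinny variant is being packaged."""
    return bool(os.environ.get(SKINNY_ENV_VAR))


def package_metadata(base_requirements, skinny):
    """Return the name and requirements for one of the two variants.

    The only difference between the variants is that the "mlflow"
    requirement is swapped for "mlflow-skinny", keeping any version pin.
    """
    requirements = []
    for requirement in base_requirements:
        # Distribution name is everything before the first specifier character.
        name = re.split(r"[<>=!~\[; ]", requirement, maxsplit=1)[0]
        if skinny and name.lower() == "mlflow":
            requirement = "mlflow-skinny" + requirement[len(name):]
        requirements.append(requirement)
    return {
        "name": "kedro-mlflow-skinny" if skinny else "kedro-mlflow",
        "install_requires": requirements,
    }


if __name__ == "__main__":
    # Example pins are placeholders, not the project's real constraints.
    example = ["kedro>=0.16.0", "mlflow>=1.0.0"]
    print(package_metadata(example, skinny=is_skinny_build()))
```

In a real setup.py the returned dictionary could be passed to `setup(**package_metadata(...))` along with the remaining arguments, and the release workflow would run the build twice, once with the variable set and once without.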

PRs are welcome!

@Galileo-Galilei Galileo-Galilei moved this from 📋 Backlog to ⛔ Blocked in kedro-mlflow roadmap Oct 29, 2024
Projects
Status: ⛔ Blocked
2 participants