-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parameter logging a nested dict fails if too long #441
Comments
Hi, sorry for the long delay for the reply. This limit is a mlflow limit and it is unlikely to change as they already stated they won't. I have 2 workarounds in mind that I list hereafter, please tell me if they fit your need. Log each key separatelyYou can use the tracking:
params:
dict_params:
flatten: True # -> change to True. parameter which are dictionary will be splitted in multiple parameters when logged in mlflow, one for each key.
recursive: True # Should the dictionary flattening be applied recursively (i.e for nested dictionaries)? Not use if `flatten_dict_params` is False.
sep: "." # In case of recursive flattening, what separator should be used between the keys? E.g. {hyperaparam1: {p1:1, p2:2}} will be logged as hyperaparam1.p1 and hyperaparam1.p2 in mlflow. This will create a lot of parameters entry but is likely the "right way" (from the mlflow point of view) to do things. Avoid logging with an extra file
It is True that by design kedro-mlflow forces you to log all parameters. If you don't want to log this dictionnary, I guess you think that it will almost never change? In this case, you can avoid logging by converting it to a yaml file and load it directly in the catalog: # data/01_raw/your_dict_parameter.yaml
key1:
key11: a
key12: b
key2:
key21:
key 211: ca
key 222: cb
key22: d your_dict:
type: yaml.YAMLDataSet
filepath: data/01_raw/your_dict_parameter.yaml Then replace The rest of your pipeline will be unmodified and the dict will not be logged. |
I close the issue since it is not a bug but a mlflow design. there are workarounds documented above, and i'll enable logging such dicts as artifacts in #442 |
Description
When I set the long_params_strategy to 'tag' in my mlflow.yml file, it fails if my parameter is a nested dict.
The error is in https://github.com/Galileo-Galilei/kedro-mlflow/blob/master/kedro_mlflow/framework/hooks/mlflow_hook.py I think.
Context
I have some nested dict parameter file, and I dont even want to log it, but I am forced to. If I must, simply tagging is fine- but it fails as it is a nested dict.
Steps to Reproduce
1- set 'long_params_strategy' to 'tag' in my mlflow.yml file
2- use a large nested dict in a parameter yml
3- call the dict in a node
4- pipeline will crash as nested dict is greater than 5000 chars
Expected Result
log dict as simple tag in params in mlflow
Actual Result
crashes due to being nested dict
Your Environment
I do not believe env is an issue
Does the bug also happen with the last version on master?
Yes
The text was updated successfully, but these errors were encountered: