feat: Allow passing a tracking ID for API requests with side-effects #2
Signed-off-by: Phoevos Kalemkeris <[email protected]>
The change looks good. Given that the user can now send in arbitrary strings, can this also verify that track_id is a valid UUID and fail early if not?
@baixiac I'm thinking about how we should handle this. I see that the UUIDs MLflow creates (e.g. see here) are converted to their hex format when used as run IDs. But there's no reason not to support the standard format used in the Gateway as well, e.g. |
Validate the tracking ID in the API endpoints that require it, ensuring it's an alphanumeric string of length 1-256. The implementation and tests are based on MLflow's internal run ID validation:
https://github.com/mlflow/mlflow/blob/92a1664ddbd7ef59f8db45e988e41437d179c3b1/mlflow/utils/validation.py#L374-L377
Signed-off-by: Phoevos Kalemkeris <[email protected]>
Hmm... I have no strong opinion on this. As long as the IDs are consistent across CGW and CMS, that should be fine, whether their format is strict or not. Feel free to make it cover all MLflow ID formats; at the end of the day, those come from a third-party tool and may change without notice.
Great, I did it the MLflow way, allowing alphanumeric strings of length 1-256, PTAL!
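For readers following the thread, the kind of check discussed above could look roughly like the sketch below. It is not the code from this PR: the name `validate_tracking_id`, the use of `ValueError`, and the exact character set (an alphanumeric first character followed by alphanumerics, underscores, or hyphens) are assumptions made for illustration. Under that assumed pattern, both the hex and the hyphenated UUID forms pass.

```python
import re
import uuid

# Assumed pattern (not taken from the PR): one alphanumeric character followed
# by up to 255 ASCII alphanumerics, underscores, or hyphens, so 1-256 in total.
TRACKING_ID_REGEX = re.compile(r"[a-zA-Z0-9][\w\-]{0,255}", re.ASCII)


def validate_tracking_id(tracking_id: str) -> str:
    """Return the ID unchanged if it is well-formed, otherwise raise ValueError."""
    # fullmatch avoids the trailing-newline loophole of a "$" anchor.
    if TRACKING_ID_REGEX.fullmatch(tracking_id) is None:
        raise ValueError(
            f"Invalid tracking ID {tracking_id!r}: expected 1-256 characters, "
            "starting with a letter or digit"
        )
    return tracking_id


# Both UUID spellings are accepted under the assumed pattern.
validate_tracking_id(uuid.uuid4().hex)
validate_tracking_id(str(uuid.uuid4()))
```

In an HTTP handler, a failed check like this would typically be mapped to a client error response so that malformed IDs are rejected before any downstream task starts.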
LGTM
Some of the requests supported by the API have "persisted" side-effects like:
Given that other systems might need to track (e.g. monitor training progress on MLflow) or store these results (e.g. export responses to an object store), it's worth extending the relevant routes to accept an optional `tracking_id` query parameter, shifting the responsibility for generating and keeping track of IDs to the caller. If an ID is not provided, a UUID is generated and used in downstream tasks like before.
For 5 of the routes affected by this PR (i.e. `/train_supervised`, `/train_unsupervised`, `/train_unsupervised_with_hf_hub_dataset`, `/train_metacat`, and `/evaluate`), the tracking ID ends up being the name of the triggered MLflow run.
In the remaining cases, the tracking ID is used as part of the filename in the response's `Content-Disposition` header.
The serving tests are extended to check that the ID (if provided) is included in the response.
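The behaviour described above can be illustrated with a small, framework-free sketch. The helper names `resolve_tracking_id` and `content_disposition`, the string form of the generated UUID, and the filename layout are all hypothetical and not taken from the PR; the sketch only shows the "use the caller's ID, else generate one" fallback and how an ID might end up in a `Content-Disposition` filename.

```python
import uuid
from typing import Optional


def resolve_tracking_id(tracking_id: Optional[str] = None) -> str:
    """Use the caller-supplied ID if present, otherwise generate a UUID."""
    # Assumption: the fallback is the hyphenated string form of a random UUID.
    return tracking_id if tracking_id is not None else str(uuid.uuid4())


def content_disposition(tracking_id: str, extension: str = "json") -> str:
    """Build a header value whose filename embeds the tracking ID."""
    # Assumption: the real filename layout may differ from this one.
    return f'attachment; filename="{tracking_id}.{extension}"'


# Caller-provided ID is passed through unchanged.
provided = resolve_tracking_id("abc123")
header = content_disposition(provided)

# No ID provided: a fresh UUID is generated instead.
generated = resolve_tracking_id()
```

A client that supplies its own ID can then look for that same value in the response header, or as the MLflow run name for the training and evaluation routes, without first having to parse the response to discover it.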