Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Allow passing a tracking ID for API requests with side-effects #2

Merged
merged 4 commits into from
Dec 18, 2024

Conversation

phoevos
Copy link
Member

@phoevos phoevos commented Dec 17, 2024

Some of the requests supported by the API have "persisted" side-effects like:

  • logging a run on MLflow
  • returning a file as the HTTP response

Given that other systems might need to track (e.g. monitor training progress on MLflow) or store these results (e.g. export responses to an object store), it's worth extending the relevant routes to accept an optional tracking_id query parameter, shifting the responsibility for generating and keeping track of IDs to the caller. If an ID is not provided, a UUID is generated and used in downstream tasks like before.

For 5 of the routes affected by this PR (i.e. /train_supervised, /train_unsupervised, /train_unsupervised_with_hf_hub_dataset, /train_metacat, and /evaluate), the tracking ID ends up being the name of the triggered MLflow run.

In the remaining cases, the tracking ID is used as part of the filename in the response's Content-Disposition header.

The serving tests are extended to check that the ID (if provided) is included in the response.

@phoevos phoevos added the enhancement New feature or request label Dec 17, 2024
@phoevos phoevos requested a review from baixiac December 17, 2024 15:57
Copy link
Member

@baixiac baixiac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good. Given the user could now send in any arbitrary strings, can this also verify if track_id is a valid UUID and fail earlier if not?

@phoevos
Copy link
Member Author

phoevos commented Dec 18, 2024

@baixiac I'm thinking about how we should handle this. I see that the UUIDs MLflow creates (e.g. see here) are converted to their hex format when used as run IDs. But there's no reason not to support the standard format used in the Gateway as well, e.g. "123e4567-e89b-12d3-a456-426614174000". In fact, MLflow's validation for user-provided ID strings is much more lenient, it doesn't really require them to be UUIDs (see here). WDYT? I agree we should have some sort of validation there, but should we actually enforce that the inputs are valid UUIDs (we can support both formats) or should we just check that they're alphanumeric strings between 1-256 characters like MLflow does?

Validate the tracking ID in the API endpoints that require it, ensuring
it's an alphanumeric string of length 1-256. The implementation and
tests are based on MLflow's internal run ID validation:
https://github.com/mlflow/mlflow/blob/92a1664ddbd7ef59f8db45e988e41437d179c3b1/mlflow/utils/validation.py#L374-L377

Signed-off-by: Phoevos Kalemkeris <[email protected]>
@baixiac
Copy link
Member

baixiac commented Dec 18, 2024

Hmm... I have no strong opinion on this. As long as the IDs are consistent across CGW and CMS, that should be fine, whether their format is strict or not. Feel free to make it cover all mlflow ID formats and at the end of the day those are from a 3rd party tool and may change without notice.

@phoevos
Copy link
Member Author

phoevos commented Dec 18, 2024

Great, I did it the MLflow way, allowing alphanumeric strings of length 1-256, PTAL!

Copy link
Member

@baixiac baixiac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@phoevos phoevos merged commit 5d94c50 into master Dec 18, 2024
5 checks passed
@phoevos phoevos deleted the feature-phoevos-api-tracking-id branch December 18, 2024 16:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants