
Request for oras-based model distribution tool in Cloud Native #1590

Open
caozhuozi opened this issue Dec 17, 2024 · 1 comment
Labels: enhancement (New feature or request), triage (New issues or PRs to be acknowledged by maintainers)

Comments


caozhuozi commented Dec 17, 2024

What is the version of your ORAS CLI

not related

What would you like to be added?

Let me first provide some background. We are developing an AI/ML platform that runs on Kubernetes for our company. The primary function of this platform is to register and serve models.

As an AI/ML platform rather than a storage, artifact, or registry infrastructure, we want to avoid managing artifact storage directly. Instead, we focus solely on maintaining model metadata, such as the model format, which allows us to locate the appropriate inference server to serve the model.

We have used three methods for registering models in our platform:

  • User-specified GCS bucket: This method often encounters authorization issues. Users must manually grant permissions to our platform's service account.
  • User-specified Git repository: Models are pulled or pushed using Git LFS. However, models larger than 4GB need to be explicitly split due to LFS restrictions, and before serving, we must reconstruct the model, complicating the process.
  • Model registration through our UI: This approach is notably slow and prone to failures for models over 1GB, since we use HTTP multipart uploads. We then handle uploading to our GCS bucket ourselves, even though we prefer not to deal directly with large files and only want to manage model metadata.

In addition, all three methods mentioned above require us to set up init containers to pull or download the models and mount them into the inference server container. With Kubernetes v1.31 now supporting the direct mounting of OCI images, storing models directly in the container registry would be much more convenient in the future.
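For illustration, here is a sketch of a Pod using the `image` volume source, which is an alpha feature in Kubernetes v1.31 behind the `ImageVolume` feature gate; the image references and paths below are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference-server
spec:
  containers:
  - name: server
    image: inference-server:latest   # hypothetical serving image
    volumeMounts:
    - name: model
      mountPath: /models             # model files appear here, no init container
      readOnly: true
  volumes:
  - name: model
    image:                           # alpha "image" volume source (KEP-4639)
      reference: registry.example.com/models/bert:v1
      pullPolicy: IfNotPresent
```

This removes the init-container step entirely: the kubelet pulls the model artifact and mounts its contents read-only into the serving container.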

Another inconvenience we've found in our experience is the separation between the model itself and its metadata. For instance, when users register models through our UI, they specify the model URI (which points to the model file itself), but they must also manually input additional model data into our platform. This creates a disconnect, as managing the model file and its metadata separately can be cumbersome and error-prone.

Then I discovered this talk: "OCI as a Standard for ML Artifact Storage and Retrieval", which perfectly aligns with our needs. I think this will be a game changer.

Users can build and push their model artifacts via a container registry, similar to how they build and push images.
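The push side of this workflow could be sketched with the existing ORAS CLI as follows. All names here (the registry, repository, media types, metadata schema, and files) are hypothetical; the registry-dependent `oras push` command is shown as a comment since it requires a reachable registry:

```shell
# Create a small metadata file that travels with the model
# (hypothetical schema -- this is exactly the kind of data our
# platform wants to manage without touching the weights themselves).
cat > model-config.json <<'EOF'
{"format": "onnx", "framework": "pytorch", "task": "text-classification"}
EOF

# With a reachable registry, the weights and metadata would be pushed
# together as one OCI artifact, e.g.:
#
#   oras push registry.example.com/models/bert:v1 \
#     --artifact-type application/vnd.example.model \
#     model-config.json:application/json \
#     model.onnx:application/octet-stream

# Sanity-check the metadata file written above.
grep '"format"' model-config.json
```

The key point is that the metadata and the weights become layers of a single addressable artifact, instead of living in two disconnected systems.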

Upon registering a model in our platform, users would simply input the artifact URI, allowing us to fetch and read the metadata from the artifact without needing to handle the large model file itself.
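The read side is equally lightweight: the platform only needs the artifact's manifest, never the multi-gigabyte layers. A sketch, using a saved sample manifest in place of a live `oras manifest fetch` call so the parsing step can run offline (the annotation keys and values are hypothetical):

```shell
# In production this JSON would come from the real ORAS CLI:
#   oras manifest fetch registry.example.com/models/bert:v1
# A sample manifest stands in here.
cat > manifest.json <<'EOF'
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "artifactType": "application/vnd.example.model",
  "annotations": {
    "org.example.model.format": "onnx",
    "org.example.model.framework": "pytorch"
  }
}
EOF

# Extract just the metadata the platform cares about -- no model layers pulled.
python3 -c "
import json
manifest = json.load(open('manifest.json'))
print(manifest['annotations']['org.example.model.format'])
"
# prints: onnx
```

A manifest is a few kilobytes, so registration stays fast regardless of model size.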

However, I believe it would be advantageous if there were a standard or specification for this model building and distribution process.

Why is this needed for ORAS?

This tool could be part of the ORAS ecosystem. Such a standard would simplify both AI model distribution and model serving in cloud-native environments.

Are you willing to submit PRs to contribute to this feature?

  • Yes, I am willing to implement it.
caozhuozi added the enhancement (New feature or request) and triage (New issues or PRs to be acknowledged by maintainers) labels on Dec 17, 2024
caozhuozi (Author)

/assign @FeynmanZhou
