
Request for oras-based model distribution tool in Cloud Native #1590

Open
caozhuozi opened this issue Dec 17, 2024 · 1 comment
Labels: enhancement (New feature or request), triage (New issues or PRs to be acknowledged by maintainers)

Comments


caozhuozi commented Dec 17, 2024

What is the version of your ORAS CLI

not related

What would you like to be added?

Let me first provide some background. We are developing an AI/ML platform that runs on Kubernetes for our company. The primary function of this platform is to register and serve models.

As an AI/ML platform rather than a storage, artifact, or registry infrastructure, we want to avoid managing artifact storage directly. Instead, we focus solely on maintaining model metadata, such as the model format, which allows us to locate the appropriate inference server to serve the model.

We have used three methods for registering models in our platform:

  • User-specified GCS bucket: This method often encounters authorization issues. Users must manually grant permissions to our platform's service account.
  • User-specified Git repository: Models are pulled or pushed using Git LFS. However, models larger than 4GB need to be explicitly split due to LFS restrictions, and before serving, we must reconstruct the model, complicating the process.
  • Model registration through our UI: This approach is notably slow and prone to failures for models over 1GB, since we use HTTP multipart uploads. We then handle uploading to our GCS bucket ourselves, even though we prefer not to deal directly with large files and only want to manage model metadata.

In addition, all three methods mentioned above require us to set up init containers to pull or download the models and mount them into the inference server container. With Kubernetes v1.31 now supporting the direct mounting of OCI images, storing models directly in the container registry would be much more convenient in the future.
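For illustration, here is a sketch of a Pod using the `image` volume source, which is an alpha feature in Kubernetes v1.31 behind the `ImageVolume` feature gate; the image references and paths below are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference-server
spec:
  containers:
  - name: server
    image: inference-server:latest   # hypothetical serving image
    volumeMounts:
    - name: model
      mountPath: /models             # model files appear here, no init container
      readOnly: true
  volumes:
  - name: model
    image:                           # alpha "image" volume source (KEP-4639)
      reference: registry.example.com/models/bert:v1
      pullPolicy: IfNotPresent
```

This removes the init-container step entirely: the kubelet pulls the model artifact and mounts its contents read-only into the serving container.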

Another inconvenience we've found in our experience is the separation between the model itself and its metadata. For instance, when users register models through our UI, they specify the model URI (which points to the model file itself), but they must also manually input additional model data into our platform. This creates a disconnect, as managing the model file and its metadata separately can be cumbersome and error-prone.

Then I discovered this talk: "OCI as a Standard for ML Artifact Storage and Retrieval", which perfectly aligns with our needs. I think this will be a game changer.

Users can build and push their model artifacts via a container registry, similar to how they build and push images.
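The push side of this workflow could be sketched with the existing ORAS CLI as follows. All names here (the registry, repository, media types, metadata schema, and files) are hypothetical; the registry-dependent `oras push` command is shown as a comment since it requires a reachable registry:

```shell
# Create a small metadata file that travels with the model
# (hypothetical schema -- this is exactly the kind of data our
# platform wants to manage without touching the weights themselves).
cat > model-config.json <<'EOF'
{"format": "onnx", "framework": "pytorch", "task": "text-classification"}
EOF

# With a reachable registry, the weights and metadata would be pushed
# together as one OCI artifact, e.g.:
#
#   oras push registry.example.com/models/bert:v1 \
#     --artifact-type application/vnd.example.model \
#     model-config.json:application/json \
#     model.onnx:application/octet-stream

# Sanity-check the metadata file written above.
grep '"format"' model-config.json
```

The key point is that the metadata and the weights become layers of a single addressable artifact, instead of living in two disconnected systems.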

Upon registering a model in our platform, users would simply input the artifact URI, allowing us to fetch and read the metadata from the artifact without needing to handle the large model file itself.
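The read side is equally lightweight: the platform only needs the artifact's manifest, never the multi-gigabyte layers. A sketch, using a saved sample manifest in place of a live `oras manifest fetch` call so the parsing step can run offline (the annotation keys and values are hypothetical):

```shell
# In production this JSON would come from the real ORAS CLI:
#   oras manifest fetch registry.example.com/models/bert:v1
# A sample manifest stands in here.
cat > manifest.json <<'EOF'
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "artifactType": "application/vnd.example.model",
  "annotations": {
    "org.example.model.format": "onnx",
    "org.example.model.framework": "pytorch"
  }
}
EOF

# Extract just the metadata the platform cares about -- no model layers pulled.
python3 -c "
import json
manifest = json.load(open('manifest.json'))
print(manifest['annotations']['org.example.model.format'])
"
# prints: onnx
```

A manifest is a few kilobytes, so registration stays fast regardless of model size.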

However, I believe it would be advantageous if there were a standard or specification for this model building and distribution process.

Why is this needed for ORAS?

This tool could be part of the ORAS ecosystem. Such a standard would simplify both AI model distribution and model serving in cloud-native environments.

Are you willing to submit PRs to contribute to this feature?

  • Yes, I am willing to implement it.
caozhuozi added the enhancement (New feature or request) and triage (New issues or PRs to be acknowledged by maintainers) labels on Dec 17, 2024
caozhuozi (Author)

/assign @FeynmanZhou
