What is the version of your ORAS CLI

not related

What would you like to be added?
Let me first provide some background. We are developing an AI/ML platform that runs on Kubernetes for our company. The primary function of this platform is to register and serve models.
Since we are an AI/ML platform rather than a storage, artifact, or registry infrastructure, we want to avoid managing artifact storage directly. Instead, we focus solely on maintaining model metadata, such as the model format, which lets us locate the appropriate inference server to serve the model.
We have used three methods for registering models in our platform:
1. User-specified GCS bucket: this method often runs into authorization issues, since users must manually grant permissions to our platform's service account.
2. User-specified Git repository: models are pulled or pushed using Git LFS. However, models larger than 4GB must be explicitly split due to LFS restrictions, and we have to reconstruct the model before serving, which complicates the process.
3. Model registration through our UI: this approach is notably slow and prone to failures for models over 1GB because we use HTTP multipart uploads. We then handle uploading to our GCS bucket ourselves, but we would prefer not to deal with large files directly, since we only want to manage model metadata.
Moreover, all three methods above require us to set up init containers to pull or download the models and mount them into the inference server container. With Kubernetes v1.31 now supporting direct mounting of OCI images as volumes, storing models directly in the container registry would be much more convenient in the future.
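For illustration, here is a minimal Go sketch of such a Pod built with the k8s.io/api types. It assumes the alpha ImageVolume feature gate is enabled (Kubernetes v1.31+); the image and artifact references are hypothetical:

```go
package main

import (
	"encoding/json"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "inference-server"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "server",
				Image: "example.com/inference-server:latest", // hypothetical server image
				// The model is mounted read-only from the OCI image volume,
				// so no init container is needed to download it.
				VolumeMounts: []corev1.VolumeMount{{
					Name:      "model",
					MountPath: "/models",
					ReadOnly:  true,
				}},
			}},
			Volumes: []corev1.Volume{{
				Name: "model",
				VolumeSource: corev1.VolumeSource{
					// Kubernetes v1.31+ (alpha, ImageVolume feature gate):
					// mount an OCI image/artifact directly as a volume.
					Image: &corev1.ImageVolumeSource{
						Reference:  "registry.example.com/models/my-model:v1", // hypothetical model artifact
						PullPolicy: corev1.PullIfNotPresent,
					},
				},
			}},
		},
	}

	out, _ := json.MarshalIndent(pod, "", "  ")
	fmt.Println(string(out))
}
```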
Another inconvenience we've encountered is the separation between the model itself and its metadata. For instance, when users register models through our UI, they specify the model URI (which points to the model file itself), but they must also manually input additional model metadata into our platform. This creates a disconnect: managing the model file and its metadata separately is cumbersome and error-prone.

Then I discovered this talk: OCI as a Standard for ML Artifact Storage and Retrieval, which aligns perfectly with our needs. I think this will be a game changer.
Users can build/push their model artifacts by leveraging a container registry, similar to how they build/push images (see the sketch below).
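To make the idea concrete, here is a rough sketch of what the build/push side could look like today with the oras-go v2 library. The registry URL, artifact type, and annotation key are made-up placeholders, not part of any existing standard; agreeing on them is exactly what a spec would provide:

```go
package main

import (
	"context"

	ocispec "github.com/opencontainers/image-spec/specs-go/v1"
	"oras.land/oras-go/v2"
	"oras.land/oras-go/v2/content/file"
	"oras.land/oras-go/v2/registry/remote"
)

func main() {
	ctx := context.Background()

	// Stage the model file from the local filesystem.
	fs, err := file.New("/tmp/model-workspace")
	if err != nil {
		panic(err)
	}
	defer fs.Close()

	layer, err := fs.Add(ctx, "model.onnx", "application/octet-stream", "")
	if err != nil {
		panic(err)
	}

	// Pack a manifest whose annotations carry the model metadata.
	// The artifact type and annotation key below are placeholders.
	manifestDesc, err := oras.PackManifest(ctx, fs, oras.PackManifestVersion1_1,
		"application/vnd.example.model", oras.PackManifestOptions{
			Layers: []ocispec.Descriptor{layer},
			ManifestAnnotations: map[string]string{
				"org.example.model.format": "onnx",
			},
		})
	if err != nil {
		panic(err)
	}
	if err := fs.Tag(ctx, manifestDesc, "v1"); err != nil {
		panic(err)
	}

	// Push to any OCI registry, exactly like pushing an image.
	repo, err := remote.NewRepository("registry.example.com/models/my-model")
	if err != nil {
		panic(err)
	}
	if _, err := oras.Copy(ctx, fs, "v1", repo, "v1", oras.DefaultCopyOptions); err != nil {
		panic(err)
	}
}
```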
Upon registering a model in our platform, users would simply input the artifact URI, allowing us to fetch and read the metadata from the artifact without needing to handle the large model file itself.
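On the registration side, the platform could resolve the reference and fetch only the manifest to read that metadata, never touching the model layers. A minimal oras-go v2 sketch, using the same hypothetical reference and annotation key as above:

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"

	ocispec "github.com/opencontainers/image-spec/specs-go/v1"
	"oras.land/oras-go/v2/content"
	"oras.land/oras-go/v2/registry/remote"
)

func main() {
	ctx := context.Background()

	// Hypothetical artifact URI entered by the user at registration time.
	repo, err := remote.NewRepository("registry.example.com/models/my-model")
	if err != nil {
		panic(err)
	}

	// Resolve the tag to a manifest descriptor, then fetch only the manifest;
	// the potentially multi-gigabyte model layers are never downloaded.
	desc, err := repo.Resolve(ctx, "v1")
	if err != nil {
		panic(err)
	}
	manifestBytes, err := content.FetchAll(ctx, repo, desc)
	if err != nil {
		panic(err)
	}

	var manifest ocispec.Manifest
	if err := json.Unmarshal(manifestBytes, &manifest); err != nil {
		panic(err)
	}

	// Read the model metadata (e.g. format) from the manifest annotations.
	fmt.Println("artifactType:", manifest.ArtifactType)
	fmt.Println("format:", manifest.Annotations["org.example.model.format"])
}
```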
However, I believe it would be advantageous if there were a standard or specification for this model building and distribution process.
Why is this needed for ORAS?
This tool could be part of the ORAS ecosystem. Such a standard would simplify both AI model distribution and the serving side in cloud-native environments.
Are you willing to submit PRs to contribute to this feature?
Yes, I am willing to implement it.