The Component Repository is a supplementary application to the Modular Inference Server, based on the FL Repository enabler, which is a part of the ASSIST-IoT project. It stores reusable components for ML inference, such as ML models, services, inferencers and data transformations. It is described in more depth in the paper "Flexible Deployment of Machine Learning Inference Pipelines in the Cloud–Edge–IoT Continuum".
The Component Repository has been developed with the assumption that it will be deployed on Kubernetes with a dedicated Helm chart. To do so, run the helm install <deployment name> helm-chart command. Before that, make sure the enabler has been configured properly by checking that the values in the repository-configmap have been set correctly (to change them, you can always modify the ConfigMap with the kubectl edit cm repository-configmap command and then recreate the Component Repository pod to propagate the changes).
By default, the chart also uses the host's port 30001 as a NodePort. Other ports may also be used, but they have to be explicitly changed in the values.yaml file describing the Kubernetes service. There you can also set the specific NodePort you would like to use to reach the Component Repository API by changing the values of the flrepository service.
If you'd like to see and experiment with the API, the recommended approach is to go to the http://127.0.0.1:XXXXX/docs URL, where XXXXX stands for the flrepository service NodePort, and use the Swagger docs generated by the FastAPI framework.
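Besides the Swagger page, the same API description can be read programmatically. The sketch below is not part of the Component Repository. It assumes that the application keeps FastAPI's default schema location (/openapi.json) and that the default NodePort 30001 is in use; the helper names docs_url, fetch_schema and list_operations are hypothetical.

```python
from __future__ import annotations

import json
from urllib.request import urlopen

# The OpenAPI keys that denote HTTP operations inside a path item.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}


def docs_url(node_port: int, host: str = "127.0.0.1") -> str:
    """Return the Swagger UI URL for the given flrepository NodePort."""
    return f"http://{host}:{node_port}/docs"


def fetch_schema(node_port: int, host: str = "127.0.0.1") -> dict:
    """Download the OpenAPI schema of a running instance.

    Assumption: the schema is served at FastAPI's default /openapi.json path.
    """
    with urlopen(f"http://{host}:{node_port}/openapi.json", timeout=10) as resp:
        return json.load(resp)


def list_operations(schema: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs found in an OpenAPI schema dict."""
    operations = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            # Path items may also hold non-operation keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations)
```

With a running deployment, list_operations(fetch_schema(30001)) would give a quick overview of the endpoints before opening the page returned by docs_url(30001).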
Run docker-compose -f docker-compose.yml up --force-recreate --build -d
in the root of this repository to build a custom image to be used by the Component Repository.
If you want the MongoDB database in your custom repositorydb image to be initialized with some preexisting objects already stored in its collections, follow these steps:
- Use the API to add and remove objects in the database until it has the desired content.
- Use the kubectl exec -i -t <podname> -- /bin/bash command to reach the command line of the repositorydb pod.
- Use the mongodump --archive=db.dump tool with the appropriate options to create the backup file.
- Use kubectl cp <podname>:/db.dump ./db.dump to move the archive file from the pod to the repository.
- Move the db.dump file to the mongo_db directory.
- Run the docker-compose -f docker-compose.yml up --force-recreate --build -d command to construct the right image.
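The backup steps above could also be scripted. The following is only a sketch under several assumptions: mongodump is called directly through kubectl exec instead of an interactive shell, it can reach the database inside the pod without extra connection or authentication options, and the archive is written to /db.dump and copied straight into the mongo_db directory. The function names are hypothetical.

```python
from __future__ import annotations

import shlex
import subprocess


def snapshot_commands(pod_name: str, archive: str = "db.dump") -> list[list[str]]:
    """Build the commands that dump the database and rebuild the image."""
    return [
        # Create the archive inside the repositorydb pod.
        ["kubectl", "exec", pod_name, "--", "mongodump", f"--archive=/{archive}"],
        # Copy the archive from the pod into the mongo_db directory.
        ["kubectl", "cp", f"{pod_name}:/{archive}", f"mongo_db/{archive}"],
        # Rebuild the custom image so that it contains the archive.
        ["docker-compose", "-f", "docker-compose.yml", "up",
         "--force-recreate", "--build", "-d"],
    ]


def run_snapshot(pod_name: str, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for argv in snapshot_commands(pod_name):
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(argv, check=True)
```

Calling run_snapshot("<podname>") from the root of the repository only prints the commands; passing dry_run=False would execute them in order.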
The models collection contains information about the machine learning models that are ready and available for inference. Each entry holds the relevant metadata: the model name and version, which let the user distinguish between models; the library, which indicates the format the model was saved in and the library it should be used with; and the description, which briefly describes the architecture of the model. For example:
{
  "meta": {
    "library": "keras",
    "description": "A CNN (Convolutional Neural Network) designed to solve the CIFAR-10 image classification task. Used as a test model for the development of the Keras library."
  },
  "model_name": "base",
  "model_version": "base2",
  "model_id": "62aae16f6ee3b61c9c6c2921"
}
The models collection can be manipulated using the following endpoints:
- POST /model Add the metadata of a new initial model to the library.
- PUT /model/{name}/{version} Create or update the object file of the model with a given name and version, depending on whether it already exists in the Component Repository.
- PUT /model/meta/{name}/{version} Update the metadata of the model with a given name and version.
- GET /model Return the list encompassing the metadata of all available models.
- GET /model/meta Return the metadata of the model with a given name and version.
- GET /model/{name}/{version} Return the binary file containing the final model weights and structure.
- DELETE /model/{name}/{version} Delete the metadata and binary file of a model with a given name and version.
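As an illustration of the endpoints listed above, the sketch below builds (but does not send) requests with Python's standard library. It is not an official client: the base URL assumes the default NodePort 30001, the JSON body of POST /model is assumed to follow the shape of the example document, and the function names are hypothetical. The Swagger docs of a running instance remain the authoritative description of the request bodies.

```python
from __future__ import annotations

import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://127.0.0.1:30001"  # assumption: default NodePort on localhost


def model_path(name: str, version: str, meta: bool = False) -> str:
    """Build /model/{name}/{version} or /model/meta/{name}/{version}."""
    prefix = "/model/meta" if meta else "/model"
    # Percent-encode both segments so that spaces or slashes stay inside them.
    return f"{prefix}/{quote(name, safe='')}/{quote(version, safe='')}"


def add_model_request(metadata: dict, base_url: str = BASE_URL) -> Request:
    """Prepare POST /model with the metadata serialized as JSON (assumed shape)."""
    body = json.dumps(metadata).encode("utf-8")
    return Request(
        base_url + "/model",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def download_model_request(name: str, version: str,
                           base_url: str = BASE_URL) -> Request:
    """Prepare GET /model/{name}/{version}, which returns the binary model file."""
    return Request(base_url + model_path(name, version), method="GET")


def delete_model_request(name: str, version: str,
                         base_url: str = BASE_URL) -> Request:
    """Prepare DELETE /model/{name}/{version}."""
    return Request(base_url + model_path(name, version), method="DELETE")
```

A prepared request can be sent with urllib.request.urlopen(request); for the download request, the response body would then hold the model file.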
More information about the construction and upload of new models can be found in the documentation of the FL Local Operations.
The data transformations collection contains information about the available data transformations, which can be used as preprocessing and postprocessing steps during inference. The modules needed to load the transformations are stored along with their serialized objects. The relevant metadata contains the unique id of the data transformation, a short description of its purpose, the parameters and parameter types that it expects, the default values of those parameters, the data types that it outputs, and its system needs: the preinstalled libraries or models, the amount of extra storage that it requires, the amount of RAM, and the availability of a GPU. For example:
{
  "id": "custom.preprocess_tensorflow_tensor",
  "description": "A transformation that takes a TensorProto as defined in Tensorflow Core and reshapes it based on the size data",
  "parameter_types": {},
  "default_values": {},
  "outputs": ["np.ndarray"],
  "needs": {
    "storage": 0,
    "RAM": 0,
    "GPU": false,
    "preinstalled_libraries": {"tensorflow": "2.12.0"},
    "available_models": {}
  }
}
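To make the role of the needs field more concrete, here is a minimal sketch of how a consumer of this metadata might compare it with what a host offers. It does not reproduce the logic of the Component Repository or the Modular Inference Server. It assumes that storage and RAM are plain numbers in the same (unspecified) unit on both sides, that library versions must match exactly, and that available_models maps a model name to a version; the function name and the host dictionary are hypothetical.

```python
from __future__ import annotations


def unmet_needs(needs: dict, host: dict) -> list[str]:
    """Return a list of the needs that the host does not satisfy."""
    problems = []
    # Numeric resources: the host must offer at least the requested amount.
    for key in ("storage", "RAM"):
        if needs.get(key, 0) > host.get(key, 0):
            problems.append(key)
    # A GPU is only a problem when it is requested and missing.
    if needs.get("GPU", False) and not host.get("GPU", False):
        problems.append("GPU")
    # Libraries and models: assumed to be name -> exact version mappings.
    installed = host.get("preinstalled_libraries", {})
    for name, version in needs.get("preinstalled_libraries", {}).items():
        if installed.get(name) != version:
            problems.append(f"library {name}=={version}")
    available = host.get("available_models", {})
    for name, version in needs.get("available_models", {}).items():
        if available.get(name) != version:
            problems.append(f"model {name}=={version}")
    return problems
```

For the example above, a host that reports {"preinstalled_libraries": {"tensorflow": "2.12.0"}} would yield an empty list, while a host without TensorFlow would yield one entry naming the missing library.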
The data transformations collection can be manipulated using the following endpoints:
- POST /transformation Create a new data transformation with the specified metadata.
- PUT /transformation/{id} Update the object file for a given data transformation.
- PUT /transformation/meta/{id} Update the metadata of a given data transformation.
- GET /transformation Get the list with the metadata of all data transformations available in this Component Repository instance.
- GET /transformation/{id} Get the object file of a data transformation with a given id.
- DELETE /transformation/{id} Delete the metadata and the object file of a data transformation with a given id.
More information about how to construct, connect and reuse data transformations can be found in the documentation of the FL Local Operations.
The services collection contains information about the available gRPC services, which can be used for ML inference. The metadata of the services is stored along with their serialized objects. The relevant metadata contains the unique id of the service, a short description of its purpose, and its system needs: the preinstalled libraries or models, the amount of extra storage that it requires, the amount of RAM, and the availability of a GPU. For example:
{
  "id": "inference_application.code.services.extended_inference_svc",
  "description": "A general bidirectional gRPC streaming service that accepts a map of TensorProto and returns a map of TensorProto",
  "needs": {
    "storage": 0,
    "RAM": 0,
    "GPU": false,
    "preinstalled_libraries": {},
    "available_models": {}
  }
}
The services collection can be manipulated using the following endpoints:
- POST /service Create a new service with the specified metadata.
- PUT /service/{id} Update the object file for a given service.
- PUT /service/meta/{id} Update the metadata of a given service.
- GET /service Get the list with the metadata of all services available in this Component Repository instance.
- GET /service/{id} Get the object file of a service with a given id.
- DELETE /service/{id} Delete the metadata and the object file of a service with a given id.
The inferencers collection contains information about the available inferencers, which can be used to handle custom ML models. The metadata of the inferencers is stored along with their serialized objects. The relevant metadata contains the unique id of the inferencer, a short description of its purpose, and its system needs: the preinstalled libraries or models, the amount of extra storage that it requires, the amount of RAM, and the availability of a GPU. For example:
{
  "id": "inference_application.code.inferencers.torch_rcnn_inferencer",
  "description": "An inferencer that can load and use an RCNN model in Torch",
  "library": "torch",
  "use_cuda": true,
  "needs": {
    "storage": 0,
    "RAM": 0,
    "GPU": false,
    "preinstalled_libraries": {},
    "available_models": {}
  }
}
The inferencers collection can be manipulated using the following endpoints:
- POST /inferencer Create a new inferencer with the specified metadata.
- PUT /inferencer/{id} Update the object file for a given inferencer.
- PUT /inferencer/meta/{id} Update the metadata of a given inferencer.
- GET /inferencer Get the list with the metadata of all inferencers available in this Component Repository instance.
- GET /inferencer/{id} Get the object file of an inferencer with a given id.
- DELETE /inferencer/{id} Delete the metadata and the object file of an inferencer with a given id.
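The transformation, service and inferencer endpoints listed in this document follow the same pattern, so a single helper can cover all three. The sketch below only prepares requests and is not an official client: the base URL assumes the default NodePort 30001, JSON is assumed as the format of metadata bodies, and the upload format of object files is deliberately left out because it is not described here. The function name is hypothetical.

```python
from __future__ import annotations

import json
from urllib.parse import quote
from urllib.request import Request

# The collections whose entries are addressed by a single id.
COLLECTIONS = ("transformation", "service", "inferencer")


def component_request(method: str, collection: str,
                      component_id: str | None = None,
                      meta: bool = False, payload: dict | None = None,
                      base_url: str = "http://127.0.0.1:30001") -> Request:
    """Prepare a request such as GET /service or PUT /inferencer/meta/{id}."""
    if collection not in COLLECTIONS:
        raise ValueError(f"unknown collection: {collection}")
    path = f"/{collection}"
    if meta:
        path += "/meta"
    if component_id is not None:
        # Ids are dotted module paths; percent-encode anything unusual.
        path += "/" + quote(component_id, safe="")
    data, headers = None, {}
    if payload is not None:
        # Assumption: metadata is sent as a JSON document.
        data = json.dumps(payload).encode("utf-8")
        headers = {"Content-Type": "application/json"}
    return Request(base_url + path, data=data, method=method, headers=headers)
```

For example, component_request("GET", "inferencer") prepares the listing call, and component_request("DELETE", "service", "<id>") prepares the removal of a service; either can be sent with urllib.request.urlopen.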
If you found the Component Repository useful in your research, please consider starring ⭐ us on GitHub and citing 📚 us in your research!
Bogacka, K.; Sowiński, P.; Danilenka, A.; Biot, F.M.; Wasielewska-Michniewska, K.; Ganzha, M.; Paprzycki, M.; Palau, C.E.
Flexible Deployment of Machine Learning Inference Pipelines in the Cloud–Edge–IoT Continuum.
Electronics 2024, 13, 1888. https://doi.org/10.3390/electronics13101888
@Article{electronics13101888,
AUTHOR = {Bogacka, Karolina and Sowiński, Piotr and Danilenka, Anastasiya and Biot, Francisco Mahedero and Wasielewska-Michniewska, Katarzyna and Ganzha, Maria and Paprzycki, Marcin and Palau, Carlos E.},
TITLE = {Flexible Deployment of Machine Learning Inference Pipelines in the Cloud–Edge–IoT Continuum},
JOURNAL = {Electronics},
VOLUME = {13},
YEAR = {2024},
NUMBER = {10},
ARTICLE-NUMBER = {1888},
URL = {https://www.mdpi.com/2079-9292/13/10/1888},
ISSN = {2079-9292},
ABSTRACT = {Currently, deploying machine learning workloads in the Cloud–Edge–IoT continuum is challenging due to the wide variety of available hardware platforms, stringent performance requirements, and the heterogeneity of the workloads themselves. To alleviate this, a novel, flexible approach for machine learning inference is introduced, which is suitable for deployment in diverse environments—including edge devices. The proposed solution has a modular design and is compatible with a wide range of user-defined machine learning pipelines. To improve energy efficiency and scalability, a high-performance communication protocol for inference is propounded, along with a scale-out mechanism based on a load balancer. The inference service plugs into the ASSIST-IoT reference architecture, thus taking advantage of its other components. The solution was evaluated in two scenarios closely emulating real-life use cases, with demanding workloads and requirements constituting several different deployment scenarios. The results from the evaluation show that the proposed software meets the high throughput and low latency of inference requirements of the use cases while effectively adapting to the available hardware. The code and documentation, in addition to the data used in the evaluation, were open-sourced to foster adoption of the solution.},
DOI = {10.3390/electronics13101888}
}
The Component Repository is released under the Apache 2.0 license.
As the Component Repository is heavily based on the FL Repository, potentially relevant documentation can be found here.