Improve logic around MLModel.ready flag (SeldonIO#1074)
Adrian Gonzalez-Martin authored Apr 6, 2023
1 parent bcf2c35 commit a1969bf
Showing 38 changed files with 102 additions and 133 deletions.
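In short, this commit moves ownership of the `ready` flag out of each runtime's `load()` method: runtimes now just return `True`, the model registry records that return value on `model.ready`, and the data plane rejects inference requests for models that are not ready. A minimal sketch of the resulting pattern for a custom runtime (illustration only, not a file in this diff):

```python
# Sketch of the pattern the diff below converges on (illustrative, not part of the commit):
# load() only reports success; MultiModelRegistry sets `model.ready` from that return
# value, and DataPlane.infer() raises ModelNotReady for models that are not ready yet.
from mlserver import MLModel
from mlserver.types import InferenceRequest, InferenceResponse


class MyCustomRuntime(MLModel):  # name borrowed from the docs changed below
    async def load(self) -> bool:
        self._model = ...  # load your model artifact here
        return True  # no more `self.ready = True`; the registry owns that flag now

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        ...  # run inference with self._model and return an InferenceResponse
```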
3 changes: 1 addition & 2 deletions docs/examples/custom/README.ipynb
@@ -177,8 +177,7 @@
"\n",
" self._predictive = Predictive(self._model, self._samples)\n",
"\n",
" self.ready = True\n",
" return self.ready\n",
" return True\n",
"\n",
" @decode_args\n",
" async def predict(\n",
33 changes: 9 additions & 24 deletions docs/examples/custom/README.md
@@ -6,7 +6,7 @@ To support this scenario, MLServer makes it really easy to create your own exten

## Overview

In this example, we will train a [`numpyro` model](http://num.pyro.ai/en/stable/).
The `numpyro` library streamlines the implementation of probabilistic models, abstracting away advanced inference and training algorithms.

Out of the box, `mlserver` doesn't provide an inference runtime for `numpyro`.
@@ -19,7 +19,6 @@ This will be a very simple bayesian regression model, based on an example provid

Since this is a probabilistic model, during training we will compute an approximation to the posterior distribution of our model using MCMC.


```python
# Original source code and more details can be found in:
# https://nbviewer.jupyter.org/github/pyro-ppl/numpyro/blob/master/notebooks/source/bayesian_regression.ipynb
@@ -79,7 +78,6 @@ Note that, since this is a probabilistic model, we will only need to save the tr

This will get saved in a `numpyro-divorce.json` file.


```python
import json

@@ -95,7 +93,7 @@ with open(model_file_name, "w") as model_file:

## Serving

The next step will be to serve our model using `mlserver`.
For that, we will first implement an extension which will serve as the _runtime_ to perform inference using our custom `numpyro` model.

### Custom inference runtime
@@ -105,8 +103,6 @@ Our custom inference wrapper should be responsible of:
- Loading the model from the set samples we saved previously.
- Running inference using our model structure, and the posterior approximated from the samples.



```python
# %load models.py
import json
@@ -134,8 +130,7 @@ class NumpyroModel(MLModel):

self._predictive = Predictive(self._model, self._samples)

self.ready = True
return self.ready
return True

@decode_args
async def predict(
@@ -170,14 +165,13 @@ class NumpyroModel(MLModel):

### Settings files

The next step will be to create 2 configuration files:

- `settings.json`: holds the configuration of our server (e.g. ports, log level, etc.).
- `model-settings.json`: holds the configuration of our model (e.g. input type, runtime to use, etc.).
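
For reference, a rough sketch of what these two files could contain for this example; the exact values are an assumption inferred from the truncated `%load` cells below, not part of this diff:

```python
# Hypothetical helper that writes a minimal pair of config files for this example.
# The contents (debug flag, model name, implementation path) are assumptions.
import json

settings = {"debug": "true"}  # server-wide settings
model_settings = {
    "name": "numpyro-divorce",                # model name used in the inference URL
    "implementation": "models.NumpyroModel",  # module.Class of our custom runtime
}

with open("settings.json", "w") as settings_file:
    json.dump(settings, settings_file)

with open("model-settings.json", "w") as model_settings_file:
    json.dump(model_settings, model_settings_file)
```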

#### `settings.json`


```python
# %load settings.json
{
@@ -188,7 +182,6 @@ The next step will be to create 2 configuration files:

#### `model-settings.json`


```python
# %load model-settings.json
{
@@ -213,13 +206,11 @@ Since this command will start the server and block the terminal, waiting for req

### Send test inference request


We now have our model being served by `mlserver`.
To make sure that everything is working as expected, let's send a request from our test set.

For that, we can use the Python types that `mlserver` provides out of the box, or we can build our request manually.
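
As a rough sketch of the first option (not part of this diff), the request could be built with MLServer's own types; the input name `marriage`, the model name `numpyro-divorce` and the local endpoint are assumptions taken from this example:

```python
# Sketch: build a V2 inference request with MLServer's types and POST it to the
# locally running server. Names, values and the endpoint are assumptions.
import numpy as np
import requests
from mlserver.codecs import NumpyCodec
from mlserver.types import InferenceRequest

marriage_rate = np.array([28.0])  # hypothetical input value
inference_request = InferenceRequest(
    inputs=[NumpyCodec.encode_input(name="marriage", payload=marriage_rate)]
)

endpoint = "http://localhost:8080/v2/models/numpyro-divorce/infer"
response = requests.post(endpoint, json=inference_request.dict())
print(response.json())
```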


```python
import requests
import numpy as np
@@ -245,10 +236,9 @@ response.json()
Now that we have written and tested our custom model, the next step is to deploy it.
With that goal in mind, the rough outline of steps will be to first build a custom image containing our code, and then deploy it.


### Specifying requirements

MLServer will automatically find your requirements.txt file and install the necessary Python packages.

```python
# %load requirements.txt
@@ -262,15 +252,13 @@ jaxlib==0.3.7
### Building a custom image

```{note}
This section expects that Docker is available and running in the background.
```

MLServer offers helpers to build a custom Docker image containing your code.
In this example, we will use the `mlserver build` subcommand to create an image, which we'll be able to deploy later.


Note that this section expects that Docker is available and running in the background, as well as a functional cluster with Seldon Core installed and some familiarity with `kubectl`.

```bash
%%bash
@@ -283,7 +271,6 @@ To ensure that the image is fully functional, we can spin up a container and the
docker run -it --rm -p 8080:8080 my-custom-numpyro-server:0.1.0
```


```python
import numpy as np

@@ -308,21 +295,20 @@ As we should be able to see, the server running within our Docker image responds
### Deploying our custom image

```{note}
This section expects access to a functional Kubernetes cluster with Seldon Core installed and some familiarity with `kubectl`.
```

Now that we've built a custom image and verified that it works as expected, we can move to the next step and deploy it.
There is a large number of tools out there to deploy images.
However, for our example, we will focus on deploying it to a cluster running [Seldon Core](https://docs.seldon.io/projects/seldon-core/en/latest/).

```{note}
Also consider that, depending on your Kubernetes installation, Seldon Core might expect to get the container image from a public container registry like [Docker hub](https://hub.docker.com/) or [Google Container Registry](https://cloud.google.com/container-registry). For that you need to do an extra step of pushing the container to the registry using `docker tag <image name> <container registry>/<image name>` and `docker push <container registry>/<image name>` and also updating the `image` section of the yaml file to `<container registry>/<image name>`.
```

For that, we will need to create a `SeldonDeployment` resource which instructs Seldon Core to deploy a model embedded within our custom image and compliant with the [V2 Inference Protocol](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2).
This can be achieved by _applying_ (i.e. `kubectl apply`) a `SeldonDeployment` manifest to the cluster, similar to the one below:


```python
%%writefile seldondeployment.yaml
apiVersion: machinelearning.seldon.io/v1
@@ -343,7 +329,6 @@ spec:
image: my-custom-numpyro-server:0.1.0
```


```python

```
3 changes: 1 addition & 2 deletions docs/examples/custom/models.py
@@ -23,8 +23,7 @@ async def load(self) -> bool:

self._predictive = Predictive(self._model, self._samples)

self.ready = True
return self.ready
return True

@decode_args
async def predict(
10 changes: 4 additions & 6 deletions docs/user-guide/custom.md
@@ -29,7 +29,7 @@ and then overriding those methods with your custom logic.

```{code-block} python
---
emphasize-lines: 7-8, 13-14
emphasize-lines: 7-8, 12-13
---
from mlserver import MLModel
from mlserver.types import InferenceRequest, InferenceResponse
@@ -39,8 +39,7 @@ class MyCustomRuntime(MLModel):
async def load(self) -> bool:
# TODO: Replace for custom logic to load a model artifact
self._model = load_my_custom_model()
self.ready = True
return self.ready
return True
async def predict(self, payload: InferenceRequest) -> InferenceResponse:
# TODO: Replace for custom logic to run inference
@@ -86,7 +85,7 @@ following custom runtime:

```{code-block} python
---
emphasize-lines: 2, 12-13
emphasize-lines: 2, 11-12
---
from mlserver import MLModel
from mlserver.codecs import decode_args
@@ -96,8 +95,7 @@ class MyCustomRuntime(MLModel):
async def load(self) -> bool:
# TODO: Replace for custom logic to load a model artifact
self._model = load_my_custom_model()
self.ready = True
return self.ready
return True
@decode_args
async def predict(self, questions: List[str], context: List[str]) -> np.ndarray:
5 changes: 2 additions & 3 deletions docs/user-guide/metrics.md
@@ -71,7 +71,7 @@ Custom metrics will generally be registered in the {func}`load()

```{code-block} python
---
emphasize-lines: 1, 8, 13
emphasize-lines: 1, 8, 12
---
import mlserver
@@ -81,8 +81,7 @@ class MyCustomRuntime(mlserver.MLModel):
async def load(self) -> bool:
self._model = load_my_custom_model()
mlserver.register("my_custom_metric", "This is a custom metric example")
self.ready = True
return self.ready
return True
async def predict(self, payload: InferenceRequest) -> InferenceResponse:
mlserver.log(my_custom_metric=34)
9 changes: 9 additions & 0 deletions mlserver/errors.py
@@ -26,6 +26,15 @@ def __init__(self, name: str, version: Optional[str] = None):
super().__init__(msg, status.HTTP_404_NOT_FOUND)


class ModelNotReady(MLServerError):
def __init__(self, name: str, version: Optional[str] = None):
msg = f"Model {name} is not ready yet."
if version is not None:
msg = f"Model {name} with version {version} is not ready yet."

super().__init__(msg, status.HTTP_400_BAD_REQUEST)


class InferenceError(MLServerError):
def __init__(self, msg: str):
super().__init__(msg, status.HTTP_400_BAD_REQUEST)
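A quick illustration (not from the commit) of the message and status the new error carries, assuming the `MLServerError` base class exposes the HTTP status as a `status_code` attribute:

```python
from mlserver.errors import ModelNotReady

err = ModelNotReady("numpyro-divorce", version="v0.1.0")
print(err)              # Model numpyro-divorce with version v0.1.0 is not ready yet.
print(err.status_code)  # 400, i.e. status.HTTP_400_BAD_REQUEST (assumed attribute)
```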
3 changes: 3 additions & 0 deletions mlserver/handlers/dataplane.py
@@ -4,6 +4,7 @@
)
from typing import Optional

from ..errors import ModelNotReady
from ..metrics import model_context
from ..settings import Settings
from ..registry import MultiModelRegistry
@@ -92,6 +93,8 @@ async def infer(
payload.id = generate_uuid()

model = await self._model_registry.get_model(name, version)
if not model.ready:
raise ModelNotReady(name, version)

self._inference_middleware.request_middleware(payload, model.settings)

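A hypothetical test sketch of the new guard; the `data_plane`, `sum_model` and `inference_request` fixtures are assumptions modelled on MLServer's own test suite:

```python
import pytest

from mlserver.errors import ModelNotReady


async def test_infer_fails_when_model_is_not_ready(
    data_plane, sum_model, inference_request
):
    # Simulate a model that is registered but has not finished loading yet
    sum_model.ready = False

    with pytest.raises(ModelNotReady):
        await data_plane.infer(payload=inference_request, name=sum_model.name)
```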
3 changes: 1 addition & 2 deletions mlserver/model.py
@@ -67,8 +67,7 @@ async def load(self) -> bool:
**This method should be overriden to implement your custom load
logic.**
"""
self.ready = True
return self.ready
return True

async def predict(self, payload: InferenceRequest) -> InferenceResponse:
"""
4 changes: 2 additions & 2 deletions mlserver/registry.py
@@ -162,7 +162,7 @@ async def _load_model(self, model: MLModel):

# Register model again to ensure we save version modified by hooks
self._register(model)
await model.load()
model.ready = await model.load()

logger.info(f"Loaded model '{model.name}' succesfully.")
except Exception:
@@ -180,7 +180,7 @@ async def _reload_model(self, old_model: MLModel, new_model: MLModel):
# Loading the model before unloading the old one - this will ensure
# that at least one is available (sort of mimicking a rolling
# deployment)
await new_model.load()
new_model.ready = await new_model.load()
self._register(new_model)

if old_model == self.default:
3 changes: 1 addition & 2 deletions runtimes/alibi-detect/mlserver_alibi_detect/runtime.py
@@ -72,8 +72,7 @@ async def load(self) -> bool:
f"Invalid configuration for model {self._settings.name}: {e}"
) from e

self.ready = True
return self.ready
return True

async def predict(self, payload: InferenceRequest) -> InferenceResponse:
# If batch is not configured, run the detector and return the output
4 changes: 2 additions & 2 deletions runtimes/alibi-detect/tests/conftest.py
@@ -105,7 +105,7 @@ async def outlier_detector(
outlier_detector_settings: ModelSettings,
) -> AlibiDetectRuntime:
model = AlibiDetectRuntime(outlier_detector_settings)
await model.load()
model.ready = await model.load()

return model

@@ -140,6 +140,6 @@ def drift_detector_uri(tmp_path: str) -> str:
@pytest.fixture
async def drift_detector(drift_detector_settings: ModelSettings) -> AlibiDetectRuntime:
model = AlibiDetectRuntime(drift_detector_settings)
await model.load()
model.ready = await model.load()

return model
@@ -49,8 +49,7 @@ async def load(self) -> bool:
else:
self._model = await self._load_from_uri(self._infer_impl)

self.ready = True
return self.ready
return True

def _explain_impl(self, input_data: Any, explain_parameters: Dict) -> Explanation:
if not self.alibi_explain_settings.explainer_batch:
@@ -39,8 +39,7 @@ async def load(self) -> bool:
else:
self._model = await self._load_from_uri(self._inference_model)

self.ready = True
return self.ready
return True

async def _get_inference_model(self) -> Any:
raise NotImplementedError
3 changes: 1 addition & 2 deletions runtimes/alibi-explain/tests/helpers/tf_model.py
@@ -34,8 +34,7 @@ async def predict(self, payload: InferenceRequest) -> InferenceResponse:

async def load(self) -> bool:
self._model = tf.keras.models.load_model(get_tf_mnist_model_uri())
self.ready = True
return self.ready
return True


def _train_tf_mnist() -> None:
3 changes: 1 addition & 2 deletions runtimes/huggingface/mlserver_huggingface/runtime.py
@@ -74,8 +74,7 @@ async def load(self) -> bool:
self._model = load_pipeline_from_settings(self.hf_settings, self.settings)
self._merge_metadata()
print("model has been loaded!")
self.ready = True
return self.ready
return True

async def predict(self, payload: InferenceRequest) -> InferenceResponse:
"""
2 changes: 1 addition & 1 deletion runtimes/huggingface/tests/conftest.py
@@ -41,7 +41,7 @@ def model_settings() -> ModelSettings:
@pytest.fixture(scope="module")
async def runtime(model_settings: ModelSettings) -> HuggingFaceRuntime:
runtime = HuggingFaceRuntime(model_settings)
await runtime.load()
runtime.ready = await runtime.load()
return runtime


3 changes: 1 addition & 2 deletions runtimes/lightgbm/mlserver_lightgbm/lightgbm.py
@@ -21,8 +21,7 @@ async def load(self) -> bool:

self._model = lgb.Booster(model_file=model_uri)

self.ready = True
return self.ready
return True

async def predict(self, payload: types.InferenceRequest) -> types.InferenceResponse:
decoded = self.decode_request(payload, default_codec=NumpyRequestCodec)
2 changes: 1 addition & 1 deletion runtimes/lightgbm/tests/conftest.py
@@ -50,7 +50,7 @@ def model_settings(model_uri: str) -> ModelSettings:
@pytest.fixture
async def model(model_settings: ModelSettings) -> LightGBMModel:
model = LightGBMModel(model_settings)
await model.load()
model.ready = await model.load()

return model
