diff --git a/docs/source/serving/deploying_with_lws.rst b/docs/source/serving/deploying_with_lws.rst
new file mode 100644
index 0000000000000..b63a432dde0d5
--- /dev/null
+++ b/docs/source/serving/deploying_with_lws.rst
@@ -0,0 +1,12 @@
+.. _deploying_with_lws:
+
+Deploying with LWS
+============================
+
+LeaderWorkerSet (LWS) is a Kubernetes API that aims to address common deployment patterns of AI/ML inference workloads.
+A major use case is for multi-host/multi-node distributed inference.
+
+vLLM can be deployed with `LWS `_ on Kubernetes for distributed model serving.
+
+Please see `this guide `_ for more details on
+deploying vLLM on Kubernetes using LWS.
diff --git a/docs/source/serving/integrations.rst b/docs/source/serving/integrations.rst
index 93872397913e3..2066e80b03298 100644
--- a/docs/source/serving/integrations.rst
+++ b/docs/source/serving/integrations.rst
@@ -8,4 +8,5 @@ Integrations
    deploying_with_kserve
    deploying_with_triton
    deploying_with_bentoml
+   deploying_with_lws
    serving_with_langchain