The following diagram shows the architecture of GPUStack:
The GPUStack server consists of the following components:
GPUStack workers are responsible for:
The GPUStack server connects to a SQL database as the datastore. Currently, GPUStack uses SQLite. Stay tuned for support for external databases like PostgreSQL in upcoming releases.
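To make the datastore role concrete, here is a minimal sketch of how a server component might persist worker records in an embedded SQLite database. The table name, columns, and values are illustrative assumptions, not GPUStack's actual schema.

```python
import sqlite3

# Illustrative sketch only: GPUStack's real schema and table names differ.
# An in-memory database is used here; a server would use a file-backed one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE workers (id INTEGER PRIMARY KEY, name TEXT, address TEXT)"
)
# Register a hypothetical worker with the server's datastore.
conn.execute(
    "INSERT INTO workers (name, address) VALUES (?, ?)",
    ("worker-1", "10.0.0.5:10150"),
)
conn.commit()

rows = conn.execute("SELECT name, address FROM workers").fetchall()
print(rows)  # [('worker-1', '10.0.0.5:10150')]
conn.close()
```

Because SQLite is embedded, the server needs no separate database process; swapping in an external database like PostgreSQL would mainly change the connection setup, not this access pattern.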
Inference servers are the backends that perform the inference tasks. GPUStack uses llama-box as the inference server.
The RPC server enables running the llama-box backend on a remote host. The inference server communicates with one or more RPC server instances, offloading computation to these remote hosts. This setup allows for distributed LLM inference across multiple workers, enabling the system to load larger models even when the resources of any individual worker are limited.
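The flow above can be sketched with llama.cpp-style commands (llama-box is derived from llama.cpp). The flag names, hostnames, and ports below are assumptions for illustration; verify them against your llama-box version.

```shell
# On each remote worker: start an RPC server exposing its compute resources.
# (--host/--port follow llama.cpp's rpc-server; values are placeholders.)
rpc-server --host 0.0.0.0 --port 50052

# On the host running the inference server: list the remote RPC endpoints
# so model layers can be offloaded across workers during inference.
llama-box --model ./model.gguf --rpc 10.0.0.5:50052,10.0.0.6:50052
```

The inference server splits the model across the listed endpoints, so the combined memory of several workers can hold a model that none of them could load alone.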