Architecture
The following diagram shows the architecture of GPUStack:
Server
The GPUStack server consists of the following components:
- API Server: Provides a RESTful interface for clients to interact with the system. It handles authentication and authorization.
- Scheduler: Responsible for assigning model instances to workers.
- Model Controller: Manages the rollout and scaling of model instances to match the desired model replicas.
- HTTP Proxy: Routes completion API requests to the backend inference servers, as shown in the example below.
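For illustration, here is a minimal sketch of a client request to the completion API. The API Server authenticates it, and the HTTP Proxy forwards it to an inference server running the requested model. The server URL, the `/v1-openai/chat/completions` path, the API key, and the model name are placeholders and assumptions; check the API documentation of your GPUStack release for the exact endpoint.

```python
import requests

# Placeholder values -- replace with your GPUStack server URL, an API key
# you created, and the name of a model actually deployed on your cluster.
GPUSTACK_URL = "http://my-gpustack-server"               # assumption: server address
API_KEY = "gpustack_xxx"                                 # assumption: API key
ENDPOINT = f"{GPUSTACK_URL}/v1-openai/chat/completions"  # assumption: OpenAI-compatible path

# The API Server authenticates the request; the HTTP Proxy routes it to a
# backend inference server serving the requested model.
response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "llama3",  # assumption: deployed model name
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```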
Worker
GPUStack workers are responsible for:
- Running inference servers for model instances assigned to the worker.
- Reporting status to the server, as sketched below.
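The sketch below is a simplified, hypothetical illustration of the worker-side reporting loop; the endpoint path, payload shape, credential, and interval are illustrative assumptions, not GPUStack's actual internal protocol.

```python
import time
import requests

SERVER_URL = "http://my-gpustack-server"    # hypothetical server address
WORKER_TOKEN = "worker-registration-token"  # hypothetical credential

def collect_status():
    """Gather a minimal, illustrative status payload (a real worker reports
    richer data, e.g. GPU utilization and model instance states)."""
    return {
        "name": "worker-1",
        "healthy": True,
        "model_instances": [],  # states of inference servers on this worker
    }

while True:
    # Hypothetical endpoint: the actual reporting API is internal to GPUStack.
    requests.post(
        f"{SERVER_URL}/v1/workers/status",
        headers={"Authorization": f"Bearer {WORKER_TOKEN}"},
        json=collect_status(),
        timeout=10,
    )
    time.sleep(30)  # report periodically
```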
SQL Database
The GPUStack server connects to a SQL database as its datastore. GPUStack currently uses SQLite; support for external databases such as PostgreSQL is planned for upcoming releases.
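As a quick way to see the datastore, the sketch below opens the SQLite file with Python's standard library and lists its tables. The file path is an assumption based on a typical Linux data directory; adjust it to your installation.

```python
import sqlite3

# Assumption: default data directory on Linux; adjust for your installation.
DB_PATH = "/var/lib/gpustack/database.db"

# Open read-only so we don't interfere with the running server.
conn = sqlite3.connect(f"file:{DB_PATH}?mode=ro", uri=True)
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
print([name for (name,) in tables])
conn.close()
```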
Inference Server
Inference servers are the backends that perform the actual inference tasks. GPUStack uses llama-box as the inference server.
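Clients normally reach inference servers only through the GPUStack HTTP Proxy, but when debugging it can help to probe a llama-box instance directly on its worker. The host, port, and `/health` path below are assumptions (llama-box derives from llama.cpp's HTTP server, which exposes a health endpoint); the actual port is assigned by GPUStack when the model instance is scheduled.

```python
import requests

# Assumptions: the worker's address and the port assigned to this llama-box
# instance; both are illustrative placeholders.
WORKER_HOST = "10.0.0.5"
LLAMA_BOX_PORT = 40063

# Treat the /health path as an assumption for your llama-box version.
resp = requests.get(f"http://{WORKER_HOST}:{LLAMA_BOX_PORT}/health", timeout=5)
print(resp.status_code, resp.text)
```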