GPUStack
GPUStack is an open-source GPU cluster manager for running AI models.
Key Features
- Broad Hardware Compatibility: Run with different brands of GPUs in Apple Macs, Windows PCs, and Linux servers.
- Broad Model Support: From LLMs to diffusion models, audio, embedding, and reranker models.
- Scales with Your GPU Inventory: Easily add more GPUs or nodes to scale up your operations.
- Distributed Inference: Supports both single-node multi-GPU and multi-node inference and serving.
- Multiple Inference Backends: Supports llama-box (llama.cpp & stable-diffusion.cpp), vox-box and vLLM as the inference backends.
- Lightweight Python Package: Minimal dependencies and operational overhead.
- OpenAI-compatible APIs: Serve APIs that are compatible with OpenAI standards.
- User and API key management: Simplified management of users and API keys.
- GPU metrics monitoring: Monitor GPU performance and utilization in real-time.
- Token usage and rate metrics: Track token usage and manage rate limits effectively.
Supported Platforms
- macOS
- Windows
- Linux
Supported Accelerators
- Apple Metal (M-series chips)
- NVIDIA CUDA (Compute Capability 6.0 and above)
- Ascend CANN
- Moore Threads MUSA
- AMD ROCm
We plan to support the following accelerators in future releases.
- Intel oneAPI
- Qualcomm AI Engine
Supported Models
GPUStack uses llama-box (bundled llama.cpp and stable-diffusion.cpp server), vLLM and vox-box as the backends and supports a wide range of models. Models from the following sources are supported:
-
Local File Path
Example Models:
Category | Models |
---|---|
Large Language Models(LLMs) | Qwen, LLaMA, Mistral, Deepseek, Phi, Yi |
Vision Language Models(VLMs) | Llama3.2-Vision, Pixtral , Qwen2-VL, LLaVA, InternVL2 |
Diffusion Models | Stable Diffusion, FLUX |
Rerankers | GTE, BCE, BGE, Jina |
Audio Models | Whisper (speech-to-text), CosyVoice (text-to-speech) |
For full list of supported models, please refer to the supported models section in the inference backends documentation.
OpenAI-Compatible APIs
GPUStack serves OpenAI compatible APIs. For details, please refer to OpenAI Compatible APIs