GPUStack
Overview
Quickstart
Installation
Requirements
NVIDIA
AMD
Ascend
Hygon
MThreads
Iluvatar
Cambricon
Air-Gapped Installation
Installation via Docker Compose
Uninstallation
Upgrade
Migration
User Guide
Playground
Chat
Image
Audio
Embedding
Rerank
Model Catalog
Model Deployment Management
Inference Backend Management
Built-in Inference Backends
Compatibility Check
Model File Management
Cluster Management
Cloud Credential Management
OpenAI Compatible APIs
Image Generation APIs
Rerank API
API Key Management
User Management
Single Sign-On (SSO) Authentication
Observability
Using Models
Using Large Language Models
Using Vision Language Models
Using Embedding Models
Using Reranker Models
Using Image Generation Models
Recommended Parameters for Image Generation Models
Editing Images
Using Audio Models
Tutorials
Running DeepSeek R1 671B with Distributed vLLM
Running DeepSeek R1 671B with Distributed Ascend MindIE
Inference on CPUs
Inference with Tool Calling
Using Custom Inference Backend
Adding a GPU Cluster Using DigitalOcean
Adding a GPU Cluster Using Kubernetes
Integrations
OpenAI Compatible APIs
Integrate with Dify
Integrate with RAGFlow
Integrate with CherryStudio
Architecture
Scheduler
Troubleshooting
FAQ
API Reference
CLI Reference
Start
Download Tools
Reload Config
List Images
Save Images
Copy Images
Inference Performance Lab
Overview
Optimizing Throughput
GLM-4.6/4.5-Air
A100
H100
GLM-4.6/4.5
A100
H100
H200
GPT-OSS-20B
A100
H100
GPT-OSS-120B
A100
H100
DeepSeek-R1
H200
Qwen3-8B
910B
Qwen3-14B
A100
H100
Qwen3-32B
A100
H100
Qwen3-30B-A3B
910B
Qwen3-235B-A22B
A100
H100
Optimizing Latency
Qwen3-8B
H100
References
The Impact of Quantization on vLLM Inference Performance
Evaluating LMCache Prefill Acceleration in vLLM