GPUStack

    • Home
      • Overview
      • Quickstart
        • Requirements
        • NVIDIA
        • AMD
        • Ascend
        • Hygon
        • MThreads
        • Iluvatar
        • Cambricon
        • Air-Gapped Installation
        • Installation via Docker Compose
        • Uninstallation
      • Upgrade
      • Migration
      • Playground
        • Chat
        • Image
        • Audio
        • Embedding
        • Rerank
      • Model Catalog
      • Model Deployment Management
      • Inference Backend Management
      • Built-in Inference Backends
      • Compatibility Check
      • Model File Management
      • Cluster Management
      • Cloud Credential Management
      • OpenAI Compatible APIs
      • Image Generation APIs
      • Rerank API
      • API Key Management
      • User Management
      • Single Sign-On (SSO) Authentication
      • Observability
      • Using Large Language Models
      • Using Vision Language Models
      • Using Embedding Models
      • Using Reranker Models
      • Using Image Generation Models
      • Recommended Parameters for Image Generation Models
      • Editing Images
      • Using Audio Models
      • Running DeepSeek R1 671B with Distributed vLLM
      • Running DeepSeek R1 671B with Distributed Ascend MindIE
      • Inference on CPUs
      • Inference with Tool Calling
      • Using Custom Inference Backend
      • Adding a GPU Cluster Using DigitalOcean
      • Adding a GPU Cluster Using Kubernetes
      • OpenAI Compatible APIs
      • Integrate with Dify
      • Integrate with RAGFlow
      • Integrate with CherryStudio
      • Architecture
      • Scheduler
      • Troubleshooting
      • FAQ
      • API Reference
      • CLI Reference
        • Start
        • Download Tools
        • Reload Config
        • List Images
        • Save Images
        • Copy Images
    • Inference Performance Lab
      • Overview
      • Per-GPU benchmark reports (A100, H100, H200, 910B)
      • The Impact of Quantization on vLLM Inference Performance
      • Evaluating LMCache Prefill Acceleration in vLLM
