Skip to content

You're not viewing the latest version. Click here to go to latest.

GPUStack

Integrate with Cherry Studio

Initializing search

gpustack/gpustack

Home
Inference Performance Lab
Container Image Selector

GPUStack

gpustack/gpustack

Home
Home
- Overview
- Quickstart
- Installation
  Installation
- Upgrade
  Upgrade
- Migration
- User Guide
  User Guide
  - Playground
    
    Playground
    
    Chat
    
    Image
    
    Audio
    
    Embedding
    
    Rerank
  - Model Catalog
  - Model Deployment Management
  - Model Route Management
  - Model Provider Management
  - Inference Backend Management
  - Built-in Inference Backends
  - Compatibility Check
  - Model File management
  - Cluster Management
  - GPU Service
    GPU Service
    
    GPU Instances
    
    Instance Types
    
    Instance Templates
    
    Storage
    
    Storage Types
    
    SSH Public Keys
  - Cloud Credential Management
  - API Key Management
  - User Management
  - Usage
  - Single Sign-On (SSO) Authentication
  - Observability
  - Benchmarking
- Using Models
  Using Models
- Tutorials
  Tutorials
- Integrations
  Integrations
  - Inference APIs
  - Integrate with Claude Code
  - Integrate with Dify
  - Integrate with RAGFlow
  - Integrate with Cherry Studio Integrate with Cherry Studio
    Table of contents
    
    Deploying Models
    
    Create an API Key
    
    Integrating GPUStack into Cherry Studio
    
    Using LLMs
    
    Using Multimodal Models
    
    Use Embeddings and Reranking to Improve Knowledge Base Q&A
  - Integrate with OpenClaw
  - Integrate with n8n
  - Integrate with MaxKB
  - Integrate with NewAPI
  - Integrate with LiteLLM
- Architecture
- Scheduler
- Troubleshooting
- FAQ
- API Reference
- CLI Reference
  CLI Reference
- Environment Variables
Inference Performance Lab
Inference Performance Lab
- Overview
- Optimizing Throughput
  Optimizing Throughput
  - Qwen3.5-35B-A3B
    Qwen3.5-35B-A3B
    
    H200
  - Qwen3.5-9B
    Qwen3.5-9B
    
    H100
  - GLM-4.5-Air
    GLM-4.5-Air
    
    A100
    
    H100
  - GLM-4.x
    GLM-4.x
    
    A100
    
    H100
    
    H200
  - GPT-OSS-20B
    GPT-OSS-20B
    
    A100
    
    H100
  - GPT-OSS-120B
    GPT-OSS-120B
    
    A100
    
    H100
  - DeepSeek-R1
    DeepSeek-R1
    
    H200
  - DeepSeek-V3.2
    DeepSeek-V3.2
    
    H200
  - GLM-5.2
    GLM-5.2
    
    H200
  - Qwen3-8B
    Qwen3-8B
    
    910B
  - Qwen3-14B
    Qwen3-14B
    
    A100
    
    H100
  - Qwen3-32B
    Qwen3-32B
    
    A100
    
    H100
  - Qwen3-30B-A3B
    Qwen3-30B-A3B
    
    910B
  - Qwen3-235B-A22B
    Qwen3-235B-A22B
    
    A100
    
    H100
- Optimizing Latency
  Optimizing Latency
  - Qwen3.5-35B-A3B
    Qwen3.5-35B-A3B
    
    H200
  - Qwen3.5-9B
    Qwen3.5-9B
    
    H100
  - Qwen3-8B
    Qwen3-8B
    
    H100
- References
  References
  - The Impact of Quantization on vLLM Inference Performance
  - Evaluating LMCache Prefill Acceleration in vLLM
Container Image Selector

Table of contents

Deploying Models
Create an API Key
Integrating GPUStack into Cherry Studio
Using LLMs
Using Multimodal Models
Use Embeddings and Reranking to Improve Knowledge Base Q&A

Integrate with Cherry Studio

Cherry Studio integrates with GPUStack to leverage locally hosted LLMs, embeddings, and reranking capabilities.

Deploying Models

In GPUStack UI, navigate to the Deployments page and click on Deploy Model to deploy the models you need. Here are some example models:
- qwen3-instruct-2507
- qwen2.5-vl-7b
- bge-m3
- bge-reranker-v2-m3

In the model’s Operations, open API Access Info to see how to integrate with this model:

Create an API Key

Navigate to the Access Control > API Keys page, then click on New API Key.
Fill in the name, then click Save.
Copy the API key and save it for later use.

Integrating GPUStack into Cherry Studio

Open Cherry Studio, go to Settings → Model Provider, find GPUStack, enable it, and configure it as shown:
- API Key: Input the API key you copied from previous steps.
- API Host: Access URL in the API Access Info panel.

In the GPUStack provider configuration, click "Manage" and enable the models you need:

(Optional) Test the API:

After configuration, return to the Cherry Studio home page and start using your models.

Using LLMs

Using Multimodal Models

Select a multimodal model:

Ask multimodal questions:

Use Embeddings and Reranking to Improve Knowledge Base Q&A

Open the Knowledge Base configuration page:

Add a knowledge base:

Add content to the knowledge base (using “Notes” as an example):

Return to the home page and use knowledge base Q&A:

Integrate with RAGFlow

Integrate with OpenClaw

Copyright © 2026 GPUStack.ai