Skip to content

You're not viewing the latest version. Click here to go to latest.

GPUStack

Integrate with CherryStudio

Initializing search

gpustack/gpustack

Home
Inference Performance Lab
Container Image Selector

GPUStack

gpustack/gpustack

Home
Home
- Overview
- Quickstart
- Installation
  Installation
- Upgrade
- Migration
- User Guide
  User Guide
- Using Models
  Using Models
- Tutorials
  Tutorials
- Integrations
  Integrations
  - Inference APIs
  - Integrate with Claude Code
  - Integrate with Dify
  - Integrate with RAGFlow
  - Integrate with CherryStudio Integrate with CherryStudio
    Table of contents
    
    Deploying Models
    
    Create an API Key
    
    Integrating GPUStack into CherryStudio
    
    Using LLMs
    
    Using Multimodal Models
    
    Use Embeddings and Reranking to Improve Knowledge Base Q&A
  - Integrate with OpenClaw
  - Integrate with n8n
- Architecture
- Scheduler
- Troubleshooting
- FAQ
- API Reference
- CLI Reference
  CLI Reference
- Environment Variables
Inference Performance Lab
Inference Performance Lab
- Overview
- Optimizing Throughput
  Optimizing Throughput
  - Qwen3.5-35B-A3B
    Qwen3.5-35B-A3B
    
    H200
  - Qwen3.5-9B
    Qwen3.5-9B
    
    H100
  - GLM-4.5-Air
    GLM-4.5-Air
    
    A100
    
    H100
  - GLM-4.x
    GLM-4.x
    
    A100
    
    H100
    
    H200
  - GPT-OSS-20B
    GPT-OSS-20B
    
    A100
    
    H100
  - GPT-OSS-120B
    GPT-OSS-120B
    
    A100
    
    H100
  - DeepSeek-R1
    DeepSeek-R1
    
    H200
  - DeepSeek-V3.2
    DeepSeek-V3.2
    
    H200
  - Qwen3-8B
    Qwen3-8B
    
    910B
  - Qwen3-14B
    Qwen3-14B
    
    A100
    
    H100
  - Qwen3-32B
    Qwen3-32B
    
    A100
    
    H100
  - Qwen3-30B-A3B
    Qwen3-30B-A3B
    
    910B
  - Qwen3-235B-A22B
    Qwen3-235B-A22B
    
    A100
    
    H100
- Optimizing Latency
  Optimizing Latency
  - Qwen3.5-35B-A3B
    Qwen3.5-35B-A3B
    
    H200
  - Qwen3.5-9B
    Qwen3.5-9B
    
    H100
  - Qwen3-8B
    Qwen3-8B
    
    H100
- References
  References
  - The Impact of Quantization on vLLM Inference Performance
  - Evaluating LMCache Prefill Acceleration in vLLM
Container Image Selector

Table of contents

Deploying Models
Create an API Key
Integrating GPUStack into CherryStudio
Using LLMs
Using Multimodal Models
Use Embeddings and Reranking to Improve Knowledge Base Q&A

Home
Integrations

Integrate with CherryStudio

CherryStudio integrates with GPUStack to leverage locally hosted LLMs, embeddings and reranking capabilities.

Deploying Models

In GPUStack UI, navigate to the Deployments page and click on Deploy Model to deploy the models you need. Here are some example models:
- qwen3-instruct-2507
- qwen2.5-vl-7b
- bge-m3
- bge-reranker-v2-m3

In the model’s Operations, open API Access Info to see how to integrate with this model:

Create an API Key

Hover over the user avatar and navigate to the API Keys page, then click on New API Key.
Fill in the name, then click Save.
Copy the API key and save it for later use.

Integrating GPUStack into CherryStudio

Open CherryStudio, go to Settings → Model Provider, find GPUStack, enable it, and configure it as shown:
- API Key: Input the API key you copied from previous steps.
- API Host: Access URL in the API Access Info panel.

In the GPUStack provider configuration, click "Manage" and enable the models you need:

(Optional) Test the API:

After configuration, return to the CherryStudio home page and start using your models.

Using LLMs

Using Multimodal Models

Select a multimodal model:

Ask multimodal questions:

Use Embeddings and Reranking to Improve Knowledge Base Q&A

Open the Knowledge Base configuration page:

Add a knowledge base:

Add content to the knowledge base (using “Notes” as an example):

Return to the home page and use knowledge base Q&A:

Integrate with RAGFlow

Integrate with OpenClaw

Copyright © 2026 GPUStack.ai