Integrate with RAGFlow
RAGFlow can integrate with GPUStack to leverage locally deployed LLMs, embeddings, reranking, Speech-to-Text and Text-to-Speech capabilities.
Deploying Models
- In GPUStack UI, navigate to the Deployments page and click on Deploy Model to deploy the models you need. Here are some example models:
    - qwen3-8b
    - qwen2.5-vl-3b-instruct
    - bge-m3
    - bge-reranker-v2-m3
- In the model's Operations, open API Access Info to see how to integrate with this model.
Create an API Key
- Hover over the user avatar and navigate to the API Keys page, then click on New API Key.
- Fill in the name, then click Save.
- Copy the API key and save it for later use.
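
Before moving on to RAGFlow, you can optionally sanity-check the new key against GPUStack itself. The snippet below is a minimal Python sketch, assuming the server is reachable at http://your-gpustack-url and that the OpenAI-compatible API is served under the path shown in API Access Info (the /v1-openai path used here is an assumption; copy the exact endpoint from that page):

```python
import requests

GPUSTACK_URL = "http://your-gpustack-url"  # your GPUStack server, no trailing path
API_KEY = "gpustack_xxx"                   # placeholder for the key created above

# List models through the OpenAI-compatible endpoint.
# The path below is an assumption; use the exact base path shown in
# the model's API Access Info page if yours differs.
resp = requests.get(
    f"{GPUSTACK_URL}/v1-openai/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=10,
)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model.get("id"))  # should include qwen3-8b, bge-m3, etc.
```

If the deployed models are listed, the key and the server URL are both working.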
Integrating GPUStack into RAGFlow
- Access the RAGFlow UI, click the avatar in the top right corner, select Model Providers > GPUStack, then select Add the model and fill in:
    - Model type: Select the model type based on the model.
    - Model name: The name must match the model name deployed on GPUStack.
    - Base URL: http://your-gpustack-url. The URL should not include a path, and do not use localhost, as it refers to the RAGFlow container's internal network. If you're using a custom port, make sure to include it. Also ensure the URL is accessible from inside the RAGFlow container; you can test this with curl (a scripted version of this check is sketched after this step).
    - API-Key: Enter the API key you copied in the previous steps.
    - Max Tokens: Enter the maximum number of tokens supported by the current model configuration.

Click OK to add the model.
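
If you prefer a scripted connectivity check over curl, the sketch below is a Python equivalent you can run from the machine (or container) where RAGFlow runs. The endpoint path and the model name are assumptions; substitute whatever API Access Info reports for your deployment.

```python
import requests

BASE_URL = "http://your-gpustack-url"  # same value you enter as Base URL in RAGFlow
API_KEY = "gpustack_xxx"               # placeholder for the key created earlier

# Minimal chat-completion smoke test against the deployed LLM.
# The /v1-openai path is an assumption; use the endpoint shown in API Access Info.
resp = requests.post(
    f"{BASE_URL}/v1-openai/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "qwen3-8b",  # must match the deployment name in GPUStack
        "messages": [{"role": "user", "content": "Say hello in one short sentence."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

If this request fails with a connection error, RAGFlow will not be able to reach GPUStack either; revisit the Base URL (host, port, and no localhost) before continuing.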
- Add other models as needed, then select the added models in Set default models and save.
You can now use the models in Chat and Knowledge Base. Here is a simple example:
- Go to Knowledge base, create a new knowledge base, and add your file.
- Navigate to Retrieval testing and set the rerank model to bge-reranker-v2-m3.
- In Chat, create an assistant, link the previously created knowledge base, and select a chat model.
- Create a chat session. You can now interact with the model and query the knowledge base.
- Edit the assistant and switch the model to qwen2.5-vl-3b-instruct. After saving, create a new chat and upload an image to enable multimodal input.
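
To confirm that the vision model accepts images independently of the RAGFlow UI, you can also call it directly through GPUStack's OpenAI-compatible API. The sketch below is illustrative only: the /v1-openai path, the placeholder key, and the local example.jpg file are assumptions, and the request body follows the common OpenAI-style multimodal format rather than anything RAGFlow-specific.

```python
import base64
import requests

BASE_URL = "http://your-gpustack-url"  # GPUStack server
API_KEY = "gpustack_xxx"               # placeholder API key

# Encode a local image as a data URL (OpenAI-style multimodal content).
with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    f"{BASE_URL}/v1-openai/chat/completions",  # path is an assumption; see API Access Info
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "qwen2.5-vl-3b-instruct",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image briefly."},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                    },
                ],
            }
        ],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```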