# Integrate with RAGFlow
RAGFlow can integrate with GPUStack to leverage locally deployed LLMs, embeddings, reranking, Speech-to-Text and Text-to-Speech capabilities.
## Deploying Models

- In the GPUStack UI, navigate to the `Models` page and click on `Deploy Model` to deploy the models you need. Here are some example models:

    - qwen3-8b
    - qwen2.5-vl-3b-instruct
    - bge-m3
    - bge-reranker-v2-m3

- In the model's `Operations`, open `API Access Info` to see how to integrate with this model.
## Create an API Key

- Navigate to the `API Keys` page and click on `New API Key`.
- Fill in the name, then click `Save`.
- Copy the API key and save it for later use.
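
Before moving on, you can optionally verify that the key works against a deployed model. Below is a minimal sketch using the OpenAI Python client; the `/v1-openai` base path, the placeholder URL and key, and the `qwen3-8b` model name are assumptions for illustration, so check the model's `API Access Info` in GPUStack for the exact values in your deployment.

```python
# Minimal sketch: verify the GPUStack API key against a deployed model.
# The base URL path (/v1-openai) is an assumption -- use the values shown
# in the model's "API Access Info".
from openai import OpenAI

client = OpenAI(
    base_url="http://your-gpustack-url/v1-openai",  # assumed OpenAI-compatible path
    api_key="your-gpustack-api-key",                # the key you just copied
)

response = client.chat.completions.create(
    model="qwen3-8b",  # must match the model name deployed on GPUStack
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```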
## Integrating GPUStack into RAGFlow

- Access the RAGFlow UI, click the avatar in the top right corner, select `Model Providers > GPUStack`, then select `Add the model` and fill in:

    - Model type: Select the model type based on the model.
    - Model name: The name must match the model name deployed on GPUStack.
    - Base URL: `http://your-gpustack-url`. The URL should not include a path, and do not use `localhost`, as it refers to the RAGFlow container's internal network. If you are using a custom port, make sure to include it. Also, ensure the URL is accessible from inside the RAGFlow container (you can test this with `curl`; see the connectivity check sketch below).
    - API-Key: Enter the API key you copied in the previous steps.
    - Max Tokens: Enter the maximum number of tokens supported by the current model configuration.

    Click `OK` to add the model:
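
As a sanity check for the Base URL, the sketch below lists the models that GPUStack exposes through its OpenAI-compatible API. The `/v1-openai/models` path and the placeholder URL and key are assumptions; run it from a machine with the same network view as the RAGFlow container, or adapt the same request to `curl` inside the container.

```python
# Minimal sketch: confirm the Base URL is reachable and that the deployed
# models are listed before adding them in RAGFlow. The /v1-openai/models
# path is an assumption -- use the URL from the model's "API Access Info".
import requests

BASE_URL = "http://your-gpustack-url"   # no trailing path, not localhost
API_KEY = "your-gpustack-api-key"

resp = requests.get(
    f"{BASE_URL}/v1-openai/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=10,
)
resp.raise_for_status()
print([m["id"] for m in resp.json().get("data", [])])
```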
- Add other models as needed, then select the added models under `Set default models` and save:
You can now use the models in `Chat` and `Knowledge Base`. Here is a simple example:
- Go to `Knowledge base` to create a new knowledge base and add your file:
- Navigate to `Retrieval testing` and set the rerank model to `bge-reranker-v2-m3`:
- In `Chat`, create an assistant, link the previously created knowledge base, and select a chat model:
- Create a chat session — you can now interact with the model and query the knowledge base:
- Edit the assistant and switch the model to `qwen2.5-vl-3b-instruct`. After saving, create a new chat and upload an image to enable multimodal input:
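
If you want to confirm outside of RAGFlow that the vision-language model accepts images, a sketch along these lines can help. It assumes the model is served through GPUStack's OpenAI-compatible chat API and accepts the standard `image_url` message format; the endpoint path, key, and image file name are placeholders.

```python
# Minimal sketch: send an image to the multimodal model through the
# OpenAI-compatible chat API. Endpoint path, key, and image file are
# placeholders -- adjust them to your GPUStack deployment.
import base64
from openai import OpenAI

client = OpenAI(
    base_url="http://your-gpustack-url/v1-openai",  # assumed path
    api_key="your-gpustack-api-key",
)

with open("example.png", "rb") as f:  # any local test image
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="qwen2.5-vl-3b-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```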