Integrate with RAGFlow
RAGFlow can integrate with GPUStack to leverage locally deployed LLMs, embeddings, reranking, Speech-to-Text and Text-to-Speech capabilities.
Deploying Models
- In GPUStack UI, navigate to the Deployments page and click on Deploy Model to deploy the models you need. Here are some example models:
    - qwen3-8b
    - qwen2.5-vl-3b-instruct
    - bge-m3
    - bge-reranker-v2-m3
- In the model's Operations, open API Access Info to see how to integrate with this model.
Create an API Key
- Hover over the user avatar and navigate to the API Keys page, then click on New API Key.
- Fill in the name, then click Save.
- Copy the API key and save it for later use.
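
Before moving on to RAGFlow, you can optionally sanity-check the new key against GPUStack itself. The snippet below is a minimal Python sketch, assuming the server is reachable at http://your-gpustack-url and that the OpenAI-compatible API is served under the path shown in API Access Info (the /v1-openai path used here is an assumption; copy the exact endpoint from that page):

```python
import requests

GPUSTACK_URL = "http://your-gpustack-url"  # your GPUStack server, no trailing path
API_KEY = "gpustack_xxx"                   # placeholder for the key created above

# List models through the OpenAI-compatible endpoint.
# The path below is an assumption; use the exact base path shown in
# the model's API Access Info page if yours differs.
resp = requests.get(
    f"{GPUSTACK_URL}/v1-openai/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=10,
)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model.get("id"))  # should include qwen3-8b, bge-m3, etc.
```

If the deployed models are listed, the key and the server URL are both working.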
Integrating GPUStack into RAGFlow
- Access the RAGFlow UI, click the avatar in the top right corner, select Model Providers > GPUStack, then select Add the model and fill in:
    - Model type: Select the model type based on the model.
    - Model name: The name must match the model name deployed on GPUStack.
    - Base URL: http://your-gpustack-url. The URL should not include a path, and do not use localhost, as it refers to the RAGFlow container's internal network. If you're using a custom port, make sure to include it. Also ensure the URL is accessible from inside the RAGFlow container; you can test this with curl (a scripted version of this check is sketched after this step).
    - API-Key: Enter the API key you copied in the previous steps.
    - Max Tokens: Enter the maximum number of tokens supported by the current model configuration.

Click OK to add the model.
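
If you prefer a scripted connectivity check over curl, the sketch below is a Python equivalent you can run from the machine (or container) where RAGFlow runs. The endpoint path and the model name are assumptions; substitute whatever API Access Info reports for your deployment.

```python
import requests

BASE_URL = "http://your-gpustack-url"  # same value you enter as Base URL in RAGFlow
API_KEY = "gpustack_xxx"               # placeholder for the key created earlier

# Minimal chat-completion smoke test against the deployed LLM.
# The /v1-openai path is an assumption; use the endpoint shown in API Access Info.
resp = requests.post(
    f"{BASE_URL}/v1-openai/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "qwen3-8b",  # must match the deployment name in GPUStack
        "messages": [{"role": "user", "content": "Say hello in one short sentence."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

If this request fails with a connection error, RAGFlow will not be able to reach GPUStack either; revisit the Base URL (host, port, and no localhost) before continuing.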
- Add other models as needed, then select the added models in Set default models and save.
You can now use the models in Chat and Knowledge Base. Here is a simple example:
- Go to Knowledge base, create a new knowledge base, and add your file.
- Navigate to Retrieval testing and set the rerank model to bge-reranker-v2-m3.
- In Chat, create an assistant, link the previously created knowledge base, and select a chat model.
- Create a chat session. You can now interact with the model and query the knowledge base.
- Edit the assistant and switch the model to qwen2.5-vl-3b-instruct. After saving, create a new chat and upload an image to enable multimodal input.
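
To confirm that the vision model accepts images independently of the RAGFlow UI, you can also call it directly through GPUStack's OpenAI-compatible API. The sketch below is illustrative only: the /v1-openai path, the placeholder key, and the local example.jpg file are assumptions, and the request body follows the common OpenAI-style multimodal format rather than anything RAGFlow-specific.

```python
import base64
import requests

BASE_URL = "http://your-gpustack-url"  # GPUStack server
API_KEY = "gpustack_xxx"               # placeholder API key

# Encode a local image as a data URL (OpenAI-style multimodal content).
with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    f"{BASE_URL}/v1-openai/chat/completions",  # path is an assumption; see API Access Info
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "qwen2.5-vl-3b-instruct",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image briefly."},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                    },
                ],
            }
        ],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```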