Integrate with Dify

Dify can integrate with GPUStack to leverage locally deployed LLMs, embeddings, reranking, image generation, Speech-to-Text and Text-to-Speech capabilities.

Deploying Models

  1. In the GPUStack UI, navigate to the Models page and click on Deploy Model to deploy the models you need. Here are some example models:
  • qwen3-8b
  • qwen2.5-vl-3b-instruct
  • bge-m3
  • bge-reranker-v2-m3

gpustack-models

  2. In the model’s Operations, open API Access Info to see how to integrate with this model.

gpustack-api-access-info

Create an API Key

  1. Navigate to the API Keys page and click on New API Key.

  2. Fill in the name, then click Save.

  3. Copy the API key and save it for later use.
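
Before adding the models to Dify, you can sanity-check the deployment and the key directly against GPUStack’s OpenAI-compatible API. The snippet below is a minimal sketch: the base URL path (/v1-openai here), model name, and API key are placeholders, so copy the exact values from the model’s API Access Info.

```python
# Minimal sketch: verify a GPUStack-deployed model via the OpenAI-compatible API.
# The base URL path, model name, and key are placeholders; use the values shown
# in the model's API Access Info and the API key created above.
from openai import OpenAI

client = OpenAI(
    base_url="http://your-gpustack-url/v1-openai",  # confirm the path in API Access Info
    api_key="YOUR_GPUSTACK_API_KEY",
)

resp = client.chat.completions.create(
    model="qwen3-8b",  # must match the model name deployed on GPUStack
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```

If this call succeeds, the same URL, model name, and API key will also work when registering the model in Dify below.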

Integrating GPUStack into Dify

  1. In the Dify UI, click PLUGINS in the top right corner, select Install from Marketplace, search for the GPUStack plugin, and install it.

dify-install-gpustack-plugin

  2. After the plugin is installed, go to Settings > Model Provider > GPUStack, then select Add Model and fill in:
  • Model Type: Select the type that matches the deployed model (LLM, Text Embedding, Rerank, etc.).

  • Model Name: The name must match the model name deployed on GPUStack.

  • Server URL: http://your-gpustack-url. Do not use localhost, because inside the Dify container it refers to the container itself rather than the host. If GPUStack uses a custom port, include it. Also ensure the URL is reachable from inside the Dify container; you can test this with curl (a reachability check is sketched after the screenshot below).

  • API Key: Enter the API key you copied in the previous steps.

Click Save to add the model:

dify-add-model
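
If saving the model fails, the most common cause is that the Server URL is not reachable from inside the Dify container. Below is a minimal sketch of that reachability check (the equivalent of the curl test mentioned above), with placeholder URL, path, and key; run it inside the Dify API container, for example via docker exec.

```python
# Minimal reachability check, intended to be run inside the Dify API container
# (e.g. docker exec -it <dify-api-container> python3 check.py).
# The URL, path, and API key are placeholders; use your own values.
import urllib.request

url = "http://your-gpustack-url/v1-openai/models"  # confirm the path in API Access Info
req = urllib.request.Request(url, headers={"Authorization": "Bearer YOUR_GPUSTACK_API_KEY"})

with urllib.request.urlopen(req, timeout=5) as resp:
    # An HTTP 200 and a JSON model list mean Dify can reach GPUStack.
    print(resp.status, resp.read(200))
```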

Add other models as needed, then select the added models in the System Model Settings and save:

dify-system-model-settings

You can now use the models in the Studio and Knowledge. Here is a simple example:

  1. Go to Knowledge, create a knowledge base, and upload your documents:

dify-create-knowledge

  2. Configure the Chunk Settings and Retrieval Settings. Use the embedding model to generate document embeddings and the rerank model to rank the retrieved results.

dify-set-embedding-and-rerank-model
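
For context on what happens behind these settings: document chunks are embedded with bge-m3, and at query time the retrieved chunks are re-scored with bge-reranker-v2-m3. The sketch below shows roughly what those two calls look like against GPUStack directly; the embeddings call uses the OpenAI-compatible API, while the rerank path is an assumption, so check the reranker’s API Access Info for the exact endpoint.

```python
# Rough sketch of the two model roles used by the knowledge base.
# The base URL, paths, and API key are placeholders; confirm them in API Access Info.
import json
import urllib.request

BASE = "http://your-gpustack-url"
HEADERS = {"Authorization": "Bearer YOUR_GPUSTACK_API_KEY", "Content-Type": "application/json"}

def post(path, payload):
    req = urllib.request.Request(BASE + path, data=json.dumps(payload).encode(), headers=HEADERS)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())

# 1) Embedding: turn document chunks into vectors (OpenAI-compatible endpoint).
embeddings = post("/v1-openai/embeddings", {
    "model": "bge-m3",
    "input": ["GPUStack manages GPU clusters.", "Dify builds LLM applications."],
})
print(len(embeddings["data"]), "vectors")

# 2) Rerank: re-score retrieved chunks against the query.
# The path below is an assumption; use the endpoint shown in the reranker's API Access Info.
rerank = post("/v1/rerank", {
    "model": "bge-reranker-v2-m3",
    "query": "What does GPUStack do?",
    "documents": ["GPUStack manages GPU clusters.", "Dify builds LLM applications."],
})
print(rerank)
```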

  3. After the documents are imported successfully, create an application in the Studio, add the previously created knowledge base, select the chat model, and interact with it:

dify-chat-with-model
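
Once the app responds correctly in the Studio preview, you can also call it programmatically through Dify’s Service API. The sketch below assumes a self-hosted Dify URL and an app API key taken from the app’s Access API page; verify the request format against your Dify version.

```python
# Hedged sketch: call the Dify app created above via Dify's chat-messages endpoint.
# The URL and app API key are placeholders.
import json
import urllib.request

url = "http://your-dify-url/v1/chat-messages"
payload = {
    "inputs": {},
    "query": "Summarize the uploaded documents in two sentences.",
    "response_mode": "blocking",  # "streaming" is also supported
    "user": "example-user",
}
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={"Authorization": "Bearer YOUR_DIFY_APP_API_KEY", "Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=60) as resp:
    print(json.loads(resp.read())["answer"])
```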

  4. Switch the model to qwen2.5-vl-3b-instruct, remove the previously added knowledge base, enable Vision, and upload an image in the chat to activate multimodal input:

dify-chat-with-vlm
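
The same multimodal request can also be sent directly to GPUStack’s OpenAI-compatible API, which is a quick way to confirm the vision model itself works. The sketch below is an assumption-based example: the base URL path, API key, and image file are placeholders.

```python
# Minimal sketch: multimodal chat with qwen2.5-vl-3b-instruct via the OpenAI-compatible API.
# The base URL path, API key, and image file are placeholders.
import base64
from openai import OpenAI

client = OpenAI(
    base_url="http://your-gpustack-url/v1-openai",  # confirm the path in API Access Info
    api_key="YOUR_GPUSTACK_API_KEY",
)

with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="qwen2.5-vl-3b-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```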