Integrate with Dify
Dify can integrate with GPUStack to leverage locally deployed LLMs, embeddings, reranking, image generation, Speech-to-Text and Text-to-Speech capabilities.
Deploying Models
- In the GPUStack UI, navigate to the Models page and click on Deploy Model to deploy the models you need. Here are some example models:
- qwen3-8b
- qwen2.5-vl-3b-instruct
- bge-m3
- bge-reranker-v2-m3
- In the model’s Operations, open API Access Info to see how to integrate with this model.
Create an API Key
- Navigate to the API Keys page and click on New API Key.
- Fill in the name, then click Save.
- Copy the API key and save it for later use.
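Before configuring Dify, you can optionally sanity-check the deployment by calling a model directly with the new key. A minimal sketch with curl, assuming the OpenAI-compatible chat completions endpoint shown in the model's API Access Info (the /v1-openai path, the placeholder URL, and the qwen3-8b model name are assumptions based on the examples above):

```bash
# Quick sanity check against the GPUStack OpenAI-compatible API.
# Replace the URL, path, API key, and model name with the values shown
# in API Access Info for your deployment.
curl http://your-gpustack-url/v1-openai/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_GPUSTACK_API_KEY" \
  -d '{
    "model": "qwen3-8b",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

A JSON chat completion response confirms the model and the API key work before you wire them into Dify.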
Integrating GPUStack into Dify
- Access the Dify UI, go to the top right corner and click on PLUGINS, select Install from Marketplace, search for the GPUStack plugin, and install it.
- Once installed, go to Settings > Model Provider > GPUStack, then select Add Model and fill in:
- Model Type: Select the model type based on the model you deployed.
- Model Name: The name must match the model name deployed on GPUStack.
- Server URL: http://your-gpustack-url. Do not use localhost, as it refers to the Dify container’s internal network. If you’re using a custom port, make sure to include it. Also, ensure the URL is reachable from inside the Dify container; you can test this with curl, as shown in the example after this list.
- API Key: Enter the API key you copied in the previous steps.
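A minimal sketch of the connectivity check mentioned above, assuming Dify runs via its Docker Compose setup and the API container is named docker-api-1 (the container name and URL are assumptions; check docker ps for your actual names):

```bash
# Run curl from inside the Dify API container to confirm the Server URL
# is reachable; "docker-api-1" is an assumed container name.
docker exec -it docker-api-1 \
  curl -s -o /dev/null -w "%{http_code}\n" http://your-gpustack-url
```

Any HTTP status code printed (rather than a connection error or timeout) means the URL is reachable from Dify; localhost would fail here because it resolves to the Dify container itself.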
Click Save to add the model:
Add other models as needed, then select the added models in the System Model Settings and save:
You can now use the models in the Studio and Knowledge. Here is a simple example:
- Go to Knowledge to create a knowledge base and upload your documents:
- Configure the Chunk Settings and Retrieval Settings. Use the embedding model to generate document embeddings, and the rerank model to perform retrieval ranking.
- After the documents are imported successfully, create an application in the Studio, add the previously created knowledge base, select the chat model, and interact with it:
- Switch the model to qwen2.5-vl-3b-instruct, remove the previously added knowledge base, enable Vision, and upload an image in the chat to activate multimodal input: