# Integrate with Dify
Dify can integrate with GPUStack to leverage locally deployed LLM, embedding, reranking, image generation, Speech-to-Text, and Text-to-Speech capabilities.
## Deploying Models
- In the GPUStack UI, navigate to the `Deployments` page and click on `Deploy Model` to deploy the models you need. Here are some example models:
    - qwen3-8b
    - qwen2.5-vl-3b-instruct
    - bge-m3
    - bge-reranker-v2-m3
- In the model's Operations, open `API Access Info` to see how to integrate with this model.
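The `API Access Info` dialog shows the endpoint and request details for each model. As a rough illustration, a chat completion request against GPUStack's OpenAI-compatible API could look like the sketch below, where the host and key are placeholders (the API key is created in the next section):

```bash
# Placeholder values: replace the host/port and key with the values from
# API Access Info and the API Keys page.
curl http://your-gpustack-url/v1-openai/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_GPUSTACK_API_KEY" \
  -d '{
    "model": "qwen3-8b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```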
## Create an API Key
- Hover over the user avatar and navigate to the `API Keys` page, then click on `New API Key`.
- Fill in the name, then click `Save`.
- Copy the API key and save it for later use.
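As an optional sanity check before configuring Dify, you can list the deployed models with the new key; the host below is a placeholder for your GPUStack address:

```bash
# A valid key should return a JSON list that includes the models deployed above
curl http://your-gpustack-url/v1-openai/models \
  -H "Authorization: Bearer YOUR_GPUSTACK_API_KEY"
```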
## Integrating GPUStack into Dify
- Access the Dify UI, go to the top right corner and click on `PLUGINS`, select `Install from Marketplace`, search for the GPUStack plugin, and install it.
- After installation, go to `Settings > Model Provider > GPUStack`, then select `Add Model` and fill in:
    - Model Type: Select the model type based on the model.
    - Model Name: The name must match the model name deployed on GPUStack.
    - Server URL: `http://your-gpustack-url`. Do not use `localhost`, as it refers to the container's internal network. If you're using a custom port, make sure to include it. Also, ensure the URL is accessible from inside the Dify container (you can test this with `curl`; see the Test Connectivity section below).
    - API Key: Enter the API key you copied in the previous steps.

Click `Save` to add the model.
Add other models as needed, then select the added models in the `System Model Settings` and save.

You can now use the models in the `Studio` and `Knowledge`. Here is a simple example:
- Go to `Knowledge` to create a knowledge base and upload your documents.
- Configure the Chunk Settings and Retrieval Settings. Use the embedding model to generate document embeddings, and the rerank model to perform retrieval ranking (see the sketch after this list for the kind of API call involved).
- After successfully importing the documents, create an application in the `Studio`, add the previously created knowledge base, select the chat model, and interact with it.
- Switch the model to `qwen2.5-vl-3b-instruct`, remove the previously added knowledge base, enable `Vision`, and upload an image in the chat to activate multimodal input.
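When the knowledge base is indexed, Dify calls GPUStack's OpenAI-compatible embedding endpoint behind the scenes. For reference, a direct embeddings request for `bge-m3` looks roughly like this sketch (URL and key are placeholders; the exact request Dify sends may differ):

```bash
# Generate an embedding for a sample text with the bge-m3 model
curl http://your-gpustack-url/v1-openai/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_GPUSTACK_API_KEY" \
  -d '{
    "model": "bge-m3",
    "input": "What is GPUStack?"
  }'
```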
## Notes for Docker Desktop Installed Dify
If GPUStack runs on the host machine while Dify runs inside a Docker container, you must ensure proper network access between them.
### Correct Configuration
Inside Dify, when adding the GPUStack model:
| Field | Value |
|---|---|
| Server URL | `http://host.docker.internal:80/v1-openai` (for macOS/Windows) |
| API Key | Your API key from GPUStack |
| Model Name | Must match the deployed model name in GPUStack (e.g., `qwen3-8b`) |
### Test Connectivity (inside the Dify container)
You can test the connection from inside the Dify container:
```bash
docker exec -it <dify-container-name> curl http://host.docker.internal:80/v1-openai/models
```
If it returns the list of models, the integration is successful.
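A successful response is a JSON object in the OpenAI list format, roughly of this shape (abridged; the entries will reflect your own deployments):

```json
{
  "object": "list",
  "data": [
    { "id": "qwen3-8b", "object": "model" }
  ]
}
```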
### Notes
- On macOS or Windows, use `host.docker.internal` to access services on the host machine from inside Docker. `localhost` or `0.0.0.0` does not work inside Docker containers unless Dify and GPUStack are running inside the same container.
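On Linux, `host.docker.internal` is not defined by default. If you control how the Dify container is started, one common workaround is to map it to the host gateway (the `--add-host` flag with the `host-gateway` value is standard Docker 20.10+; treat the snippet below as a sketch and adapt it to your own Compose file or run command):

```bash
# Make host.docker.internal resolve to the Docker host from inside the container.
# The Docker Compose equivalent is an extra_hosts entry:
#   extra_hosts:
#     - "host.docker.internal:host-gateway"
docker run --add-host=host.docker.internal:host-gateway <dify-image>
```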
### Quick Reference
| Scenario | Server URL |
|---|---|
| Host machine deployed GPUStack (macOS/Windows) | `http://host.docker.internal:80/v1-openai` |
| GPUStack running in Docker | Use `--network=host` (Linux) or expose the port to the host (macOS/Windows) and use the appropriate host address |
💡 If you did not specify the `--port` parameter when installing GPUStack, the default port is `80`. Therefore, the Server URL should be set to `http://host.docker.internal:80/v1-openai`.
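For the "GPUStack running in Docker" row, a Linux launch with host networking might look like the following sketch (the image name, volume, and GPU flags are assumptions based on a typical GPUStack Docker setup; check the GPUStack installation docs for the exact command for your hardware):

```bash
# Host networking makes GPUStack reachable on port 80 of the host, and thus at
# http://host.docker.internal:80 from containers configured as described above.
docker run -d --name gpustack --restart=unless-stopped \
  --gpus all --network=host --ipc=host \
  -v gpustack-data:/var/lib/gpustack \
  gpustack/gpustack
```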