Integrate with MaxKB

MaxKB can integrate with GPUStack to leverage locally deployed LLMs, embedding models, and reranking models for building knowledge-based AI assistants.

Deploying Models

In GPUStack UI, navigate to the Deployments page and click on Deploy Model to deploy the models you need. Here are some example models:

qwen3.5-35b-a3b

qwen3-embedding-4b

qwen3-reranker-4b

After deployment, you can test the model in Playground.

Obtain Model Access Information

In the GPUStack sidebar, open the Routes page.
Click the More actions menu next to the route and select API Access Info.

Record the following information:

1
2
3

Base URL
Model Name
API Key

Example:

Base URL: http://your-gpustack-url/v1

Model Name:
qwen3.5-35b-a3b
qwen3-embedding-4b
qwen3-reranker-4b

API Key:
gpustack_xxxxxxxxxxxxx

Note

You can create an API Key following the instructions in the UI.

Deploy MaxKB

MaxKB can be deployed using Docker:

docker run -d \
  --name maxkb \
  --restart always \
  -p 8080:8080 \
  -v ~/.maxkb:/opt/maxkb \
  1panel/maxkb

Default credentials:

1	`admin / MaxKB@123..`

After logging in for the first time, follow the prompt to change the password.

Integrating GPUStack into MaxKB

In the MaxKB UI, navigate to Model in the top navigation bar.

Click Add Model and configure the model.

When configuring the model:

Base Model: Must match the model name deployed in GPUStack.
API URL: http://your-gpustack-url/v1
API Key: The API key created in GPUStack.

Note

API URL and API Key fields will appear after entering the Base Model and pressing Enter.

Add the embedding and reranking models using the same method:

qwen3-embedding-4b

qwen3-reranker-4b

For qwen3-reranker-4b, enable Generic Proxy.

This is required because MaxKB uses the following endpoint:

1	`/v2/rerank`

After configuration, the models should appear in the model list.

Create a Knowledge Base

Navigate to the Knowledge page.
Click Create and select Web Knowledge.

Enter a documentation URL or other data source. MaxKB will automatically crawl and parse the content.

After the crawl is completed:

Create an AI Agent

Go to the Agent page.
Click Create to create a new agent.

Configure the agent with:

Chat model
Knowledge base
Retrieval settings

Click Publish to activate the agent.

Chat with the Knowledge Base

Open the chat interface to start interacting with the assistant.

The assistant can now answer questions based on the connected knowledge base and models deployed on GPUStack.