Model Management
You can manage large language models in GPUStack by navigating to the `Models` page. A model in GPUStack contains one or more replicas of model instances. On deployment, GPUStack automatically computes the resource requirements of the model instances from the model metadata and schedules them to available workers accordingly.
Deploy Model
- To deploy a model, click the `Deploy Model` button.
- Fill in the `Name` of the model.
- Select the `Source` of the model. Currently, models from Hugging Face and the Ollama Library in GGUF format are supported.
- For Hugging Face models, search and fill in the Hugging Face repo ID, e.g., `microsoft/Phi-3-mini-4k-instruct-gguf`, then select the `File Name`, e.g., `phi-3-mini-4k-instruct-q4.gguf`. For Ollama Library models, select an `Ollama Model` from the dropdown list, or input any Ollama model you need, e.g., `llama3:70b`.
- Adjust the `Replicas` as needed.
- Click the `Save` button.
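Once the deployment is running, you can sanity-check it from the command line through GPUStack's OpenAI-compatible API. The sketch below is illustrative only: it assumes the server is reachable at `http://localhost`, that the OpenAI-compatible endpoints are exposed under the `/v1-openai` path, that you have created an API key and exported it as `GPUSTACK_API_KEY`, and that the model was deployed under the name `phi-3-mini-4k-instruct`. Adjust all of these to your environment.

```bash
# Hypothetical smoke test for a freshly deployed model. The base URL, the
# /v1-openai path, the API key variable, and the model name are assumptions;
# substitute the values from your own GPUStack setup.
curl http://localhost/v1-openai/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GPUSTACK_API_KEY" \
  -d '{
        "model": "phi-3-mini-4k-instruct",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'
```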
Edit Model
- Find the model you want to edit on the model list page.
- Click the `Edit` button in the `Operations` column.
- Update the attributes as needed. For example, change the `Replicas` to scale up or down.
- Click the `Save` button.
Delete Model
- Find the model you want to delete on the model list page.
- Click the ellipsis button in the `Operations` column, then select `Delete`.
- Confirm the deletion.
View Model Instance
- Find the model you want to check on the model list page.
- Click the `>` symbol to view the instance list of the model.
Delete Model Instance
- Find the model you want to check on the model list page.
- Click the `>` symbol to view the instance list of the model.
- Find the model instance you want to delete.
- Click the ellipsis button for the model instance in the `Operations` column, then select `Delete`.
- Confirm the deletion.
Note
After a model instance is deleted, GPUStack will create a new instance to satisfy the expected number of replicas for the model if necessary.
View Model Instance Logs
- Find the model you want to check on the model list page.
- Click the `>` symbol to view the instance list of the model.
- Find the model instance you want to check.
- Click the `View Logs` button for the model instance in the `Operations` column.
Use Self-hosted Model
You can deploy self-hosted Ollama models by configuring the `--ollama-library-base-url` option on the GPUStack server. The Ollama Library URL should point to the base URL of the Ollama model registry, for example, `https://registry.mycompany.com`.
Here is an example workflow to set up a registry, publish a model, and use it in GPUStack:
```bash
# Run a self-hosted OCI registry
docker run -d -p 5001:5000 --name registry registry:2

# Push a model to the registry using Ollama
ollama pull llama3
ollama cp llama3 localhost:5001/library/llama3
ollama push localhost:5001/library/llama3 --insecure

# Start the GPUStack server with the custom Ollama library URL
curl -sfL https://get.gpustack.ai | sh -s - --ollama-library-base-url http://localhost:5001
```
That's it! You can now deploy the model `llama3` from the `Ollama Library` source in GPUStack as usual, but the model will now be fetched from the self-hosted registry.
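If you want to double-check that the push succeeded before deploying, you can query the registry directly. The snippet below uses the standard OCI/Docker Registry v2 catalog endpoint and assumes the same port and repository name as the example workflow above.

```bash
# List the repositories stored in the self-hosted registry; with the
# workflow above, the output should include "library/llama3".
curl http://localhost:5001/v2/_catalog
```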