Model Management
You can manage large language models in GPUStack from the Models page. A model in GPUStack consists of one or more replicas, each running as a model instance. When you deploy a model, GPUStack automatically computes the resource requirements for its instances from the model metadata and schedules them onto available workers accordingly.
Deploy Model
- To deploy a model, click the Deploy Model button.
- Fill in the Name of the model.
- Select the Source of the model. Currently, models from Hugging Face and the Ollama Library in GGUF format are supported.
- For Hugging Face models, search and fill in the Hugging Face repo ID, e.g., microsoft/Phi-3-mini-4k-instruct-gguf, then select the File Name, e.g., phi-3-mini-4k-instruct-q4.gguf. For Ollama Library models, select an Ollama Model from the dropdown list, or input any Ollama model you need, e.g., llama3:70b.
- Adjust the Replicas as needed.
- Click the Save button.
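To make the two source formats above concrete, here is a minimal shell sketch. It is illustrative only and is not part of GPUStack: the helper names hf_file_url and ollama_ref are hypothetical. It assumes the Hugging Face repo serves files from its default main branch, and that an Ollama reference without a tag means the latest tag. The repo ID and file name are copied from the example above as written; file names on Hugging Face are case-sensitive.

```shell
#!/bin/sh
# Hypothetical helper: build the direct download URL for a file in a
# Hugging Face model repo, assuming the default branch is "main".
hf_file_url() {
  repo_id="$1"    # e.g. microsoft/Phi-3-mini-4k-instruct-gguf
  file_name="$2"  # e.g. phi-3-mini-4k-instruct-q4.gguf
  printf 'https://huggingface.co/%s/resolve/main/%s\n' "$repo_id" "$file_name"
}

# Hypothetical helper: split a short Ollama reference (name[:tag]) into
# "name tag". Ollama falls back to the "latest" tag when none is given.
# References that include a registry host and port are not handled here.
ollama_ref() {
  ref="$1"
  case "$ref" in
    *:*) printf '%s %s\n' "${ref%%:*}" "${ref#*:}" ;;
    *)   printf '%s latest\n' "$ref" ;;
  esac
}

hf_file_url microsoft/Phi-3-mini-4k-instruct-gguf phi-3-mini-4k-instruct-q4.gguf
ollama_ref llama3:70b
ollama_ref llama3
```

The printed URL can be passed to a download tool to confirm that the repo ID and file name resolve before you fill them into the deployment form.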
Edit Model
- Find the model you want to edit on the model list page.
- Click the Edit button in the Operations column.
- Update the attributes as needed. For example, change the Replicas to scale up or down.
- Click the Save button.
Delete Model
- Find the model you want to delete on the model list page.
- Click the ellipsis button in the Operations column, then select Delete.
- Confirm the deletion.
View Model Instance
- Find the model you want to check on the model list page.
- Click the > symbol to view the instance list of the model.
Delete Model Instance
- Find the model you want to check on the model list page.
- Click the > symbol to view the instance list of the model.
- Find the model instance you want to delete.
- Click the ellipsis button for the model instance in the Operations column, then select Delete.
- Confirm the deletion.
Note
After a model instance is deleted, GPUStack will recreate a new instance to satisfy the expected replicas of the model if necessary.
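The behavior described in the note can be pictured as a simple reconciliation step: compare the expected replica count with the number of instances that currently exist, and create the difference. The sketch below only illustrates that arithmetic. The function name instances_to_create is hypothetical, and this is not GPUStack's actual controller code.

```shell
#!/bin/sh
# Hypothetical helper: number of instances that must be created so the
# instance count matches the expected replicas. Never returns a negative
# number; surplus instances are outside the scope of this sketch.
instances_to_create() {
  expected="$1"  # Replicas value configured on the model
  existing="$2"  # instances that currently exist
  missing=$((expected - existing))
  if [ "$missing" -lt 0 ]; then
    missing=0
  fi
  echo "$missing"
}

# A model with 3 replicas after one instance was deleted:
instances_to_create 3 2
```

Under this model, deleting an instance is a way to restart it rather than to scale down; to run fewer instances, lower the Replicas value of the model instead.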
View Model Instance Logs
- Find the model you want to check on the model list page.
- Click the > symbol to view the instance list of the model.
- Find the model instance you want to check.
- Click the View Logs button for the model instance in the Operations column.
Use Self-hosted Model
You can deploy self-hosted Ollama models by setting the --ollama-library-base-url option on the GPUStack server. The value should be the base URL of the Ollama model registry, for example, https://registry.mycompany.com.
Here is an example workflow to set up a registry, publish a model, and use it in GPUStack:
# Run a self-hosted OCI registry
docker run -d -p 5001:5000 --name registry registry:2
# Push a model to the registry using Ollama
ollama pull llama3
ollama cp llama3 localhost:5001/library/llama3
ollama push localhost:5001/library/llama3 --insecure
# Start GPUStack server with the custom Ollama library URL
curl -sfL https://get.gpustack.ai | sh -s - --ollama-library-base-url http://localhost:5001
That's it! You can now deploy the llama3 model from the Ollama Library source in GPUStack as usual, and it will be fetched from the self-hosted registry.
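Before deploying, it can help to confirm that the push actually reached the registry. The registry:2 image used in the workflow implements the Docker Registry HTTP API V2, which lists the tags of a repository at /v2/<name>/tags/list. The sketch below builds that URL for a model in the library namespace, matching the library/llama3 path used in the push step. The function name registry_tags_url is hypothetical, and the curl call is left as a comment so the snippet runs without a live registry.

```shell
#!/bin/sh
# Hypothetical helper: URL that lists the tags of a model stored under the
# "library" namespace of a registry speaking the Docker Registry HTTP API V2.
registry_tags_url() {
  base_url="${1%/}"  # drop one trailing slash, if present
  model="$2"
  printf '%s/v2/library/%s/tags/list\n' "$base_url" "$model"
}

registry_tags_url http://localhost:5001 llama3

# With the registry from the workflow running, query it like this:
#   curl -s "$(registry_tags_url http://localhost:5001 llama3)"
```

If the response lists the tag you pushed, the value passed to --ollama-library-base-url points at a registry that can serve the model.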