Online Installation
Supported Devices
- NVIDIA GPUs (Compute Capability 6.0 and above; see Your GPU Compute Capability)
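On recent NVIDIA drivers, the compute capability can be read directly from the driver. The sketch below is a hedged convenience, not part of GPUStack: the `compute_cap` query field requires a reasonably new driver, and `cap_ok` is a hypothetical helper for the 6.0 minimum.

```shell
# Hypothetical helper: succeed when the reported compute capability
# (major.minor) meets the 6.0 minimum required by GPUStack.
cap_ok() {
  awk -v v="$1" 'BEGIN { split(v, a, "."); exit !(a[1] >= 6) }'
}

# On a machine with a recent NVIDIA driver, feed it the live value:
#   cap_ok "$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n1)"
cap_ok "8.6" && echo "supported"   # e.g. an RTX 30-series GPU reports 8.6
cap_ok "5.2" || echo "too old"     # Maxwell-era GPUs fall below 6.0
```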
Supported Platforms
| OS | Arch | Supported methods |
| --- | --- | --- |
| Linux | AMD64, ARM64 | Installation Script, Docker Installation (Recommended), pip Installation |
| Windows | AMD64 | Installation Script, pip Installation |
Supported Backends
- vLLM (Compute Capability 7.0 and above, only supports AMD64 Linux)
- llama-box
- vox-box
Prerequisites
- Port Requirements
- CPU support for llama-box backend: AMD64 with AVX2, or ARM64 with NEON
Check if the CPU is supported:
# AMD64: check for AVX2
lscpu | grep avx2
# ARM64: check for NEON/ASIMD
grep -E -i "neon|asimd" /proc/cpuinfo
Windows users need to verify support for these instruction sets manually.
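As a convenience, the flag check can be wrapped in a small helper. `cpu_supported` below is a hypothetical name; on a real Linux host the flags string would come from lscpu or /proc/cpuinfo.

```shell
# Hypothetical helper: decide support from a CPU flags string
# (AVX2 for AMD64, NEON/ASIMD for ARM64).
cpu_supported() {
  printf '%s' "$1" | grep -q -E -i 'avx2|neon|asimd'
}

# On a real Linux host, pass the live flags:
#   cpu_supported "$(cat /proc/cpuinfo)"
cpu_supported "fpu sse avx2"  && echo "AMD64 CPU supported"
cpu_supported "fp asimd neon" && echo "ARM64 CPU supported"
```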
Check if the NVIDIA driver is installed:
nvidia-smi --format=csv,noheader --query-gpu=index,name,memory.total,memory.used,utilization.gpu,temperature.gpu
And ensure the driver supports CUDA 12.4 or higher:
# Linux
nvidia-smi | grep "CUDA Version"
# Windows
nvidia-smi | findstr "CUDA Version"
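The version comparison can also be scripted. This is a hedged sketch: `version_ok` is a hypothetical helper, and the commented nvidia-smi extraction is one possible way to obtain the reported value, not an official interface.

```shell
# Hypothetical helper: succeed when a reported CUDA version meets the
# 12.4 minimum (numeric major.minor comparison).
version_ok() {
  awk -v v="$1" 'BEGIN { split(v, a, "."); exit !(a[1] > 12 || (a[1] == 12 && a[2] >= 4)) }'
}

# On a real Linux host you could extract the value from nvidia-smi, e.g.:
#   version_ok "$(nvidia-smi | sed -n 's/.*CUDA Version: \([0-9.]*\).*/\1/p')"
version_ok "12.4" && echo "CUDA version OK"
version_ok "11.8" || echo "driver too old for GPUStack"
```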
Installation Script
Prerequisites
Check if CUDA is installed and verify that its version is at least 12.4:
nvcc -V
- NVIDIA cuDNN 9 (Optional, required for audio models)
Check if cuDNN 9 is installed:
# Linux
ldconfig -p | grep libcudnn
# Windows
Get-ChildItem -Path C:\ -Recurse -Filter "cudnn*.dll" -ErrorAction SilentlyContinue
Run GPUStack
GPUStack provides a script that installs and runs it as a service, listening on port 80 by default.
- Install Server
curl -sfL https://get.gpustack.ai | sh -s -
To configure additional environment variables and startup flags when running the script, refer to the Installation Script.
After the installation completes, check that the GPUStack startup logs are normal:
tail -200f /var/log/gpustack.log
If the startup logs are normal, open http://your_host_ip
in the browser to access the GPUStack UI. Log in to GPUStack with username admin
and the default password. You can run the following command to get the password for the default setup:
cat /var/lib/gpustack/initial_admin_password
If you specify the --data-dir parameter to set the data directory, the initial_admin_password file will be located in that directory.
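That rule can be expressed as a tiny helper. `password_path` is a hypothetical name used only to illustrate where the file lands; it is not a GPUStack command.

```shell
# Hypothetical helper: where initial_admin_password lives, given an
# optional --data-dir value (defaults to /var/lib/gpustack).
password_path() {
  printf '%s/initial_admin_password\n' "${1:-/var/lib/gpustack}"
}

password_path                 # → /var/lib/gpustack/initial_admin_password
password_path /data/gpustack  # → /data/gpustack/initial_admin_password
```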
- (Optional) Add Worker
To add workers to the GPUStack cluster, you need to specify the server URL and authentication token when installing GPUStack on the workers.
To get the token used for adding workers, run the following command on the GPUStack server node:
cat /var/lib/gpustack/token
If you specify the --data-dir parameter to set the data directory, the token file will be located in that directory.
To install GPUStack, start it as a worker, and register it with the GPUStack server, run the following command on the worker node. Replace the URL and token with your own values:
curl -sfL https://get.gpustack.ai | sh -s - --server-url http://your_gpustack_url --token your_gpustack_token
After the installation completes, check that the GPUStack startup logs are normal:
tail -200f /var/log/gpustack.log
- Install Server
Invoke-Expression (Invoke-WebRequest -Uri "https://get.gpustack.ai" -UseBasicParsing).Content
To configure additional environment variables and startup flags when running the script, refer to the Installation Script.
After the installation completes, check that the GPUStack startup logs are normal:
Get-Content "$env:APPDATA\gpustack\log\gpustack.log" -Tail 200 -Wait
If the startup logs are normal, open http://your_host_ip in your browser to access the GPUStack UI. Log in with the username admin and the default password. To retrieve the password for the default setup, run:
Get-Content -Path "$env:APPDATA\gpustack\initial_admin_password" -Raw
If you specify the --data-dir parameter to set the data directory, the initial_admin_password file will be located in that directory.
- (Optional) Add Worker
To add workers to the GPUStack cluster, you need to specify the server URL and authentication token when installing GPUStack on the workers.
To get the token used for adding workers, run the following command on the GPUStack server node:
Get-Content -Path "$env:APPDATA\gpustack\token" -Raw
If you specify the --data-dir parameter to set the data directory, the token file will be located in that directory.
To install GPUStack, start it as a worker, and register it with the GPUStack server, run the following command on the worker node. Replace the URL and token with your own values:
Invoke-Expression "& { $((Invoke-WebRequest -Uri 'https://get.gpustack.ai' -UseBasicParsing).Content) } -- --server-url http://your_gpustack_url --token your_gpustack_token"
After the installation completes, check that the GPUStack startup logs are normal:
Get-Content "$env:APPDATA\gpustack\log\gpustack.log" -Tail 200 -Wait
Docker Installation
Prerequisites
Check if Docker and NVIDIA Container Toolkit are installed:
docker info | grep Runtimes | grep nvidia
Note
When systemd is used to manage the cgroups of the container and it is triggered to reload any Unit files that have references to NVIDIA GPUs (e.g. systemctl daemon-reload), containerized GPU workloads may suddenly lose access to their GPUs.
In GPUStack, GPUs may disappear from the Resources menu, and running nvidia-smi inside the GPUStack container may fail with the error: Failed to initialize NVML: Unknown Error.
To prevent this issue, disable systemd cgroup management in Docker by setting "exec-opts": ["native.cgroupdriver=cgroupfs"] in the /etc/docker/daemon.json file and restarting Docker. For example:
vim /etc/docker/daemon.json
{
"runtimes": {
"nvidia": {
"args": [],
"path": "nvidia-container-runtime"
}
},
"exec-opts": ["native.cgroupdriver=cgroupfs"]
}
systemctl daemon-reload && systemctl restart docker
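Instead of editing daemon.json by hand, the key can be merged programmatically while preserving any existing settings. This is a hedged sketch assuming python3 is available; for safety it writes to ./daemon.json in the current directory by default, so set DAEMON_JSON to /etc/docker/daemon.json (as root) on a real host and then restart Docker.

```shell
# Sketch: merge the exec-opts key into a daemon.json file, keeping any
# existing keys intact. Writes to ./daemon.json by default for safety;
# set DAEMON_JSON=/etc/docker/daemon.json (as root) on a real host.
DAEMON_JSON="${DAEMON_JSON:-./daemon.json}"
python3 - "$DAEMON_JSON" <<'EOF'
import json, pathlib, sys

p = pathlib.Path(sys.argv[1])
cfg = json.loads(p.read_text()) if p.exists() else {}  # keep existing keys
cfg["exec-opts"] = ["native.cgroupdriver=cgroupfs"]
p.write_text(json.dumps(cfg, indent=2) + "\n")
EOF
cat "$DAEMON_JSON"
```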
Run GPUStack
Run the following command to start the GPUStack server and built-in worker:
docker run -d --name gpustack \
--restart=unless-stopped \
--gpus all \
--network=host \
--ipc=host \
-v gpustack-data:/var/lib/gpustack \
gpustack/gpustack
If host networking is not available, you can instead publish the required ports and specify the worker IP:
docker run -d --name gpustack \
--restart=unless-stopped \
--gpus all \
-p 80:80 \
-p 10150:10150 \
-p 40064-40131:40064-40131 \
--ipc=host \
-v gpustack-data:/var/lib/gpustack \
gpustack/gpustack \
--worker-ip your_host_ip
You can refer to the CLI Reference for available startup flags.
Check if the startup logs are normal:
docker logs -f gpustack
If the logs are normal, open http://your_host_ip
in the browser to access the GPUStack UI. Log in to GPUStack with username admin
and the default password. You can run the following command to get the password for the default setup:
docker exec -it gpustack cat /var/lib/gpustack/initial_admin_password
(Optional) Add Worker
You can add more GPU nodes to GPUStack to form a GPU cluster. Add workers on the other GPU nodes and specify the --server-url and --token parameters to join GPUStack.
To get the token used for adding workers, run the following command on the GPUStack server node:
docker exec -it gpustack cat /var/lib/gpustack/token
To start GPUStack as a worker and register it with the GPUStack server, run the following command on the worker node. Replace the URL, token, and worker IP with your own values:
docker run -d --name gpustack \
--restart=unless-stopped \
--gpus all \
--network=host \
--ipc=host \
-v gpustack-data:/var/lib/gpustack \
gpustack/gpustack \
--server-url http://your_gpustack_url --token your_gpustack_token
If host networking is not available, you can instead publish the required ports and specify the worker IP:
docker run -d --name gpustack \
--restart=unless-stopped \
--gpus all \
-p 10150:10150 \
-p 40064-40131:40064-40131 \
--ipc=host \
-v gpustack-data:/var/lib/gpustack \
gpustack/gpustack \
--server-url http://your_gpustack_url --token your_gpustack_token --worker-ip your_worker_host_ip
Note
- Heterogeneous clusters are supported. No matter what type of device it is, you can add it to the current GPUStack as a worker by specifying the --server-url and --token parameters.
- You can set additional flags for the gpustack start command by appending them to the docker run command. For configuration details, please refer to the CLI Reference.
- You can use either the --ipc=host flag or the --shm-size flag to allow the container to access the host's shared memory. vLLM and PyTorch use it to share data between processes under the hood, particularly for tensor parallel inference.
- The -p 40064-40131:40064-40131 flag ensures connectivity for distributed inference across workers. For more details, please refer to the Port Requirements. You can omit this flag if you don't need distributed inference across workers.
Build Your Own Docker Image
The official GPUStack NVIDIA CUDA image is built with CUDA 12.4. If you want to use a different CUDA version, you can build your own Docker image. For example:
# Example Dockerfile
ARG CUDA_VERSION=12.4.1
FROM nvidia/cuda:$CUDA_VERSION-cudnn-runtime-ubuntu22.04
ARG TARGETPLATFORM
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y \
git \
curl \
wget \
tzdata \
iproute2 \
python3 \
python3-pip \
python3-venv \
&& rm -rf /var/lib/apt/lists/*
COPY . /workspace/gpustack
RUN cd /workspace/gpustack && \
make build
RUN if [ "$TARGETPLATFORM" = "linux/amd64" ]; then \
# Install vllm dependencies for x86_64
WHEEL_PACKAGE="$(ls /workspace/gpustack/dist/*.whl)[all]"; \
else \
WHEEL_PACKAGE="$(ls /workspace/gpustack/dist/*.whl)[audio]"; \
fi && \
pip install pipx && \
pip install $WHEEL_PACKAGE && \
pip cache purge && \
rm -rf /workspace/gpustack
RUN gpustack download-tools
ENTRYPOINT [ "gpustack", "start" ]
Run the following command to build the Docker image:
docker build -t gpustack:cuda-12.8 --build-arg CUDA_VERSION=12.8.1 .
pip Installation
Prerequisites
- Python 3.10 ~ 3.12
Check the Python version:
python -V
Check if CUDA is installed and verify that its version is at least 12.4:
nvcc -V
- NVIDIA cuDNN 9 (Optional, required for audio models)
Check if cuDNN 9 is installed:
ldconfig -p | grep libcudnn
Get-ChildItem -Path C:\ -Recurse -Filter "cudnn*.dll" -ErrorAction SilentlyContinue
Install GPUStack
Run the following command to install GPUStack.
# Extra dependency options are "vllm", "audio" and "all".
# "vllm" is only available for Linux AMD64.
pip install "gpustack[all]"    # Linux AMD64
pip install "gpustack[audio]"  # other platforms
If you don’t need the vLLM backend and support for audio models, just run:
pip install gpustack
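The choice of extras can be scripted from uname. This is only a hedged convenience sketch, not an official installer; it prints the suggested command rather than running it.

```shell
# Sketch: pick the pip extras for the current platform
# ("vllm" is included in "all" and only works on Linux AMD64).
case "$(uname -s)/$(uname -m)" in
  Linux/x86_64) EXTRAS="all" ;;    # vLLM + audio support
  *)            EXTRAS="audio" ;;  # audio support only
esac
echo "Run: pip install \"gpustack[$EXTRAS]\""
```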
To verify, run:
gpustack version
Run GPUStack
Run the following command to start the GPUStack server and built-in worker:
gpustack start
If the startup logs are normal, open http://your_host_ip in your browser to access the GPUStack UI. Log in with the username admin and the default password. To retrieve the password for the default setup, run:
# Linux
cat /var/lib/gpustack/initial_admin_password
# Windows
Get-Content -Path "$env:APPDATA\gpustack\initial_admin_password" -Raw
By default, GPUStack uses /var/lib/gpustack as the data directory, so you need sudo or proper permissions for it. You can also set a custom data directory by running:
gpustack start --data-dir mypath
You can refer to the CLI Reference for available CLI Flags.
(Optional) Add Worker
To add a worker to the GPUStack cluster, you need to specify the server URL and the authentication token.
To get the token used for adding workers, run the following command on the GPUStack server node:
# Linux
cat /var/lib/gpustack/token
# Windows
Get-Content -Path "$env:APPDATA\gpustack\token" -Raw
To start GPUStack as a worker and register it with the GPUStack server, run the following command on the worker node. Replace the URL, token, and worker IP with your own values:
gpustack start --server-url http://your_gpustack_url --token your_gpustack_token --worker-ip your_worker_host_ip
Run GPUStack as a System Service
A recommended way is to run GPUStack as a startup service. For example, using systemd, create a service file at /etc/systemd/system/gpustack.service:
tee /etc/systemd/system/gpustack.service > /dev/null <<EOF
[Unit]
Description=GPUStack Service
Wants=network-online.target
After=network-online.target
[Service]
EnvironmentFile=-/etc/default/%N
ExecStart=$(command -v gpustack) start
Restart=always
StandardOutput=append:/var/log/gpustack.log
StandardError=append:/var/log/gpustack.log
[Install]
WantedBy=multi-user.target
EOF
Then start GPUStack:
systemctl daemon-reload && systemctl enable gpustack --now
Check the service status:
systemctl status gpustack
And ensure that the GPUStack startup logs are normal:
tail -200f /var/log/gpustack.log