Docker Installation

You can use the official Docker image to run GPUStack in a container. Installation using Docker is supported on the following platforms and devices:

Supported Platforms

Linux

Supported Devices

Prerequisites

Docker

Run GPUStack with Docker

Note

Heterogeneous clusters are supported.
You can set additional flags for the gpustack start command by appending them to the docker run command. For configuration details, please refer to the CLI Reference.
You can use either the --ipc=host flag, which shares the host's IPC namespace (including its shared memory) with the container, or the --shm-size flag, which enlarges the container's own shared memory. vLLM and PyTorch use shared memory to exchange data between processes, particularly for tensor-parallel inference.
The -p 10150:10150 -p 40000-41024:40000-41024 -p 50000-51024:50000-51024 and --worker-ip your_host_ip flags ensure that the server can reach the worker and the inference services running on it. Alternatively, you can use the --network=host and --worker-ip your_host_ip flags instead.
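As a rough sketch of the notes above, the following helper prints a docker run command that uses --shm-size in place of --ipc=host and appends a gpustack start flag after the image name. The function name print_gpustack_cmd and the 16g value in the usage line are illustrative assumptions, not values taken from the GPUStack documentation; size the shared memory for your own workload.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a docker run command for review.
#   $1 = shared memory size for the container (assumed value, e.g. 16g)
#   $2 = IP address of the host, passed to gpustack start as --worker-ip
print_gpustack_cmd() {
    shm_size="$1"
    host_ip="$2"
    # In an unquoted heredoc, "\\" is printed as a single trailing backslash.
    cat <<EOF
docker run -d --name gpustack \\
    --restart=unless-stopped \\
    --gpus all \\
    -p 80:80 \\
    -p 10150:10150 \\
    -p 40000-41024:40000-41024 \\
    -p 50000-51024:50000-51024 \\
    --shm-size=${shm_size} \\
    -v gpustack-data:/var/lib/gpustack \\
    gpustack/gpustack --worker-ip ${host_ip}
EOF
}

# Usage: print the command, review it, then run it by hand.
# print_gpustack_cmd 16g 192.168.1.10
```

Everything after the image name is passed to gpustack start, so further flags from the CLI Reference would be appended at the end of the last line.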

NVIDIA CUDA

Prerequisites

Run GPUStack

Run the following command to start the GPUStack server and built-in worker:

docker run -d --name gpustack \
    --restart=unless-stopped \
    --gpus all \
    --network=host \
    --ipc=host \
    -v gpustack-data:/var/lib/gpustack \
    gpustack/gpustack

Alternatively, use explicit port mappings instead of host networking:

docker run -d --name gpustack \
    --restart=unless-stopped \
    --gpus all \
    -p 80:80 \
    -p 10150:10150 \
    -p 40000-41024:40000-41024 \
    -p 50000-51024:50000-51024 \
    --ipc=host \
    -v gpustack-data:/var/lib/gpustack \
    gpustack/gpustack --worker-ip your_host_ip

To retrieve the default admin password, run the following command:

docker exec -it gpustack cat /var/lib/gpustack/initial_admin_password
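If the command is run immediately after starting the container, the password file may not have been written yet (an assumption about startup timing, not something the GPUStack documentation states). A minimal sketch of a generic retry helper that can wrap the command; the name retry and its argument order are hypothetical:

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs have failed.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while :; do
        # Stop as soon as the wrapped command succeeds.
        "$@" && return 0
        # Give up once the attempt budget is used.
        [ "$i" -ge "$attempts" ] && return 1
        i=$((i + 1))
        sleep "$delay"
    done
}

# Usage (assumes the container is named gpustack, as above; -it is dropped
# because no interactive terminal is needed):
# retry 30 2 docker exec gpustack cat /var/lib/gpustack/initial_admin_password
```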

(Optional) Run the following command to start the GPUStack server without the built-in worker:

docker run -d --name gpustack-server \
    --restart=unless-stopped \
    -p 80:80 \
    -v gpustack-server-data:/var/lib/gpustack \
    gpustack/gpustack:latest-cpu \
    --disable-worker

(Optional) Add Worker

To retrieve the token, run the following command on the GPUStack server host:

docker exec -it gpustack-server cat /var/lib/gpustack/token

To start a GPUStack worker and register it with the GPUStack server, run the following command on the current host or another host. Replace the URL, token, and IP address with your own values:

docker run -d --name gpustack-worker \
    --restart=unless-stopped \
    --gpus all \
    -p 10150:10150 \
    -p 40000-41024:40000-41024 \
    -p 50000-51024:50000-51024 \
    --ipc=host \
    -v gpustack-worker-data:/var/lib/gpustack \
    gpustack/gpustack \
    --server-url http://your_gpustack_url --token your_gpustack_token --worker-ip your_host_ip
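One way to find a value for your_host_ip on a Linux host is to ask the routing table which source address it would use for outbound traffic. The sketch below assumes iproute2 is installed; parse_src_ip is a hypothetical helper name, and hosts with several network interfaces may need a different address than the one it returns.

```shell
#!/bin/sh
# Reads `ip route get` output on stdin and prints the address after "src".
# Prints nothing if no "src" field is present.
parse_src_ip() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "src") { print $(i + 1); exit }
        }
    }'
}

# Usage: 1.1.1.1 is only a lookup target for the routing table;
# `ip route get` does not send any packets.
# HOST_IP="$(ip -4 route get 1.1.1.1 | parse_src_ip)"
# echo "$HOST_IP"
```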

AMD ROCm

Prerequisites

AMDGPU driver and ROCm

Refer to this Tutorial.

Run GPUStack

Run the following command to start the GPUStack server and built-in worker:

docker run -d --name gpustack \
    --restart=unless-stopped \
    -p 80:80 \
    --ipc=host \
    --group-add=video \
    --security-opt seccomp=unconfined \
    --device /dev/kfd \
    --device /dev/dri \
    -v gpustack-data:/var/lib/gpustack \
    gpustack/gpustack:latest-rocm
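The command above passes /dev/kfd and /dev/dri into the container, so both must exist on the host. A small pre-flight check, sketched here with a hypothetical helper named missing_devices, can catch a missing driver before the container is started:

```shell
#!/bin/sh
# Prints one "missing: PATH" line for each argument that does not exist.
# Returns non-zero if at least one path is missing.
missing_devices() {
    status=0
    for dev in "$@"; do
        if [ ! -e "$dev" ]; then
            echo "missing: $dev"
            status=1
        fi
    done
    return "$status"
}

# Usage:
# missing_devices /dev/kfd /dev/dri || echo "check the AMDGPU driver and ROCm installation"
```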

To retrieve the default admin password, run the following command:

docker exec -it gpustack cat /var/lib/gpustack/initial_admin_password

(Optional) Run the following command to start the GPUStack server without the built-in worker:

docker run -d --name gpustack-server \
    --restart=unless-stopped \
    -p 80:80 \
    -v gpustack-server-data:/var/lib/gpustack \
    gpustack/gpustack:latest-cpu \
    --disable-worker

(Optional) Add Worker

To retrieve the token, run the following command on the GPUStack server host:

docker exec -it gpustack-server cat /var/lib/gpustack/token

To start a GPUStack worker and register it with the GPUStack server, run the following command on the current host or another host. Replace the URL, token, and IP address with your own values:

docker run -d --name gpustack-worker \
    --restart=unless-stopped \
    -p 10150:10150 \
    -p 40000-41024:40000-41024 \
    -p 50000-51024:50000-51024 \
    --ipc=host \
    --group-add=video \
    --security-opt seccomp=unconfined \
    --device /dev/kfd \
    --device /dev/dri \
    -v gpustack-worker-data:/var/lib/gpustack \
    gpustack/gpustack:latest-rocm \
    --server-url http://your_gpustack_url --token your_gpustack_token --worker-ip your_host_ip

Ascend CANN

Prerequisites

Refer to this Tutorial.

Run GPUStack

Run the following command to start the GPUStack server and built-in worker (set ASCEND_VISIBLE_DEVICES to the indices of the NPUs you want to use):

docker run -d --name gpustack \
    --restart=unless-stopped \
    -e ASCEND_VISIBLE_DEVICES=0 \
    -p 80:80 \
    --ipc=host \
    -v gpustack-data:/var/lib/gpustack \
    gpustack/gpustack:latest-npu
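To expose several devices, the value can be generated rather than typed by hand. The sketch below assumes that ASCEND_VISIBLE_DEVICES accepts a comma-separated list of indices such as 0,1,2,3; verify this against the Ascend Docker Runtime documentation for your version. The helper name device_indices is hypothetical.

```shell
#!/bin/sh
# Prints the indices 0..COUNT-1 as a comma-separated list, e.g. "0,1,2,3".
device_indices() {
    count="$1"
    i=0
    list=""
    while [ "$i" -lt "$count" ]; do
        # Add a comma separator only when the list is not empty.
        list="${list:+$list,}$i"
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

# Usage (assumed list syntax, see the note above):
# docker run ... -e ASCEND_VISIBLE_DEVICES="$(device_indices 4)" ...
```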

To retrieve the default admin password, run the following command:

docker exec -it gpustack cat /var/lib/gpustack/initial_admin_password

(Optional) Run the following command to start the GPUStack server without the built-in worker:

docker run -d --name gpustack-server \
    --restart=unless-stopped \
    -p 80:80 \
    -v gpustack-server-data:/var/lib/gpustack \
    gpustack/gpustack:latest-cpu \
    --disable-worker

(Optional) Add Worker

To retrieve the token, run the following command on the GPUStack server host:

docker exec -it gpustack-server cat /var/lib/gpustack/token

To start a GPUStack worker and register it with the GPUStack server, run the following command on the current host or another host. Replace the URL, token, and IP address with your own values:

docker run -d --name gpustack-worker \
    --restart=unless-stopped \
    -e ASCEND_VISIBLE_DEVICES=0 \
    -p 10150:10150 \
    -p 40000-41024:40000-41024 \
    -p 50000-51024:50000-51024 \
    --ipc=host \
    -v gpustack-worker-data:/var/lib/gpustack \
    gpustack/gpustack:latest-npu \
    --server-url http://your_gpustack_url --token your_gpustack_token --worker-ip your_host_ip

Moore Threads MUSA

Prerequisites

Refer to this Tutorial.

Run GPUStack

Run the following command to start the GPUStack server and built-in worker:

docker run -d --name gpustack \
    --restart=unless-stopped \
    -p 80:80 \
    --ipc=host \
    -v gpustack-data:/var/lib/gpustack \
    gpustack/gpustack:latest-musa

To retrieve the default admin password, run the following command:

docker exec -it gpustack cat /var/lib/gpustack/initial_admin_password

(Optional) Run the following command to start the GPUStack server without the built-in worker:

docker run -d --name gpustack-server \
    --restart=unless-stopped \
    -p 80:80 \
    -v gpustack-server-data:/var/lib/gpustack \
    gpustack/gpustack:latest-cpu \
    --disable-worker

(Optional) Add Worker

To retrieve the token, run the following command on the GPUStack server host:

docker exec -it gpustack-server cat /var/lib/gpustack/token

To start a GPUStack worker and register it with the GPUStack server, run the following command on the current host or another host. Replace the URL, token, and IP address with your own values:

docker run -d --name gpustack-worker \
    --restart=unless-stopped \
    -p 10150:10150 \
    -p 40000-41024:40000-41024 \
    -p 50000-51024:50000-51024 \
    --ipc=host \
    -v gpustack-worker-data:/var/lib/gpustack \
    gpustack/gpustack:latest-musa \
    --server-url http://your_gpustack_url --token your_gpustack_token --worker-ip your_host_ip

Build Your Own Docker Image

The official GPUStack NVIDIA CUDA image, for example, is built with CUDA 12.4. To use a different CUDA version, you can build your own Docker image.

# Example Dockerfile
ARG CUDA_VERSION=12.4.1

FROM nvidia/cuda:$CUDA_VERSION-cudnn-runtime-ubuntu22.04

ARG TARGETPLATFORM
ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y \
    git \
    curl \
    wget \
    tzdata \
    iproute2 \
    python3 \
    python3-pip \
    python3-venv \
    && rm -rf /var/lib/apt/lists/*

COPY . /workspace/gpustack
RUN cd /workspace/gpustack && \
    make build

RUN if [ "$TARGETPLATFORM" = "linux/amd64" ]; then \
    # Install vllm dependencies for x86_64
    WHEEL_PACKAGE="$(ls /workspace/gpustack/dist/*.whl)[all]"; \
    else  \
    WHEEL_PACKAGE="$(ls /workspace/gpustack/dist/*.whl)[audio]"; \
    fi && \
    pip install pipx && \
    pip install "$WHEEL_PACKAGE" && \
    pip cache purge && \
    rm -rf /workspace/gpustack

RUN gpustack download-tools

ENTRYPOINT [ "gpustack", "start" ]

Run the following command to build the Docker image:

docker build -t my/gpustack --build-arg CUDA_VERSION=12.0.0 .
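The CUDA_VERSION build argument is substituted into the FROM line of the example Dockerfile, so the build only works if a base image with the resulting tag is published. The sketch below derives that tag so it can be checked before building; base_image_for is a hypothetical helper, and whether a given CUDA version is available under this tag pattern has to be verified against the registry.

```shell
#!/bin/sh
# Mirrors the FROM line of the example Dockerfile for a given CUDA version.
base_image_for() {
    printf 'nvidia/cuda:%s-cudnn-runtime-ubuntu22.04\n' "$1"
}

# Usage: build only if the registry knows the base image tag.
# CUDA_VERSION=12.0.0
# docker manifest inspect "$(base_image_for "$CUDA_VERSION")" >/dev/null \
#     && docker build -t my/gpustack --build-arg CUDA_VERSION="$CUDA_VERSION" .
```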

For other accelerators, refer to the corresponding Dockerfile in the GPUStack repository.