
Docker Installation

You can use the official Docker image to run GPUStack in a container. Docker installation is supported on the following platforms and devices:

Supported Platforms

  • Linux

Supported Devices

  • NVIDIA GPUs (Compute Capability 6.0 and above)
  • AMD GPUs
  • Ascend NPUs
  • Moore Threads GPUs
  • CPUs (AVX2 for x86 or NEON for ARM)

Prerequisites

Run GPUStack with Docker

Note

  1. Heterogeneous clusters are supported.

  2. You can set additional flags for the gpustack start command by appending them to the docker run command. For configuration details, please refer to the CLI Reference.

  3. Use either the --ipc=host flag or the --shm-size flag to give the container enough shared memory: --ipc=host shares the host's IPC namespace with the container, while --shm-size sets the size of the container's /dev/shm. vLLM and PyTorch use shared memory to exchange data between processes under the hood, particularly for tensor parallel inference.

  4. The -p 10150:10150 -p 40000-41024:40000-41024 -p 50000-51024:50000-51024 and --worker-ip your_host_ip flags ensure that the server can reach the worker and the inference services running on it. Alternatively, you can use the --network=host flag together with --worker-ip your_host_ip.
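The alternatives described in notes 3 and 4 can be combined. The sketch below is a hypothetical helper, not part of GPUStack: it starts an NVIDIA worker with --network=host instead of the -p port mappings and with --shm-size instead of --ipc=host. The function names, the DRY_RUN switch, and the default shared-memory size of 16g are assumptions; adjust them for your hosts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GPUStack): start an NVIDIA worker on the host
# network with a fixed shared-memory size instead of --ipc=host.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: start_worker_host_network SERVER_URL TOKEN HOST_IP [SHM_SIZE]
start_worker_host_network() {
  local server_url="$1" token="$2" host_ip="$3" shm_size="${4:-16g}"
  # --network=host makes the -p port mappings unnecessary;
  # --shm-size sizes the container's /dev/shm instead of sharing the host IPC namespace.
  run docker run -d --name gpustack-worker \
    --restart=unless-stopped \
    --gpus all \
    --network=host \
    --shm-size "$shm_size" \
    -v gpustack-worker-data:/var/lib/gpustack \
    gpustack/gpustack \
    --server-url "$server_url" --token "$token" --worker-ip "$host_ip"
}
```

For example, DRY_RUN=1 start_worker_host_network http://your_gpustack_url your_gpustack_token your_host_ip prints the resulting docker run command without starting a container.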

NVIDIA CUDA

Prerequisites

Run GPUStack

Run the following command to start the GPUStack server with the built-in worker:

docker run -d --name gpustack \
    --restart=unless-stopped \
    --gpus all \
    -p 80:80 \
    --ipc=host \
    -v gpustack-data:/var/lib/gpustack \
    gpustack/gpustack
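The server can take some time after the container starts before it accepts connections. A minimal polling sketch, assuming curl is installed on the host and that the web UI published on port 80 answers with a non-error HTTP status once the server is ready; the function name, attempt count, and interval are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll the GPUStack server until its HTTP port answers.
# Usage: wait_for_gpustack [URL] [ATTEMPTS] [INTERVAL_SECONDS]
wait_for_gpustack() {
  local url="${1:-http://localhost:80}" attempts="${2:-60}" interval="${3:-5}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: silent, -f: treat HTTP errors (>= 400) as failures, -o /dev/null: discard the body.
    if curl -sf -o /dev/null "$url"; then
      echo "GPUStack is reachable at $url"
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "GPUStack did not respond at $url after $attempts attempts" >&2
  return 1
}
```

Running wait_for_gpustack right after docker run lets a provisioning script stop early with a non-zero status instead of continuing against a server that is not up yet.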

(Optional) Run the following command to start the GPUStack server without the built-in worker:

docker run -d --name gpustack-server \
    --restart=unless-stopped \
    -p 80:80 \
    -v gpustack-server-data:/var/lib/gpustack \
    gpustack/gpustack:latest-cpu \
    --disable-worker

(Optional) Add Worker

To retrieve the token, run the following command on the GPUStack server host:

docker exec -it gpustack-server cat /var/lib/gpustack/token
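When scripting, you may prefer to capture the token in a shell variable. In that case omit -it: with -t, Docker allocates a pseudo-terminal and the captured value can end with a carriage return. A minimal sketch; the function name is hypothetical and the container name defaults to gpustack-server as in the commands on this page:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the registration token from a running server container.
# Usage: TOKEN="$(get_gpustack_token [CONTAINER])"
get_gpustack_token() {
  local container="${1:-gpustack-server}"
  # No -it: a TTY is not needed here and could add "\r" to the output.
  # tr removes any carriage returns and newlines from the captured value.
  docker exec "$container" cat /var/lib/gpustack/token | tr -d '\r\n'
}
```

The captured value can then be passed to the worker as --token "$TOKEN".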

To start a GPUStack worker and register it with the GPUStack server, run the following command on the current host or another host. Replace the URL, token, and IP address placeholders with your own values:

docker run -d --name gpustack-worker \
    --restart=unless-stopped \
    --gpus all \
    -p 10150:10150 \
    -p 40000-41024:40000-41024 \
    -p 50000-51024:50000-51024 \
    --ipc=host \
    -v gpustack-worker-data:/var/lib/gpustack \
    gpustack/gpustack \
    --server-url http://your_gpustack_url --token your_gpustack_token --worker-ip your_host_ip
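If you do not know the value to use for your_host_ip, the sketch below guesses the host's primary IPv4 address as the source address of the route to an external destination. It assumes the iproute2 ip command is available on the host and that the server can reach this address; on hosts with several network interfaces, check the result and set the IP explicitly if needed. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess the host's primary IPv4 address as the source
# address the kernel would use to reach an external destination.
# No packet is sent; "ip route get" only performs a routing table lookup.
detect_host_ip() {
  # Output looks like: "1.1.1.1 via 192.168.1.1 dev eth0 src 192.168.1.20 uid 1000"
  # Print the word that follows "src".
  ip -4 route get 1.1.1.1 |
    awk '{ for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit } }'
}
```

For example, use --worker-ip "$(detect_host_ip)" in place of --worker-ip your_host_ip.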

AMD ROCm

Prerequisites

Refer to this Tutorial.

Run GPUStack

Run the following command to start the GPUStack server with the built-in worker:

docker run -d --name gpustack \
    --restart=unless-stopped \
    -p 80:80 \
    --ipc=host \
    --group-add=video \
    --security-opt seccomp=unconfined \
    --device /dev/kfd \
    --device /dev/dri \
    -v gpustack-data:/var/lib/gpustack \
    gpustack/gpustack:latest-rocm
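The --device /dev/kfd and --device /dev/dri flags only work if those device nodes exist on the host, which normally requires the AMD GPU kernel driver to be loaded. A minimal pre-flight check you could run before docker run; the function name and the optional device-root argument (used here to make the check testable) are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the ROCm container: verify that the device
# nodes passed with --device exist under DEV_ROOT (default: /dev).
# Usage: check_rocm_devices [DEV_ROOT]
check_rocm_devices() {
  local dev_root="${1:-/dev}" missing=0 node
  for node in kfd dri; do
    if [ ! -e "$dev_root/$node" ]; then
      echo "missing: $dev_root/$node" >&2
      missing=1
    fi
  done
  # Non-zero exit status if any device node is missing.
  return "$missing"
}
```

A script can then run check_rocm_devices && docker run ... so the container is only started when both devices are present.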

(Optional) Run the following command to start the GPUStack server without the built-in worker:

docker run -d --name gpustack-server \
    --restart=unless-stopped \
    -p 80:80 \
    -v gpustack-server-data:/var/lib/gpustack \
    gpustack/gpustack:latest-cpu \
    --disable-worker

(Optional) Add Worker

To retrieve the token, run the following command on the GPUStack server host:

docker exec -it gpustack-server cat /var/lib/gpustack/token

To start a GPUStack worker and register it with the GPUStack server, run the following command on the current host or another host. Replace the URL, token, and IP address placeholders with your own values:

docker run -d --name gpustack-worker \
    --restart=unless-stopped \
    -p 10150:10150 \
    -p 40000-41024:40000-41024 \
    -p 50000-51024:50000-51024 \
    --ipc=host \
    --group-add=video \
    --security-opt seccomp=unconfined \
    --device /dev/kfd \
    --device /dev/dri \
    -v gpustack-worker-data:/var/lib/gpustack \
    gpustack/gpustack:latest-rocm \
    --server-url http://your_gpustack_url --token your_gpustack_token --worker-ip your_host_ip

Ascend CANN

Prerequisites

Refer to this Tutorial.

Run GPUStack

Run the following command to start the GPUStack server with the built-in worker (set ASCEND_VISIBLE_DEVICES to the indices of the NPUs to use):

docker run -d --name gpustack \
    --restart=unless-stopped \
    -e ASCEND_VISIBLE_DEVICES=0 \
    -p 80:80 \
    --ipc=host \
    -v gpustack-data:/var/lib/gpustack \
    gpustack/gpustack:latest-npu
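To expose more than one NPU, ASCEND_VISIBLE_DEVICES takes several indices. The sketch below assumes the Ascend container runtime accepts a comma-separated list such as 0,1,2,3; check the Ascend Docker Runtime documentation for the formats your version supports. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join NPU indices into a comma-separated
# ASCEND_VISIBLE_DEVICES value, e.g. "0 1 2 3" -> "0,1,2,3".
ascend_visible_devices() {
  local idx
  # Reject anything that is not a non-negative integer index.
  for idx in "$@"; do
    case "$idx" in
      '' | *[!0-9]*)
        echo "invalid NPU index: '$idx'" >&2
        return 1
        ;;
    esac
  done
  # With IFS set to ",", "$*" joins the arguments with commas.
  local IFS=,
  printf '%s\n' "$*"
}
```

For example, replace -e ASCEND_VISIBLE_DEVICES=0 with -e ASCEND_VISIBLE_DEVICES="$(ascend_visible_devices 0 1 2 3)".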

(Optional) Run the following command to start the GPUStack server without the built-in worker:

docker run -d --name gpustack-server \
    --restart=unless-stopped \
    -p 80:80 \
    -v gpustack-server-data:/var/lib/gpustack \
    gpustack/gpustack:latest-cpu \
    --disable-worker

(Optional) Add Worker

To retrieve the token, run the following command on the GPUStack server host:

docker exec -it gpustack-server cat /var/lib/gpustack/token

To start a GPUStack worker and register it with the GPUStack server, run the following command on the current host or another host. Replace the URL, token, and IP address placeholders with your own values:

docker run -d --name gpustack-worker \
    --restart=unless-stopped \
    -e ASCEND_VISIBLE_DEVICES=0 \
    -p 10150:10150 \
    -p 40000-41024:40000-41024 \
    -p 50000-51024:50000-51024 \
    --ipc=host \
    -v gpustack-worker-data:/var/lib/gpustack \
    gpustack/gpustack:latest-npu \
    --server-url http://your_gpustack_url --token your_gpustack_token --worker-ip your_host_ip

Moore Threads MUSA

Prerequisites

Refer to this Tutorial.

Run GPUStack

Run the following command to start the GPUStack server with the built-in worker:

docker run -d --name gpustack \
    --restart=unless-stopped \
    -p 80:80 \
    --ipc=host \
    -v gpustack-data:/var/lib/gpustack \
    gpustack/gpustack:latest-musa

(Optional) Run the following command to start the GPUStack server without the built-in worker:

docker run -d --name gpustack-server \
    --restart=unless-stopped \
    -p 80:80 \
    -v gpustack-server-data:/var/lib/gpustack \
    gpustack/gpustack:latest-cpu \
    --disable-worker

(Optional) Add Worker

To retrieve the token, run the following command on the GPUStack server host:

docker exec -it gpustack-server cat /var/lib/gpustack/token

To start a GPUStack worker and register it with the GPUStack server, run the following command on the current host or another host. Replace the URL, token, and IP address placeholders with your own values:

docker run -d --name gpustack-worker \
    --restart=unless-stopped \
    -p 10150:10150 \
    -p 40000-41024:40000-41024 \
    -p 50000-51024:50000-51024 \
    --ipc=host \
    -v gpustack-worker-data:/var/lib/gpustack \
    gpustack/gpustack:latest-musa \
    --server-url http://your_gpustack_url --token your_gpustack_token --worker-ip your_host_ip

Build Your Own Docker Image

The official GPUStack NVIDIA CUDA image is built with CUDA 12.4. If you need a different CUDA version, you can build your own Docker image, for example with the following Dockerfile:

# Example Dockerfile
ARG CUDA_VERSION=12.4.1

FROM nvidia/cuda:$CUDA_VERSION-cudnn-runtime-ubuntu22.04

ARG TARGETPLATFORM
ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y \
    git \
    curl \
    wget \
    tzdata \
    iproute2 \
    python3 \
    python3-pip \
    python3-venv \
    && rm -rf /var/lib/apt/lists/*

COPY . /workspace/gpustack
RUN cd /workspace/gpustack && \
    make build

RUN if [ "$TARGETPLATFORM" = "linux/amd64" ]; then \
    # Install vllm dependencies for x86_64
    WHEEL_PACKAGE="$(ls /workspace/gpustack/dist/*.whl)[all]"; \
    else  \
    WHEEL_PACKAGE="$(ls /workspace/gpustack/dist/*.whl)[audio]"; \
    fi && \
    pip install pipx && \
    pip install "$WHEEL_PACKAGE" && \
    pip cache purge && \
    rm -rf /workspace/gpustack

RUN gpustack download-tools

ENTRYPOINT [ "gpustack", "start" ]

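Run the following command to build the Docker image. The CUDA_VERSION build argument must correspond to an existing nvidia/cuda image tag of the form <version>-cudnn-runtime-ubuntu22.04: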

docker build -t my/gpustack --build-arg CUDA_VERSION=12.0.0 .
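Because the FROM line derives the base image tag from CUDA_VERSION, a typo or an unavailable version only surfaces once the build starts. The hypothetical wrapper below checks the tag first with docker manifest inspect, which queries the registry without pulling the image (this assumes a Docker CLI where the manifest command is available). The function name and the default image name my/gpustack are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: verify that the nvidia/cuda base image derived from
# CUDA_VERSION exists before running docker build in the current directory.
# The tag pattern mirrors the FROM line of the example Dockerfile.
# Usage: build_gpustack_image CUDA_VERSION [IMAGE_NAME]
build_gpustack_image() {
  local cuda_version="$1" image="${2:-my/gpustack}"
  local base="nvidia/cuda:${cuda_version}-cudnn-runtime-ubuntu22.04"
  # Query the registry for the manifest; no layers are downloaded.
  if ! docker manifest inspect "$base" >/dev/null 2>&1; then
    echo "base image not found: $base" >&2
    return 1
  fi
  docker build -t "$image" --build-arg CUDA_VERSION="$cuda_version" .
}
```

Run it from the root of the GPUStack repository, since the Dockerfile copies the build context into /workspace/gpustack.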

For other accelerators, refer to the corresponding Dockerfile in the GPUStack repository.