# Docker Installation

You can use the official Docker image to run GPUStack in a container. Installation with Docker is supported on:
## Supported Platforms

- Linux
## Supported Devices

- NVIDIA GPUs (Compute Capability 6.0 and above)
- AMD GPUs
- Ascend NPUs
- Moore Threads GPUs
- CPUs (AVX2 for x86 or NEON for ARM)
## Prerequisites

- Docker is installed on the host.
## Run GPUStack with Docker
**Note**

- Heterogeneous clusters are supported.
- You can set additional flags for the `gpustack start` command by appending them to the `docker run` command. For configuration details, refer to the CLI Reference.
- Use either the `--ipc=host` flag or the `--shm-size` flag to allow the container to access the host's shared memory. vLLM and PyTorch use shared memory to exchange data between processes under the hood, particularly for tensor-parallel inference.
- The `-p 10150:10150 -p 40000-41024:40000-41024 -p 50000-51024:50000-51024` and `--worker-ip your_host_ip` flags ensure that the worker and the inference services running on it are accessible to the server. Alternatively, you can set the `--network=host` and `--worker-ip your_host_ip` flags instead.
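As a sketch of the second note above, everything appended after the image name in `docker run` is passed through to `gpustack start`. The helper below assembles such a command from variables; the `--port 9090` flag is an assumption here, so check the CLI Reference for the exact flag names:

```shell
# Sketch only: build a `docker run` command where everything after the image
# name is forwarded to `gpustack start`. The --port flag is assumed from the
# CLI Reference; substitute whatever flags your deployment needs.
IMAGE="gpustack/gpustack"
START_FLAGS="--port 9090"

CMD="docker run -d --name gpustack --restart=unless-stopped --gpus all \
-p 9090:9090 --ipc=host -v gpustack-data:/var/lib/gpustack $IMAGE $START_FLAGS"

echo "$CMD"
```

Note that the published host port (`-p 9090:9090`) must match whatever port the appended flags make GPUStack listen on.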
### NVIDIA CUDA

#### Prerequisites

- NVIDIA driver and the NVIDIA Container Toolkit are installed.

#### Run GPUStack
Run the following command to start the GPUStack server and built-in worker:
```bash
docker run -d --name gpustack \
    --restart=unless-stopped \
    --gpus all \
    -p 80:80 \
    --ipc=host \
    -v gpustack-data:/var/lib/gpustack \
    gpustack/gpustack
```
(Optional) Run the following command to start the GPUStack server without the built-in worker:

```bash
docker run -d --name gpustack-server \
    --restart=unless-stopped \
    -p 80:80 \
    -v gpustack-server-data:/var/lib/gpustack \
    gpustack/gpustack:latest-cpu \
    --disable-worker
```
#### (Optional) Add Worker
To retrieve the token, run the following command on the GPUStack server host:

```bash
docker exec -it gpustack-server cat /var/lib/gpustack/token
```
To start a GPUStack worker and register it with the GPUStack server, run the following command on this host or another host. Replace the URL, token, and IP address with your own values:

```bash
docker run -d --name gpustack-worker \
    --restart=unless-stopped \
    --gpus all \
    -p 10150:10150 \
    -p 40000-41024:40000-41024 \
    -p 50000-51024:50000-51024 \
    --ipc=host \
    -v gpustack-worker-data:/var/lib/gpustack \
    gpustack/gpustack \
    --server-url http://your_gpustack_url --token your_gpustack_token --worker-ip your_host_ip
```
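A common failure mode when registering a worker is leaving one of the placeholder values above unchanged. A minimal pre-flight sketch that catches this before the container is launched (the variable names are illustrative, not GPUStack conventions):

```shell
# Hypothetical pre-flight check: refuse to start the worker while any of the
# placeholder values from the command above is still in place.
SERVER_URL="http://your_gpustack_url"
TOKEN="your_gpustack_token"
WORKER_IP="your_host_ip"

unreplaced=0
for v in "$SERVER_URL" "$TOKEN" "$WORKER_IP"; do
  case "$v" in
    # Placeholders all contain the "your_" prefix used in this guide.
    *your_*) echo "replace placeholder: $v"; unreplaced=$((unreplaced + 1)) ;;
  esac
done
echo "placeholders remaining: $unreplaced"
```

Only launch the worker container once the script reports zero remaining placeholders.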
### AMD ROCm

#### Prerequisites

Refer to this Tutorial.

#### Run GPUStack
Run the following command to start the GPUStack server and built-in worker:
```bash
docker run -d --name gpustack \
    --restart=unless-stopped \
    -p 80:80 \
    --ipc=host \
    --group-add=video \
    --security-opt seccomp=unconfined \
    --device /dev/kfd \
    --device /dev/dri \
    -v gpustack-data:/var/lib/gpustack \
    gpustack/gpustack:latest-rocm
```
(Optional) Run the following command to start the GPUStack server without the built-in worker:

```bash
docker run -d --name gpustack-server \
    --restart=unless-stopped \
    -p 80:80 \
    -v gpustack-server-data:/var/lib/gpustack \
    gpustack/gpustack:latest-cpu \
    --disable-worker
```
#### (Optional) Add Worker
To retrieve the token, run the following command on the GPUStack server host:

```bash
docker exec -it gpustack-server cat /var/lib/gpustack/token
```
To start a GPUStack worker and register it with the GPUStack server, run the following command on this host or another host. Replace the URL, token, and IP address with your own values:

```bash
docker run -d --name gpustack-worker \
    --restart=unless-stopped \
    -p 10150:10150 \
    -p 40000-41024:40000-41024 \
    -p 50000-51024:50000-51024 \
    --ipc=host \
    --group-add=video \
    --security-opt seccomp=unconfined \
    --device /dev/kfd \
    --device /dev/dri \
    -v gpustack-worker-data:/var/lib/gpustack \
    gpustack/gpustack:latest-rocm \
    --server-url http://your_gpustack_url --token your_gpustack_token --worker-ip your_host_ip
```
### Ascend CANN

#### Prerequisites

Refer to this Tutorial.

#### Run GPUStack
Run the following command to start the GPUStack server and built-in worker (set `ASCEND_VISIBLE_DEVICES` to the required NPU indices):

```bash
docker run -d --name gpustack \
    --restart=unless-stopped \
    -e ASCEND_VISIBLE_DEVICES=0 \
    -p 80:80 \
    --ipc=host \
    -v gpustack-data:/var/lib/gpustack \
    gpustack/gpustack:latest-npu
```
(Optional) Run the following command to start the GPUStack server without the built-in worker:

```bash
docker run -d --name gpustack-server \
    --restart=unless-stopped \
    -p 80:80 \
    -v gpustack-server-data:/var/lib/gpustack \
    gpustack/gpustack:latest-cpu \
    --disable-worker
```
#### (Optional) Add Worker
To retrieve the token, run the following command on the GPUStack server host:

```bash
docker exec -it gpustack-server cat /var/lib/gpustack/token
```
To start a GPUStack worker and register it with the GPUStack server, run the following command on this host or another host. Replace the URL, token, and IP address with your own values:

```bash
docker run -d --name gpustack-worker \
    --restart=unless-stopped \
    -e ASCEND_VISIBLE_DEVICES=0 \
    -p 10150:10150 \
    -p 40000-41024:40000-41024 \
    -p 50000-51024:50000-51024 \
    --ipc=host \
    -v gpustack-worker-data:/var/lib/gpustack \
    gpustack/gpustack:latest-npu \
    --server-url http://your_gpustack_url --token your_gpustack_token --worker-ip your_host_ip
```
### Moore Threads MUSA

#### Prerequisites

Refer to this Tutorial.

#### Run GPUStack
Run the following command to start the GPUStack server and built-in worker:
```bash
docker run -d --name gpustack \
    --restart=unless-stopped \
    -p 80:80 \
    --ipc=host \
    -v gpustack-data:/var/lib/gpustack \
    gpustack/gpustack:latest-musa
```
(Optional) Run the following command to start the GPUStack server without the built-in worker:

```bash
docker run -d --name gpustack-server \
    --restart=unless-stopped \
    -p 80:80 \
    -v gpustack-server-data:/var/lib/gpustack \
    gpustack/gpustack:latest-cpu \
    --disable-worker
```
#### (Optional) Add Worker
To retrieve the token, run the following command on the GPUStack server host:

```bash
docker exec -it gpustack-server cat /var/lib/gpustack/token
```
To start a GPUStack worker and register it with the GPUStack server, run the following command on this host or another host. Replace the URL, token, and IP address with your own values:

```bash
docker run -d --name gpustack-worker \
    --restart=unless-stopped \
    -p 10150:10150 \
    -p 40000-41024:40000-41024 \
    -p 50000-51024:50000-51024 \
    --ipc=host \
    -v gpustack-worker-data:/var/lib/gpustack \
    gpustack/gpustack:latest-musa \
    --server-url http://your_gpustack_url --token your_gpustack_token --worker-ip your_host_ip
```
## Build Your Own Docker Image

For example, the official GPUStack NVIDIA CUDA image is built with CUDA 12.4. If you want to use a different version of CUDA, you can build your own Docker image.
```dockerfile
# Example Dockerfile
ARG CUDA_VERSION=12.4.1

FROM nvidia/cuda:$CUDA_VERSION-cudnn-runtime-ubuntu22.04

ARG TARGETPLATFORM
ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y \
    git \
    curl \
    wget \
    tzdata \
    iproute2 \
    python3 \
    python3-pip \
    python3-venv \
    && rm -rf /var/lib/apt/lists/*

COPY . /workspace/gpustack

RUN cd /workspace/gpustack && \
    make build

RUN if [ "$TARGETPLATFORM" = "linux/amd64" ]; then \
        # Install vllm dependencies for x86_64
        WHEEL_PACKAGE="$(ls /workspace/gpustack/dist/*.whl)[all]"; \
    else \
        WHEEL_PACKAGE="$(ls /workspace/gpustack/dist/*.whl)[audio]"; \
    fi && \
    pip install pipx && \
    pip install $WHEEL_PACKAGE && \
    pip cache purge && \
    rm -rf /workspace/gpustack

RUN gpustack download-tools

ENTRYPOINT [ "gpustack", "start" ]
```
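The conditional `RUN` step above selects the wheel extras per target platform: `[all]` (which pulls in the vLLM dependencies) on `linux/amd64`, and `[audio]` only on other architectures. The same selection logic, as a standalone sketch using an illustrative helper function:

```shell
# Mirrors the TARGETPLATFORM branch in the Dockerfile above: x86_64 builds get
# the [all] extras (including vLLM); other architectures get [audio] only.
# select_extras is an illustrative name, not part of the build.
select_extras() {
  if [ "$1" = "linux/amd64" ]; then
    echo "gpustack[all]"
  else
    echo "gpustack[audio]"
  fi
}

select_extras "linux/amd64"   # gpustack[all]
select_extras "linux/arm64"   # gpustack[audio]
```

This matters because vLLM is only published for x86_64; installing the `[all]` extras on an ARM build would fail.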
Run the following command to build the Docker image:

```bash
docker build -t my/gpustack --build-arg CUDA_VERSION=12.0.0 .
```
For other accelerators, refer to the corresponding Dockerfile in the GPUStack repository.