Online Installation

Supported Devices

Cambricon MLUs

Supported Platforms

OS	Arch	Supported methods
Linux	AMD64	pip Installation

Supported backends

vLLM

Prerequisites

Cambricon Driver

Check if the Cambricon driver is installed:

cnmon

Cambricon Pytorch docker image

Please contact Cambricon engineers to get the Cambricon Pytorch docker image.

pip Installation

Use Cambricon Pytorch docker image and Activate the pytorch_infer virtual environment:

source /torch/venv3/pytorch_infer/bin/activate

Install GPUStack

Run the following to install GPUStack.

# vLLM has been installed in Cambricon Pytorch docker
pip install "gpustack[audio]"

To verify, run:

gpustack version

Run GPUStack

Run the following command to start the GPUStack server and built-in worker:

gpustack start

If the startup logs are normal, open http://your_host_ip in the browser to access the GPUStack UI. Log in to GPUStack with username admin and the default password. You can run the following command to get the password for the default setup:

cat /var/lib/gpustack/initial_admin_password

By default, GPUStack uses /var/lib/gpustack as the data directory so you need sudo or proper permission for that. You can also set a custom data directory by running:

gpustack start --data-dir mypath

You can refer to the CLI Reference for available CLI Flags.

(Optional) Add Worker

To add a worker to the GPUStack cluster, you need to specify the server URL and the authentication token.

To get the token used for adding workers, run the following command on the GPUStack server node:

cat /var/lib/gpustack/token

To start GPUStack as a worker, and register it with the GPUStack server, run the following command on the worker node. Be sure to replace the URL, token and node IP with your specific values:

gpustack start --server-url http://your_gpustack_url --token your_gpustack_token --worker-ip your_worker_host_ip