Running Inference With AMD GPUs
GPUStack supports running inference on AMD GPUs. This tutorial will guide you through the configuration steps.
Docker Installation
System and Hardware Support
| OS | Architecture | Status | Verified |
| --- | --- | --- | --- |
| Linux | x86_64 | Supported | Ubuntu 20.04/22.04 |
| Device | Supported Backends | Verified |
| --- | --- | --- |
| gfx1101: AMD Radeon RX 7800 | llama-box, vLLM | Yes |
| gfx1100: AMD Radeon RX 7900/7700/7600 | llama-box, vLLM | |
| gfx90a: AMD Instinct accelerators MI250X/MI250/MI210/MI200s | llama-box, vLLM | |
| gfx942: AMD Instinct accelerators MI325X/MI300X/MI300A | llama-box, vLLM | |
| gfx1030: AMD Radeon RX 6950 XT/6900 XT/6800 XT/6800 | llama-box | |
| gfx908: AMD Instinct accelerators MI100 | llama-box | |
| gfx906: AMD Instinct accelerators MI60/MI50 | llama-box | |
Setup Instructions
Install ROCm
Select the appropriate installation method for your system. Here, we provide steps for Linux (Ubuntu). For other systems, refer to the ROCm documentation:
- Install Required Packages
sudo apt update
wget https://repo.radeon.com/amdgpu-install/6.2.4/ubuntu/jammy/amdgpu-install_6.2.60204-1_all.deb
sudo apt install ./amdgpu-install_6.2.60204-1_all.deb
sudo amdgpu-install -y --usecase=graphics,rocm
sudo reboot
- Set Group Permissions
sudo usermod -a -G render,video $LOGNAME
sudo reboot
- Verify Installation
# Verify that the current user is added to the render and video groups.
# Expected result: <username> adm cdrom sudo dip video plugdev render lpadmin lxd sambashare
groups
# Check if amdgpu kernel driver is installed.
# Expected result: amdgpu/x.x.x-xxxxxxx.xx.xx, x.x.x-xx-generic, x86_64: installed
dkms status
# Check if the GPU is listed as an agent.
rocminfo
# Check rocm-smi.
rocm-smi -i --showmeminfo vram --showpower --showserial --showuse --showtemp --showproductname
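Optionally, you can cross-check the GPU architecture reported by the driver against the support matrix above; the gfx target (for example, gfx1101) determines which inference backends are available. This is a small convenience check, not part of the official ROCm verification steps.

# Optional: print only the gfx target(s) reported by ROCm and compare them
# with the device table above.
rocminfo | grep -i "gfx"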
Configure the Container Runtime
Follow the Docker Installation Guide to install and configure the container runtime.
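If Docker Engine is not installed yet, the following is a minimal sketch for Ubuntu using Docker's convenience script; refer to the linked guide for the full, distribution-specific procedure. AMD GPUs typically do not need a separate container toolkit, since the devices are passed into the container with --device flags in the next step.

# Install Docker Engine via Docker's convenience script (sketch only; see the
# Docker Installation Guide for the supported, distribution-specific steps).
curl -fsSL https://get.docker.com | sh
# Optionally allow the current user to run docker without sudo (re-login required).
sudo usermod -aG docker $LOGNAME
# Verify that the Docker daemon is running.
docker info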
Installing GPUStack
To set up an isolated environment for GPUStack, we recommend using Docker.
docker run -it \
--network=host \
--ipc=host \
--group-add=video \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
--device /dev/kfd \
--device /dev/dri \
gpustack/gpustack:v0.5.0-rocm
If the following message appears, the GPUStack container is running successfully:
2024-11-15T23:37:46+00:00 - gpustack.server.server - INFO - Serving on 0.0.0.0:80.
2024-11-15T23:37:46+00:00 - gpustack.worker.worker - INFO - Starting GPUStack worker.
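For long-running deployments, you will typically want the container to start in the background, restart automatically, and persist its state. The variant below is a sketch: the -d, --name, and --restart flags are standard Docker options, while the /var/lib/gpustack data path is an assumption based on common GPUStack setups; adjust it to your environment. When running detached, inspect the startup logs with docker logs gpustack.

# Sketch: run GPUStack detached with a named volume for its data directory
# (assumed to be /var/lib/gpustack inside the container).
docker run -d \
    --name gpustack \
    --restart=unless-stopped \
    --network=host \
    --ipc=host \
    --group-add=video \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    --device /dev/kfd \
    --device /dev/dri \
    -v gpustack-data:/var/lib/gpustack \
    gpustack/gpustack:v0.5.0-rocm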
Once the container is running, access the GPUStack web interface by navigating to http://localhost:80 in your browser.
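Depending on the GPUStack version, the first login may require an auto-generated admin password. If the container was started with --name gpustack as in the detached sketch above, it can usually be read from inside the container; the file path below is an assumption, so check the documentation for your version.

# Print the auto-generated initial admin password (path may vary by version).
docker exec -it gpustack cat /var/lib/gpustack/initial_admin_password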
On the Resources page, you should see that GPUStack successfully recognizes the AMD device.
Running Inference
After installation, you can deploy models and run inference. Refer to the model management documentation for detailed usage instructions.
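Once a model is deployed, it can also be queried programmatically through GPUStack's OpenAI-compatible API. The request below is a sketch: the /v1-openai path, the API key, and the model name are assumptions or placeholders; create an API key in the GPUStack UI and use the name of the model you actually deployed.

# Sketch: chat completion request against the OpenAI-compatible endpoint.
# Replace YOUR_GPUSTACK_API_KEY and the model name with values from your deployment.
curl http://localhost/v1-openai/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer YOUR_GPUSTACK_API_KEY" \
    -d '{
        "model": "your-deployed-model",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'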
Non-Docker Installation
System and Hardware Support
| OS | Architecture | Status | Verified |
| --- | --- | --- | --- |
| Linux | x86_64 | Supported | Ubuntu 20.04/22.04 |
| Windows | x86_64 | Supported | Windows 11 |
| Device | Supported Backends | Verified |
| --- | --- | --- |
| gfx1101: AMD Radeon RX 7800 | llama-box, vLLM (Linux only) | Yes |
| gfx1100: AMD Radeon RX 7900/7700/7600 | llama-box, vLLM (Linux only) | |
| gfx90a: AMD Instinct accelerators MI250X/MI250/MI210/MI200s | llama-box (Linux only), vLLM (Linux only) | |
| gfx942: AMD Instinct accelerators MI325X/MI300X/MI300A | llama-box (Linux only), vLLM (Linux only) | |
| gfx1030: AMD Radeon RX 6950 XT/6900 XT/6800 XT/6800 | llama-box | |
| gfx908: AMD Instinct accelerators MI100 | llama-box (Linux only) | |
| gfx906: AMD Instinct accelerators MI60/MI50 | llama-box (Linux only) | |
Setup Instructions
Install ROCm
Select the appropriate installation method for your system. For Linux (Ubuntu), follow the same instructions as in the Docker Installation section above. For other systems, refer to the ROCm documentation.
Installing GPUStack
Once the environment is set up, install GPUStack following the installation guide.
After installation, GPUStack will detect AMD GPUs automatically.
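On Linux, a common way to perform this installation is the GPUStack installation script; the command below is a sketch based on the GPUStack installation guide and may change between versions, so consult the guide for current instructions (including the Windows procedure).

# Sketch: install and start GPUStack as a service on Linux via the installation script.
curl -sfL https://get.gpustack.ai | sh -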
Running Inference
After installation, you can deploy models and run inference. Refer to the model management documentation for usage details.
Note
The vLLM backend is not supported in non-Docker deployments.