Setting Up a Multi-node GPUStack Cluster

This tutorial walks you through setting up a multi-node GPUStack cluster, in which workloads are distributed across multiple GPU-enabled nodes. It assumes basic familiarity with running commands on Linux, macOS, or Windows.

Prerequisites

Before starting, ensure you have the following:

Multiple nodes, each with a supported OS and GPUs for GPUStack installation. See the supported platforms and supported accelerators documentation for details.
All nodes are connected to the same network and can communicate with each other.

Step 1: Install GPUStack on the Server Node

First, install GPUStack on the node that will act as the server node. Follow the instructions below for your operating system.

Linux or macOS

Run the following command on your server node:

curl -sfL https://get.gpustack.ai | sh -s -

Windows

Run PowerShell as administrator and execute the following command:

Invoke-Expression (Invoke-WebRequest -Uri "https://get.gpustack.ai" -UseBasicParsing).Content

Once GPUStack is installed, you can proceed to configure your cluster by adding worker nodes.
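Before adding workers, it can help to confirm from a prospective worker node that the server answers over HTTP. The following is a minimal sketch, not part of GPUStack itself: the `check_server` helper is a hypothetical name, `http://myserver` is the placeholder URL used throughout this tutorial, and the 5-second timeout is an arbitrary choice.

```shell
#!/bin/sh
# Return 0 if the given URL answers an HTTP request within 5 seconds.
# Any HTTP response, including an error status, counts as reachable.
check_server() {
    curl -s -o /dev/null --max-time 5 "$1"
}

# Placeholder URL; set SERVER_URL to your server node's address.
SERVER_URL="${SERVER_URL:-http://myserver}"

if check_server "$SERVER_URL"; then
    echo "reachable: $SERVER_URL"
else
    echo "unreachable: $SERVER_URL"
fi
```

If the check reports the server as unreachable, verify the URL, the network path between the nodes, and any firewall rules before continuing.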

Step 2: Retrieve the Token from the Server Node

To add worker nodes to the cluster, you need the token that GPUStack generated on the server node. Run the following command on the server node to retrieve it:

Linux or macOS

cat /var/lib/gpustack/token

Windows

Get-Content -Path "$env:APPDATA\gpustack\token" -Raw

This token will be required in the next steps to authenticate worker nodes.
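On Linux or macOS, the token can also be read into a shell variable rather than copied by hand. This is a rough sketch under stated assumptions: `read_token` is a hypothetical helper, the default path is the one shown above, and the script assumes the token itself contains no whitespace.

```shell
#!/bin/sh
# Print the contents of the given token file with all whitespace removed,
# so a trailing newline does not end up in the token value.
read_token() {
    tr -d '[:space:]' < "$1"
}

# Default path from this tutorial; override TOKEN_FILE if yours differs.
TOKEN_FILE="${TOKEN_FILE:-/var/lib/gpustack/token}"

if [ -r "$TOKEN_FILE" ]; then
    TOKEN="$(read_token "$TOKEN_FILE")"
    # Report only the length so the token is not echoed to the terminal.
    echo "token length: ${#TOKEN}"
else
    echo "cannot read $TOKEN_FILE (elevated privileges may be required)" >&2
fi
```

Reporting only the length keeps the token out of terminal scrollback and logs, since anyone holding the token can join a node to the cluster.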

Step 3: Add Worker Nodes to the Cluster

Next, install GPUStack on each additional node (worker node) and connect it to the server node using the token.

Linux or macOS Worker Nodes

Run the following command on each worker node, replacing http://myserver with the URL of your server node and mytoken with the token retrieved in Step 2:

curl -sfL https://get.gpustack.ai | sh -s - --server-url http://myserver --token mytoken
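With several workers, typing the same command on each node becomes repetitive. The sketch below only prints the commands as a dry run and executes nothing. It assumes the workers are reachable over SSH; `join_command` is a hypothetical helper, and `worker1` and `worker2` are hypothetical host names.

```shell
#!/bin/sh
# Print the worker install command for a given server URL and token.
join_command() {
    printf 'curl -sfL https://get.gpustack.ai | sh -s - --server-url %s --token %s\n' "$1" "$2"
}

# Placeholder values from this tutorial; substitute your own.
SERVER_URL="${SERVER_URL:-http://myserver}"
TOKEN="${TOKEN:-mytoken}"
# Hypothetical worker host names, assumed reachable over SSH.
WORKERS="${WORKERS:-worker1 worker2}"

# Dry run: print one SSH invocation per worker instead of executing it.
for host in $WORKERS; do
    printf 'ssh %s "%s"\n' "$host" "$(join_command "$SERVER_URL" "$TOKEN")"
done
```

Review the printed lines before running any of them. The output contains the token, so avoid saving it to shared logs.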

Windows Worker Nodes

Run PowerShell as administrator on each worker node and use the following command, replacing http://myserver and mytoken with the server URL and token:

Invoke-Expression "& { $((Invoke-WebRequest -Uri 'https://get.gpustack.ai' -UseBasicParsing).Content) } --server-url http://myserver --token mytoken"

Once the command has run, each worker node connects to the server node and becomes part of the GPUStack cluster.

Step 4: Verify the Cluster Setup

After adding the worker nodes, you can verify that the cluster is set up correctly by accessing the GPUStack UI.

Open a browser and navigate to http://myserver (replace http://myserver with the URL of your server node).
Log in with the default credentials (username admin). To retrieve the default password, run the following command on the server node:

Linux or macOS

cat /var/lib/gpustack/initial_admin_password

Windows

Get-Content -Path "$env:APPDATA\gpustack\initial_admin_password" -Raw

After logging in, navigate to the Resources page in the UI to see all connected nodes and their GPUs. You should see your worker nodes listed and ready for serving LLMs.

Conclusion

Congratulations! You've successfully set up a multi-node GPUStack cluster. You can now distribute your workloads across multiple nodes and make full use of the GPUs available on each of them.