Cluster Management
GPUStack supports cluster-based worker management and provides multiple cluster types. You can provision a cluster through a Cloud Provider such as DigitalOcean, or create a self-hosted cluster and add workers using Docker run commands. Alternatively, you can register all nodes in a self-hosted Kubernetes cluster as GPUStack workers.
Create Cluster
- Go to the Clusters page.
- Click the Add Cluster button.
- Select a cluster provider. The Self-Host provider offers Docker and Kubernetes, and the Cloud Provider offers DigitalOcean.
- Depending on the provider, set the required options in the Basic Configuration and Add Worker steps.
Create Docker Cluster
- In the Basic Configuration step, the Name field is required and Description is optional.
- Click Save.
- In the Add Worker step, complete the following options and checks before adding a worker with the docker run command:
  - Select the GPU vendor. Tested vendors include Nvidia, AMD, and Ascend. Experimental vendors include Hygon, Moore Threads, Iluvatar, Cambricon, and Metax. Click Next after selecting a vendor.
  - Check Environment. A shell command is provided to verify that your environment is ready to add a worker. Copy the script and run it in your environment. Click Next after the script returns OK.
  - Specify arguments for the worker to be added. Provide the following arguments and click Next:
    - Specify the worker IP, or let the worker Auto-detect the Worker IP. Make sure the worker IP is accessible from the server.
    - Specify an Additional Volume Mount for the worker container. The mount path can be used to reuse existing model files.
  - Run command to create and start the worker container. Copy the bash script and run it in your environment.
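The requirement that the worker IP be accessible from the server can be checked before running the generated command. The following is a minimal sketch, not part of GPUStack: the helper name `is_reachable` is hypothetical, and the port to probe is an assumption you must supply yourself (any TCP port known to be open on the worker host). Run it on the server machine.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    A successful connection only shows that the address is routable and the
    port is open; it does not verify that a GPUStack worker is listening.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unroutable addresses.
        return False
```

For example, `is_reachable("192.0.2.10", 22)` probes a placeholder address on the SSH port; substitute the worker IP entered in the Specify arguments step and a port that is open on that host.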
Workers can also be added after the cluster is created.
- Go to the Clusters page.
- Find the cluster to which you want to add workers.
- Click the ellipsis button in the operations column, then select Add Worker.
- Select the options for the worker, following the same steps as above, from Select the GPU vendor to Run command.
Register Kubernetes Cluster
- In the Basic Configuration step, the Name field is required and Description is optional.
- Click Save.
- Select the GPU vendor. Tested vendors include Nvidia, AMD, and Ascend. Experimental vendors include Hygon, Moore Threads, Iluvatar, Cambricon, and Metax. Click Next after selecting a vendor.
- Check Environment. A shell command is provided to verify that your environment is ready to add a worker. Copy the script and run it in your environment. Click Next after the script returns OK.
- Run command to apply the worker manifests. Copy the bash script and run it in an environment where kubectl is installed and kubeconfig is configured.
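The last step assumes that kubectl is installed and a kubeconfig is configured. The sketch below shows one way to check both before running the generated script. It is not part of GPUStack, and the function names are hypothetical. It relies on two kubectl conventions: the `KUBECONFIG` environment variable may hold a list of paths joined by the platform path separator, and `~/.kube/config` is used when the variable is unset. It does not account for a path passed with the `--kubeconfig` flag.

```python
import os
import shutil
from pathlib import Path


def kubeconfig_candidates(env: dict, home: str) -> list:
    """Return the kubeconfig paths kubectl would consult by default."""
    raw = env.get("KUBECONFIG", "")
    # KUBECONFIG may list several files; drop empty entries.
    paths = [p for p in raw.split(os.pathsep) if p]
    # Fall back to the default location when KUBECONFIG is unset or empty.
    return paths or [os.path.join(home, ".kube", "config")]


def check_prerequisites(env=None, home=None) -> dict:
    """Report whether kubectl is on PATH and a kubeconfig file exists."""
    env = os.environ if env is None else env
    home = str(Path.home()) if home is None else home
    return {
        "kubectl_on_path": shutil.which("kubectl", path=env.get("PATH")) is not None,
        "kubeconfig_found": any(
            os.path.isfile(p) for p in kubeconfig_candidates(env, home)
        ),
    }
```

Calling `check_prerequisites()` with no arguments inspects the current process environment. Both values should be `True` before you run the generated script, although this does not confirm that the kubeconfig points at the intended cluster.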
The Kubernetes cluster can also be registered after the cluster is created.
- Go to the Clusters page.
- Find the cluster for which you want to register the Kubernetes cluster.
- Click the ellipsis button in the operations column, then select Register Cluster.
- Select the options to register the cluster, following the same steps as above, from Select the GPU vendor to Run command.
Create DigitalOcean Cluster
- In the Basic Configuration step, the Name field is required and Description is optional. Create or select a Cloud Credential for communicating with the DigitalOcean API. Select a Region from the regions where GPU Droplets can be created.
- Click Next.
- Add one or more Worker Pools. For each pool, Name, Instance Type, OS Image, Replicas, Batch Size, Labels, and Volumes can be specified.
- Click Save after the worker pools are configured.
Worker pools can also be added after the cluster is created.
- Go to the Clusters page.
- Find the DigitalOcean cluster to which you want to add a worker pool.
- Click the ellipsis button in the operations column, then select Add Worker Pool.
- Add the new worker pool using the same options as in the Worker Pools step above.
Operating Worker Pools
You can manage worker pools for DigitalOcean clusters on the Clusters page:
- Go to the Clusters page.
- Find the DigitalOcean cluster you want to manage and expand it to view its worker pools.
- To edit the replica count for a worker pool, modify it directly in the worker column.
- To edit a worker pool, click the Edit button and update the Name, Replica, Batch Size, and Labels as needed.
- To delete a worker pool, click the ellipsis button in the operations column for the worker pool, then select Delete.
Update Cluster
- Go to the Clusters page.
- Find the cluster that you want to edit.
- Click the Edit button.
- Update the Name and Description as needed.
- Click the Save button.
Delete Cluster
- Go to the Clusters page.
- Find the cluster that you want to delete.
- Click the ellipsis button in the operations column, then select Delete.
- Confirm the deletion.
- You cannot delete a cluster if there are any models or workers still present in it.
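The deletion rule above can be expressed as a simple precondition check. The sketch below is illustrative only: `ClusterSummary` and `deletion_blockers` are hypothetical names, not GPUStack types or API calls, and the lists stand in for whatever you see on the Clusters page.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterSummary:
    """Hypothetical snapshot of a cluster's contents (not a GPUStack type)."""
    name: str
    models: list = field(default_factory=list)
    workers: list = field(default_factory=list)


def deletion_blockers(cluster: ClusterSummary) -> list:
    """Return the reasons a cluster cannot be deleted; empty means it can."""
    blockers = []
    if cluster.models:
        blockers.append(f"{len(cluster.models)} model(s) still present")
    if cluster.workers:
        blockers.append(f"{len(cluster.workers)} worker(s) still present")
    return blockers
```

An empty result corresponds to a cluster whose models and workers have all been removed, which is the state required before deletion.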