Environment Variables
GPUStack supports various environment variables for configuration.
Most command line parameters can also be set via environment variables with the GPUSTACK_ prefix and in uppercase format (e.g., --data-dir can be set via GPUSTACK_DATA_DIR).
For a complete list of command line parameters that can be set as environment variables, see CLI Reference. Those parameters are not repeated here.
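The naming rule above can be sketched as a small helper. This only illustrates the convention and is not GPUStack's own code; the function name flag_to_env_name is hypothetical, and since only most parameters follow the rule, the CLI Reference remains the authority for any given flag.

```python
def flag_to_env_name(flag: str, prefix: str = "GPUSTACK_") -> str:
    """Map a CLI flag such as --data-dir to its environment variable name."""
    # Drop the leading dashes, turn inner dashes into underscores, upper-case.
    name = flag.lstrip("-").replace("-", "_").upper()
    return prefix + name


print(flag_to_env_name("--data-dir"))  # GPUSTACK_DATA_DIR
```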
Priority Order
Configuration values are applied in the following priority order (highest to lowest):
- Command line arguments
- Environment variables
- Configuration file
- Default values
This means that command line arguments will always override environment variables, and environment variables will override values in the configuration file.
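The precedence rule can be modeled with a layered lookup. The sketch below is an illustration under assumptions, not GPUStack's actual loader: the keys port and debug and all values are hypothetical, and only the ordering of the four sources comes from the list above.

```python
from collections import ChainMap

# Hypothetical values from each source. ChainMap searches left to right,
# so a mapping listed earlier overrides the ones listed after it.
cli_args = {"port": 8080}
env_vars = {"port": 9090, "data_dir": "/env/data"}
config_file = {"data_dir": "/file/data", "debug": True}
defaults = {"port": 80, "data_dir": "/default/data", "debug": False}

effective = ChainMap(cli_args, env_vars, config_file, defaults)

print(effective["port"])      # taken from the command line
print(effective["data_dir"])  # taken from the environment
print(effective["debug"])     # taken from the configuration file
```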
GPUStack Core Environment Variables
These environment variables are typically used for third-party service integrations.
The Applies to column indicates where the environment variable should be set:
- Server - Applies to the GPUStack server.
- Worker - Applies to GPUStack workers.
- Model - Applies to model deployment configurations.
Proxy Configuration
| Variable | Description | Default | Applies to |
|---|---|---|---|
| HTTP_PROXY | HTTP proxy URL. e.g., http://proxy-server:port | (empty) | Server & Worker |
| HTTPS_PROXY | HTTPS proxy URL. e.g., https://proxy-server:port | (empty) | Server & Worker |
| NO_PROXY | Comma-separated list of hosts to exclude. e.g., 127.0.0.1,10.0.0.0/8,192.168.0.0/16,172.16.0.0/16,localhost,cluster.local | (empty) | Server & Worker |
Hugging Face Hub
| Variable | Description | Default | Applies to |
|---|---|---|---|
| HF_ENDPOINT | Hugging Face Hub endpoint. e.g., https://hf-mirror.com | (empty) | Server & Worker |
| HF_TOKEN | Hugging Face Hub access token. | (empty) | Server & Worker |
Database Configuration
| Variable | Description | Default | Applies to |
|---|---|---|---|
| GPUSTACK_DB_ECHO | Enable database query logging. | false | Server |
| GPUSTACK_DB_POOL_SIZE | Database connection pool size. | 30 | Server |
| GPUSTACK_DB_MAX_OVERFLOW | Database connection pool max overflow. | 20 | Server |
| GPUSTACK_DB_POOL_TIMEOUT | Database connection pool timeout in seconds. | 30 | Server |
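When sizing the database server's connection limit, it helps to know the most connections one GPUStack server may open. The sketch below assumes the pool follows the common convention that overflow connections are opened on top of the base pool (as in SQLAlchemy-style pools); the source does not state this, and the helper name is hypothetical.

```python
import os


def max_db_connections(env=os.environ) -> int:
    """Upper bound on simultaneous connections under the assumed pool model."""
    # Fall back to the documented defaults when a variable is unset.
    pool_size = int(env.get("GPUSTACK_DB_POOL_SIZE", "30"))
    max_overflow = int(env.get("GPUSTACK_DB_MAX_OVERFLOW", "20"))
    return pool_size + max_overflow


print(max_db_connections({}))  # uses the documented defaults
```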
Network Configuration
| Variable | Description | Default | Applies to |
|---|---|---|---|
| GPUSTACK_PROXY_TIMEOUT_SECONDS | Proxy timeout in seconds. | 1800 | Server |
| GPUSTACK_PROXY_UPSTREAM_IDLE_TIMEOUT_SECONDS | Upstream idle timeout in seconds for Higress. | 3 | Server |
| GPUSTACK_TCP_CONNECTOR_LIMIT | HTTP client TCP connector limit. | 1000 | Server & Worker |
Server Cache Configuration
| Variable | Description | Default | Applies to |
|---|---|---|---|
| GPUSTACK_SERVER_CACHE_TTL_SECONDS | Server cache TTL in seconds. | 600 | Server |
Authentication & Security
| Variable | Description | Default | Applies to |
|---|---|---|---|
| GPUSTACK_JWT_TOKEN_EXPIRE_MINUTES | JWT token expiration time in minutes. | 120 | Server |
Gateway Configuration
| Variable | Description | Default | Applies to |
|---|---|---|---|
| GPUSTACK_HIGRESS_EXT_AUTH_TIMEOUT_MS | Higress external authentication timeout in milliseconds. | 30000 | Server |
| GPUSTACK_GATEWAY_PORT_CHECK_INTERVAL | Interval in seconds at which the GPUStack server checks the embedded gateway's listening port. | 2 | Server |
| GPUSTACK_GATEWAY_PORT_CHECK_RETRY_COUNT | Number of times the GPUStack server retries the check of the embedded gateway's listening port. | 300 | Server |
| GPUSTACK_GATEWAY_EXTERNAL_METRICS_URL | The external gateway metrics URL. e.g., http://<gateway-ip>:15020/stats/prometheus | None | Server |
| GPUSTACK_GATEWAY_AI_STATISTICS_PLUGIN_CONTENT_TYPES | Comma-separated list of content types to be monitored by the ai-statistics plugin. Each value should be a valid HTTP Content-Type. | application/json,text/event-stream | Server |
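The two port-check settings together bound how long the server may wait for the embedded gateway. The sketch below assumes the server waits one interval between checks and gives up after the retry count is exhausted; the source does not describe the loop, so treat the result as a rough estimate. The function name is hypothetical.

```python
def max_gateway_wait_seconds(interval_seconds: float = 2, retry_count: int = 300) -> float:
    """Approximate worst-case wait, assuming one interval per retry."""
    return interval_seconds * retry_count


# With the documented defaults, this comes to about ten minutes.
print(max_gateway_wait_seconds())
```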
Cluster Configuration
| Variable | Description | Default | Applies to |
|---|---|---|---|
| GPUSTACK_DEFAULT_CLUSTER_KUBERNETES | If a default cluster is created automatically, it will use the Kubernetes provider when this variable is set. | false | Server |
Worker and Model Configuration
| Variable | Description | Default | Applies to |
|---|---|---|---|
| GPUSTACK_WORKER_HEARTBEAT_INTERVAL | Worker heartbeat interval in seconds. | 30 | Worker |
| GPUSTACK_WORKER_STATUS_SYNC_INTERVAL | Worker status synchronization interval in seconds. | 30 | Worker |
| GPUSTACK_WORKER_UNREACHABLE_CHECK_MODE | Worker unreachable check mode. Options: auto, enabled, disabled. auto disables the check when the worker count exceeds 50. | auto | Server |
| GPUSTACK_WORKER_HEARTBEAT_GRACE_PERIOD | Worker heartbeat grace period in seconds. | 150 | Server |
| GPUSTACK_MODEL_INSTANCE_RESCHEDULE_GRACE_PERIOD | Model instance reschedule grace period in seconds. | 300 | Server |
| GPUSTACK_MODEL_EVALUATION_CACHE_MAX_SIZE | Maximum size of model evaluation cache. | 1000 | Server |
| GPUSTACK_MODEL_EVALUATION_CACHE_TTL | TTL of model evaluation cache in seconds. | 3600 | Server |
| GPUSTACK_WORKER_ORPHAN_WORKLOAD_CLEANUP_GRACE_PERIOD | Worker orphan workload cleanup grace period in seconds. | 300 | Worker |
| GPUSTACK_WORKER_ORPHAN_BENCHMARK_WORKLOAD_CLEANUP_GRACE_PERIOD | Worker orphan benchmark workload cleanup grace period in seconds. | 300 | Worker |
| GPUSTACK_WORKER_STATUS_COLLECTION_LOG_SLOW_SECONDS | Log a debug message when worker status collection takes longer than this many seconds. | 180 | Worker |
| GPUSTACK_MODEL_INSTANCE_HEALTH_CHECK_INTERVAL | Model instance health check interval in seconds. | 3 | Worker |
| GPUSTACK_DISABLE_OS_FILELOCK | Disable OS file lock. | false | Worker |
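The heartbeat interval is set on workers while the grace period is set on the server, so the two are worth checking together when either is changed. The sketch below computes how many heartbeat intervals fit into the grace period. It is arithmetic on the documented values only and assumes nothing about how the server applies the grace period; the function name is hypothetical.

```python
def heartbeats_within_grace(interval_seconds: int = 30, grace_seconds: int = 150) -> int:
    """Number of whole heartbeat intervals that fit in the grace period."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return grace_seconds // interval_seconds


# With the documented defaults, five intervals fit in the grace period.
print(heartbeats_within_grace())
```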
Benchmark Configuration
| Variable | Description | Default | Applies to |
|---|---|---|---|
| GPUSTACK_BENCHMARK_DATASET_SHAREGPT_PATH | ShareGPT dataset path used by the benchmark container when dataset_name is set to ShareGPT. The default image already includes this dataset. | /workspace/benchmark-runner/sharegpt_data/ShareGPT_V3_unfiltered_cleaned_split.json | Worker |
| GPUSTACK_BENCHMARK_REQUEST_TIMEOUT | Timeout for each benchmark request in seconds. | 3600 | Worker |
Model Deployment Configuration
Note
These environment variables are not set when starting GPUStack. Instead, they should be configured in the Advanced Options > Environment Variables section when deploying a model. They are used to customize the model serving behavior.
| Variable | Description | Default | Applies to |
|---|---|---|---|
| GPUSTACK_MODEL_SERVING_COMMAND_SCRIPT_DISABLED | Disable the automatic serving command script execution. When set to 1 or true, the script that handles package installation and other setup tasks will not run. | 0 | Model |
| PYPI_PACKAGES_INSTALL | Additional PyPI packages to install in the model serving environment. Multiple packages should be space-separated. The script will use uv pip install if available, otherwise pip install. | (empty) | Model |
| GPUSTACK_MODEL_RAM_CLAIM | User-declared RAM requirement (in bytes) for the model, used by the scheduler for capacity planning. | (empty) | Model |
| GPUSTACK_MODEL_VRAM_CLAIM | User-declared VRAM requirement (in bytes) for the model, used by the scheduler for capacity planning. | (empty) | Model |
| GPUSTACK_APPLY_QWEN3_RERANKER_TEMPLATES | Apply Qwen3 reranker templates to the request body. See instructions in https://huggingface.co/Qwen/Qwen3-Reranker-0.6B. | (empty) | Model |
| GPUSTACK_SKIP_MODEL_EVALUATION | Skips the model evaluation or validation step during deployment. | (empty) | Model |
| GPUSTACK_DISABLE_METRICS | Disables metrics exposure and collection for the model. | (empty) | Model |
| GPUSTACK_MODEL_HEALTH_CHECK_PATH | Specifies the HTTP health check path exposed by the model. | (empty) | Model |
| GPUSTACK_MODEL_RUNTIME_UID | Control the user permissions of processes running inside the container. | (empty) | Model |
| GPUSTACK_MODEL_RUNTIME_GID | Control the group permissions of processes running inside the container. | (empty) | Model |
| GPUSTACK_MODEL_RUNTIME_SHM_SIZE_GIB | Shared memory size for the container in GiB. | 10.0 | Model |
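Because the RAM and VRAM claims are given in bytes, a small conversion helper avoids miscounting zeros. The sketch below assumes you think in GiB (1024³ bytes); the helper name and the 24 GiB and 8 GiB figures are hypothetical examples.

```python
GIB = 1024 ** 3  # bytes per GiB


def gib_to_bytes(gib: float) -> int:
    """Convert a size in GiB to a whole number of bytes."""
    return int(gib * GIB)


# Values to paste into Advanced Options > Environment Variables.
print(f"GPUSTACK_MODEL_VRAM_CLAIM={gib_to_bytes(24)}")
print(f"GPUSTACK_MODEL_RAM_CLAIM={gib_to_bytes(8)}")
```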
Usage Example
When deploying a model, navigate to Advanced Options > Environment Variables and add:
# Install additional packages before model serving starts
PYPI_PACKAGES_INSTALL=torch-audio==2.0.0 transformers==4.30.0
# Disable the serving command script entirely
GPUSTACK_MODEL_SERVING_COMMAND_SCRIPT_DISABLED=1
The serving command script automatically handles:
- Installing additional PyPI packages specified in PYPI_PACKAGES_INSTALL
- Supporting both uv pip and pip for package installation
- Handling custom PyPI indices via PIP_INDEX_URL and PIP_EXTRA_INDEX_URL
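The installer fallback described above can be sketched as follows. This is not the actual serving command script, which the source does not show. The function name build_install_command is hypothetical, and index handling is left to pip's own reading of PIP_INDEX_URL and PIP_EXTRA_INDEX_URL from the environment.

```python
import shlex
import shutil


def build_install_command(packages: str, uv_available=None) -> list:
    """Build the install command for a space-separated package list."""
    if not packages.strip():
        return []  # nothing to install
    if uv_available is None:
        # Prefer uv when it is on PATH, otherwise fall back to pip.
        uv_available = shutil.which("uv") is not None
    installer = ["uv", "pip", "install"] if uv_available else ["pip", "install"]
    return installer + shlex.split(packages)


print(build_install_command("torch-audio==2.0.0 transformers==4.30.0"))
```

The command is returned rather than executed so it can be inspected first; passing it to subprocess.run would perform the installation.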
GPUStack Runtime Environment Variables
These environment variables are used by the GPUStack runtime, commonly to adjust the behavior of inference backends running in Docker or Kubernetes.
They take effect only within workers, so set them in the workers’ containers.
Global Variables
| Variable | Description | Default |
|---|---|---|
| GPUSTACK_RUNTIME_LOG_LEVEL | Log level. | INFO |
| GPUSTACK_RUNTIME_LOG_WARNING | Enable logging warnings. | 0 |
| GPUSTACK_RUNTIME_LOG_EXCEPTION | Enable logging exceptions. | 0 |
Detector Variables
| Variable | Description | Default |
|---|---|---|
| GPUSTACK_RUNTIME_DETECT | Detector to use. Options: Auto, AMD, ASCEND, CAMBRICON, HYGON, ILUVATAR, METAX, MTHREADS, NVIDIA. | Auto |
| GPUSTACK_RUNTIME_DETECT_NO_PCI_CHECK | Skip the PCI check during detection. Useful for WSL environments. | (empty) |
| GPUSTACK_RUNTIME_DETECT_NO_TOOLKIT_CALL | Use only management library calls during detection. Device detection typically calls both the platform's management libraries and its toolkit to retrieve extra information. For example, NVIDIA detection calls NVML and CUDA, with CUDA used to retrieve GPU cores. However, if certain toolchains are not correctly installed, such as a missing NVIDIA Fabric Manager, calling CUDA can block. Enabling this option prevents such blocking. | 0 |
| GPUSTACK_RUNTIME_DETECT_BACKEND_MAP_RESOURCE_KEY | Mapping from detected backends to resource keys, e.g., {"cuda": "nvidia.com/devices", "rocm": "amd.com/devices"}. Used to map the gpustack-runner's backend name to the corresponding resource key. | Vendor-specific defaults |
| GPUSTACK_RUNTIME_DETECT_PHYSICAL_INDEX_PRIORITY | Prioritize the physical index when detecting devices. | 1 |
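The example value for GPUSTACK_RUNTIME_DETECT_BACKEND_MAP_RESOURCE_KEY is a JSON object, so it can be checked before it is exported. The sketch below is a hypothetical validator, not the runtime's parser; it only confirms that the value is a JSON object mapping strings to strings.

```python
import json


def parse_backend_map(raw: str) -> dict:
    """Parse and validate a backend-to-resource-key mapping given as JSON."""
    mapping = json.loads(raw)
    if not isinstance(mapping, dict):
        raise ValueError("expected a JSON object")
    for backend, resource_key in mapping.items():
        if not isinstance(resource_key, str):
            raise ValueError(f"resource key for {backend!r} must be a string")
    return mapping


raw = '{"cuda": "nvidia.com/devices", "rocm": "amd.com/devices"}'
print(parse_backend_map(raw)["cuda"])  # nvidia.com/devices
```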
ROCm Detector Specific Variables
Note
Also applicable to ROCm-based backends, like Hygon.
| Variable | Description | Default |
|---|---|---|
| ROCM_SMI_LIB_PATH | ROCm SMI library path. | (empty) |
| ROCM_HOME | ROCm home directory. | (empty) |
| ROCM_PATH | ROCm path. | /opt/rocm |
| ROCM_CORE_LIB_PATH | ROCm core library path. | (empty) |
Deployer Variables
| Variable | Description | Default |
|---|---|---|
| GPUSTACK_RUNTIME_DEPLOY | Deployer to use. Options: Auto, Docker, Kubernetes, Podman (experimental). | Auto |
| GPUSTACK_RUNTIME_DEPLOY_API_CALL_ERROR_DETAIL | Enable detailing the API call error during deployment. | 1 |
| GPUSTACK_RUNTIME_DEPLOY_PRINT_CONVERSION | Enable printing the conversion during deployment. GPUStack Runtime provides a unified Workload definition API, which is converted to the specific Container Runtime API calls (e.g., Docker SDK, Kubernetes API, Podman SDK). Enabling this option prints the final converted API calls in the INFO log for debugging purposes. | 0 |
| GPUSTACK_RUNTIME_DEPLOY_ASYNC | Enable asynchronous deployment. | 1 |
| GPUSTACK_RUNTIME_DEPLOY_ASYNC_THREADS | The number of threads in the thread pool. | (empty) |
| GPUSTACK_RUNTIME_DEPLOY_MIRRORED_NAME | The name of the deployer. | (empty) |
| GPUSTACK_RUNTIME_DEPLOY_MIRRORED_DEPLOYMENT_IGNORE_ENVIRONMENTS | Environment variable names to ignore during mirrored deployment. | (empty) |
| GPUSTACK_RUNTIME_DEPLOY_MIRRORED_DEPLOYMENT_IGNORE_VOLUMES | Volume mount destinations to ignore during mirrored deployment. | (empty) |
| GPUSTACK_RUNTIME_DEPLOY_DEFAULT_CONTAINER_REGISTRY_USERNAME | Username for the default container registry. | (empty) |
| GPUSTACK_RUNTIME_DEPLOY_DEFAULT_CONTAINER_REGISTRY_PASSWORD | Password for the default container registry. | (empty) |
| GPUSTACK_RUNTIME_DEPLOY_DEFAULT_CONTAINER_NAMESPACE | Namespace for default runner container images. | gpustack |
| GPUSTACK_RUNTIME_DEPLOY_IMAGE_PULL_POLICY | Image pull policy for the deployer (e.g., Always, IfNotPresent, Never). | IfNotPresent |
| GPUSTACK_RUNTIME_DEPLOY_CDI_SPECS_DIRECTORY | Path of the directory that contains Container Device Interface (CDI) specifications, or into which CDI specifications are generated, during deployment. | /var/run/cdi |
| GPUSTACK_RUNTIME_DEPLOY_RESOURCE_KEY_MAP_CDI | Manual mapping of container device interfaces, which tells the Container Runtime which devices to inject into the container, e.g., nvidia.com/devices=nvidia.com/gpu;amd.com/devices=amd.com/gpu. The key is the resource key, and the value is the Container Device Interface (CDI) key. | Vendor-specific defaults |
| GPUSTACK_RUNTIME_DEPLOY_RESOURCE_KEY_MAP_RUNTIME_VISIBLE_DEVICES | Manual mapping of runtime visible devices environment variables, which tells the Container Runtime which devices to inject into the container, e.g., nvidia.com/devices=NVIDIA_VISIBLE_DEVICES;amd.com/devices=AMD_VISIBLE_DEVICES. The key is the resource key, and the value is the environment variable name. | Vendor-specific defaults |
| GPUSTACK_RUNTIME_DEPLOY_RESOURCE_KEY_MAP_BACKEND_VISIBLE_DEVICES | Manual mapping of backend visible devices environment variables, which tells the Device Runtime (e.g., ROCm, CUDA, OneAPI) which devices to use inside the container, e.g., nvidia.com/devices=CUDA_VISIBLE_DEVICES;amd.com/devices=ROCR_VISIBLE_DEVICES. The key is the resource key, and the value is a list of environment variable names. | Vendor-specific defaults |
| GPUSTACK_RUNTIME_DEPLOY_RUNTIME_VISIBLE_DEVICES_VALUE_UUID | Use UUIDs for the given runtime visible devices environment variables. | (empty) |
| GPUSTACK_RUNTIME_DEPLOY_BACKEND_VISIBLE_DEVICES_VALUE_ALIGNMENT | Enable value alignment for the given backend visible devices environment variables. | ASCEND_RT_VISIBLE_DEVICES,NPU_VISIBLE_DEVICES |
| GPUSTACK_RUNTIME_DEPLOY_CPU_AFFINITY | Enable CPU affinity for deployed workloads. | 0 |
| GPUSTACK_RUNTIME_DEPLOY_NUMA_AFFINITY | Enable NUMA affinity for deployed workloads. When enabled, GPUSTACK_RUNTIME_DEPLOY_CPU_AFFINITY is also implied. | 0 |
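The resource key map examples above use a key=value;key=value layout. The sketch below parses that layout so a value can be checked before it is set. It is a hypothetical helper, not the runtime's parser, and it covers only the single-value form shown for the CDI and runtime visible devices maps; how multiple names are separated in the backend visible devices map is not stated in the source, so that case is left out.

```python
def parse_resource_key_map(raw: str) -> dict:
    """Parse 'key=value;key=value' into a dict, rejecting malformed pairs."""
    result = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing or doubled semicolon
        key, sep, value = pair.partition("=")
        if not sep or not key.strip() or not value.strip():
            raise ValueError(f"malformed pair: {pair!r}")
        result[key.strip()] = value.strip()
    return result


raw = "nvidia.com/devices=nvidia.com/gpu;amd.com/devices=amd.com/gpu"
print(parse_resource_key_map(raw))
```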
Docker Deployer Specific Variables
| Variable | Description | Default |
|---|---|---|
| GPUSTACK_RUNTIME_DOCKER_HOST | Host for Docker connection. Used to override the default Docker host. | http+unix:///var/run/docker.sock |
| GPUSTACK_RUNTIME_DOCKER_PAUSE_IMAGE | Container image used for the pause container in Docker. | gpustack/runtime:pause |
| GPUSTACK_RUNTIME_DOCKER_UNHEALTHY_RESTART_IMAGE | Container image used for the unhealthy restart container in Docker. | gpustack/runtime:health |
| GPUSTACK_RUNTIME_DOCKER_RESOURCE_INJECTION_POLICY | Resource injection policy for the Docker deployer (e.g., Env, CDI). Env: Injects resources using standard environment variables, based on GPUSTACK_RUNTIME_DEPLOY_RESOURCE_KEY_MAP_RUNTIME_VISIBLE_DEVICES. CDI: Injects resources using CDI, based on GPUSTACK_RUNTIME_DEPLOY_RESOURCE_KEY_MAP_CDI. | Env |
| GPUSTACK_RUNTIME_DOCKER_CDI_SPECS_GENERATE | Generate CDI specifications during deployment when using the CDI resource injection policy. Requires GPUSTACK_RUNTIME_DEPLOY_CDI_SPECS_DIRECTORY to exist. Works only when GPUSTACK_RUNTIME_DOCKER_RESOURCE_INJECTION_POLICY is set to CDI. The specifications are generated from the deployer's internal knowledge; if the output file conflicts with other tools that generate CDI specifications (e.g., NVIDIA Container Toolkit), disable this option and remove the output file manually. | 1 |
Kubernetes Deployer Specific Variables
| Variable | Description | Default |
|---|---|---|
| GPUSTACK_RUNTIME_KUBERNETES_NODE_NAME | Name of the Kubernetes Node to deploy workloads to. | (empty) |
| GPUSTACK_RUNTIME_KUBERNETES_NAMESPACE | Kubernetes namespace to deploy workloads to. | default |
| GPUSTACK_RUNTIME_KUBERNETES_DOMAIN_SUFFIX | Domain suffix for Kubernetes services. | cluster.local |
| GPUSTACK_RUNTIME_KUBERNETES_SERVICE_TYPE | Service type for Kubernetes services. Options: ClusterIP, NodePort, LoadBalancer. | ClusterIP |
| GPUSTACK_RUNTIME_KUBERNETES_QUORUM_READ | Whether to use quorum read for Kubernetes services. | 0 |
| GPUSTACK_RUNTIME_KUBERNETES_DELETE_PROPAGATION_POLICY | Deletion propagation policy for Kubernetes resources. Options: Foreground, Background, Orphan. | Foreground |
| GPUSTACK_RUNTIME_KUBERNETES_RESOURCE_INJECTION_POLICY | Resource injection policy for the Kubernetes deployer. Options: Auto, Env, KDP. Auto: Automatically chooses the resource injection policy based on the environment. Env: Injects resources using standard environment variables, based on GPUSTACK_RUNTIME_DEPLOY_RESOURCE_KEY_MAP_RUNTIME_VISIBLE_DEVICES; depends on the underlying Container Toolkit. KDP: Injects resources using the Kubernetes Device Plugin. | Auto |
| GPUSTACK_RUNTIME_KUBERNETES_KDP_PER_DEVICE_MAX_ALLOCATIONS | Maximum allocations for one device in the Kubernetes Device Plugin. | 10 |
| GPUSTACK_RUNTIME_KUBERNETES_KDP_DEVICE_ALLOCATION_POLICY | Device allocation policy for the Kubernetes Device Plugin. Options: Auto, CDI, Env, Opaque. Auto: Automatically chooses the device allocation policy based on the environment. Env: Allocates devices using runtime-visible environment variables, based on GPUSTACK_RUNTIME_DEPLOY_RESOURCE_KEY_MAP_RUNTIME_VISIBLE_DEVICES; requires Container Toolkit support. CDI: Allocates devices using generated CDI specifications, based on GPUSTACK_RUNTIME_DEPLOY_RESOURCE_KEY_MAP_CDI, making it easy to debug and troubleshoot; requires GPUSTACK_RUNTIME_DEPLOY_CDI_SPECS_DIRECTORY to exist. Opaque: Uses internal logic for allocation, which is convenient for deployment but difficult to troubleshoot. | Auto |
| GPUSTACK_RUNTIME_KUBERNETES_KDP_CDI_SPECS_GENERATE | Generate CDI specifications during deployment. Requires GPUSTACK_RUNTIME_DEPLOY_CDI_SPECS_DIRECTORY to exist. Works only when GPUSTACK_RUNTIME_KUBERNETES_KDP_DEVICE_ALLOCATION_POLICY is set to CDI. The specifications are generated from the deployer's internal knowledge; if the output file conflicts with other tools that generate CDI specifications (e.g., NVIDIA Container Toolkit), disable this option and remove the output file manually. | 1 |
Podman Deployer Specific Variables
Note
The Podman deployer is experimental and requires Podman version 4.9 or higher.
| Variable | Description | Default |
|---|---|---|
| GPUSTACK_RUNTIME_PODMAN_HOST | Host for Podman connection. Used to override the default Podman host. | http+unix:///run/podman/podman.sock |
| GPUSTACK_RUNTIME_PODMAN_PAUSE_IMAGE | Container image used for the pause container in Podman. Default is the same as GPUSTACK_RUNTIME_DOCKER_PAUSE_IMAGE. | gpustack/runtime:pause |
| GPUSTACK_RUNTIME_PODMAN_UNHEALTHY_RESTART_IMAGE | Container image used for the unhealthy restart container in Podman. Default is the same as GPUSTACK_RUNTIME_DOCKER_UNHEALTHY_RESTART_IMAGE. | gpustack/runtime:health |
| GPUSTACK_RUNTIME_PODMAN_CDI_SPECS_GENERATE | Generate CDI specifications during deployment. Requires GPUSTACK_RUNTIME_DEPLOY_CDI_SPECS_DIRECTORY to exist. The specifications are generated from the deployer's internal knowledge; if the output file conflicts with other tools that generate CDI specifications (e.g., NVIDIA Container Toolkit), disable this option and remove the output file manually. Default is the same as GPUSTACK_RUNTIME_DOCKER_CDI_SPECS_GENERATE. | 1 |