Environment Variables
GPUStack supports various environment variables for configuration.
Most command-line parameters can also be set through environment variables: add the `GPUSTACK_` prefix and write the parameter name in uppercase with underscores (e.g., `--data-dir` can be set via `GPUSTACK_DATA_DIR`).
For the complete list of command-line parameters that can be set as environment variables, see the CLI Reference; they are not repeated here.
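The naming rule can be expressed as a one-line conversion. This is a minimal sketch: `cli_flag_to_env_var` is a hypothetical helper, and it assumes every hyphen in a flag becomes an underscore, as the `--data-dir` example implies. Check the CLI Reference for any parameter that does not follow the pattern.

```python
def cli_flag_to_env_var(flag: str, prefix: str = "GPUSTACK_") -> str:
    """Derive the environment variable name for a CLI flag (hypothetical helper)."""
    # Drop leading dashes, turn remaining hyphens into underscores, uppercase.
    return prefix + flag.lstrip("-").replace("-", "_").upper()


print(cli_flag_to_env_var("--data-dir"))  # GPUSTACK_DATA_DIR
```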
Priority Order
Configuration values are applied in the following priority order (highest to lowest):
- Command line arguments
- Environment variables
- Configuration file
- Default values
This means that command line arguments will always override environment variables, and environment variables will override values in the configuration file.
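The priority order can be illustrated with a small lookup function. This is a sketch, not GPUStack's configuration loader: `resolve_setting` and the example paths are hypothetical, and it assumes an unset command-line flag is represented as `None` and that environment values are returned as raw strings without type conversion.

```python
from typing import Any, Mapping


def resolve_setting(
    key: str,
    cli_args: Mapping[str, Any],
    env: Mapping[str, str],
    config_file: Mapping[str, Any],
    defaults: Mapping[str, Any],
) -> Any:
    """Return the value of `key` from the highest-priority source that sets it."""
    env_name = "GPUSTACK_" + key.upper()  # e.g. data_dir -> GPUSTACK_DATA_DIR
    if cli_args.get(key) is not None:     # 1. command-line argument (None = not passed)
        return cli_args[key]
    if env_name in env:                   # 2. environment variable
        return env[env_name]
    if key in config_file:                # 3. configuration file
        return config_file[key]
    return defaults.get(key)              # 4. default value


value = resolve_setting(
    "data_dir",
    cli_args={"data_dir": None},  # flag not passed on the command line
    env={"GPUSTACK_DATA_DIR": "/srv/gpustack"},
    config_file={"data_dir": "/etc/gpustack/data"},
    defaults={"data_dir": "/var/lib/gpustack"},
)
print(value)  # /srv/gpustack
```

Because the environment variable is set, it wins over both the configuration file and the default; passing `--data-dir` would in turn override it.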
GPUStack Core Environment Variables
These environment variables are typically used for third-party service integrations.
Hugging Face Hub
| Variable | Description | Default |
|---|---|---|
| `HF_ENDPOINT` | Hugging Face Hub endpoint, e.g., `https://hf-mirror.com`. | (empty) |
Database Configuration
| Variable | Description | Default |
|---|---|---|
| `GPUSTACK_DB_ECHO` | Enable database query logging. | false |
| `GPUSTACK_DB_POOL_SIZE` | Database connection pool size. | 5 |
| `GPUSTACK_DB_MAX_OVERFLOW` | Database connection pool maximum overflow. | 10 |
| `GPUSTACK_DB_POOL_TIMEOUT` | Database connection pool timeout, in seconds. | 30 |
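These names mirror the pool parameters of SQLAlchemy's `create_engine` (`echo`, `pool_size`, `max_overflow`, `pool_timeout`). Assuming they map one-to-one, a queue pool holds at most `pool_size + max_overflow` connections at once, i.e. 15 with the defaults. The sketch below reads the variables with the documented defaults; `db_pool_settings` is hypothetical, and the set of strings accepted as "true" for `GPUSTACK_DB_ECHO` is an assumption.

```python
import os
from typing import Mapping


def db_pool_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the GPUSTACK_DB_* variables, falling back to the documented defaults."""
    # Assumption: these strings count as "true" for the echo flag.
    echo = env.get("GPUSTACK_DB_ECHO", "false").strip().lower() in ("1", "true", "yes")
    return {
        "echo": echo,
        "pool_size": int(env.get("GPUSTACK_DB_POOL_SIZE", "5")),
        "max_overflow": int(env.get("GPUSTACK_DB_MAX_OVERFLOW", "10")),
        "pool_timeout": int(env.get("GPUSTACK_DB_POOL_TIMEOUT", "30")),
    }


def max_db_connections(settings: dict) -> int:
    # A SQLAlchemy QueuePool opens at most pool_size + max_overflow connections.
    return settings["pool_size"] + settings["max_overflow"]


print(max_db_connections(db_pool_settings({})))  # 15
```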
Network Configuration
| Variable | Description | Default |
|---|---|---|
| `GPUSTACK_PROXY_TIMEOUT_SECONDS` | Proxy timeout, in seconds. | 1800 |
| `GPUSTACK_TCP_CONNECTOR_LIMIT` | HTTP client TCP connector limit. | 1000 |
Authentication & Security
| Variable | Description | Default |
|---|---|---|
| `GPUSTACK_JWT_TOKEN_EXPIRE_MINUTES` | JWT token expiration time, in minutes. | 120 |
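As a worked example of the unit: the JWT `exp` claim is a NumericDate (whole seconds since the Unix epoch, per RFC 7519), so the default of 120 minutes places it 7200 seconds after issuance. `jwt_exp_claim` is a hypothetical helper for illustration, not how GPUStack issues tokens.

```python
import os
from datetime import datetime, timedelta, timezone
from typing import Mapping


def jwt_exp_claim(issued_at: datetime, env: Mapping[str, str] = os.environ) -> int:
    """Compute a JWT `exp` value from GPUSTACK_JWT_TOKEN_EXPIRE_MINUTES (hypothetical helper)."""
    minutes = int(env.get("GPUSTACK_JWT_TOKEN_EXPIRE_MINUTES", "120"))
    # `exp` is a NumericDate: whole seconds since the Unix epoch.
    return int((issued_at + timedelta(minutes=minutes)).timestamp())


iat = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(jwt_exp_claim(iat, {}) - int(iat.timestamp()))  # 7200 (120 minutes)
```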
Gateway Configuration
| Variable | Description | Default |
|---|---|---|
| `GPUSTACK_HIGRESS_EXT_AUTH_TIMEOUT_MS` | Higress external authentication timeout, in milliseconds. | 3000 |
Worker and Model Configuration
| Variable | Description | Default |
|---|---|---|
| `GPUSTACK_WORKER_HEARTBEAT_GRACE_PERIOD` | Worker heartbeat grace period, in seconds. | 150 |
| `GPUSTACK_WORKER_ORPHAN_WORKLOAD_CLEANUP_GRACE_PERIOD` | Grace period for cleaning up orphaned workloads on a worker, in seconds. | 300 |
| `GPUSTACK_MODEL_INSTANCE_RESCHEDULE_GRACE_PERIOD` | Model instance rescheduling grace period, in seconds. | 300 |
| `GPUSTACK_MODEL_INSTANCE_HEALTH_CHECK_INTERVAL` | Model instance health check interval, in seconds. | 3 |
| `GPUSTACK_MODEL_EVALUATION_CACHE_MAX_SIZE` | Maximum size of the model evaluation cache. | 1000 |
| `GPUSTACK_MODEL_EVALUATION_CACHE_TTL` | Time-to-live (TTL) of model evaluation cache entries, in seconds. | 3600 |
| `GPUSTACK_DISABLE_OS_FILELOCK` | Disable OS file locking. | false |
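The two model evaluation cache settings bound the cache in different ways: the TTL drops entries once they are older than the limit, and the maximum size caps how many entries are kept at once. The sketch below shows how the two limits interact. It is not GPUStack's cache: `TTLCache` here is a hypothetical class, and least-recently-used eviction is an assumed policy (libraries such as `cachetools.TTLCache` behave similarly).

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class TTLCache:
    """Size-bounded cache whose entries expire `ttl` seconds after being stored."""

    def __init__(self, max_size: int = 1000, ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_size = max_size
        self.ttl = ttl
        self._clock = clock  # injectable so expiry can be shown without sleeping
        self._data = OrderedDict()  # key -> (expires_at, value), least recently used first

    def get(self, key: Hashable) -> Optional[Any]:
        """Return the cached value, or None if the key is missing or expired."""
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._data[key]  # expired: drop it and report a miss
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = (self._clock() + self.ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently used entry


now = [0.0]  # fake clock for the example
cache = TTLCache(max_size=2, ttl=3600, clock=lambda: now[0])
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)      # exceeds max_size, so "a" is evicted
print(cache.get("a"))  # None
now[0] = 3600.0        # advance the clock by one TTL
print(cache.get("b"))  # None (expired)
```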
Model Deployment Configuration
Note
These environment variables are not set when starting GPUStack. Instead, they should be configured in the Advanced Options > Environment Variables section when deploying a model. They are used to customize the model serving behavior.
| Variable | Description | Default |
|---|---|---|
| `GPUSTACK_MODEL_SERVING_COMMAND_SCRIPT_DISABLED` | Disable automatic execution of the serving command script. When set to `1` or `true`, the script that handles package installation and other setup tasks does not run. | 0 |
| `PYPI_PACKAGES_INSTALL` | Additional PyPI packages to install in the model serving environment, separated by spaces. The script uses `uv pip install` if available, otherwise `pip install`. | (empty) |
Usage Example
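When deploying a model, navigate to Advanced Options > Environment Variables and add entries such as the following. The two settings are alternatives: disabling the serving command script also skips the package installation it performs.
```
# Install additional packages before model serving starts
PYPI_PACKAGES_INSTALL=torchaudio==2.0.0 transformers==4.30.0

# Or: disable the serving command script entirely
GPUSTACK_MODEL_SERVING_COMMAND_SCRIPT_DISABLED=1
```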
The serving command script automatically handles:
- Installing additional PyPI packages specified in `PYPI_PACKAGES_INSTALL`
- Supporting both `uv pip` and `pip` for package installation
- Handling custom PyPI indices via `PIP_INDEX_URL` and `PIP_EXTRA_INDEX_URL`
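The behavior described above can be modeled in a few lines, which helps predict what will run for a given set of variables. This is a rough sketch, not the actual serving command script: `build_install_command` is hypothetical, and forwarding `PIP_INDEX_URL` and `PIP_EXTRA_INDEX_URL` as `--index-url` and `--extra-index-url` flags is an assumption about how the indices are handled (both `pip install` and `uv pip install` accept these flags).

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional


def build_install_command(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[List[str]]:
    """Return the install command the script would run, or None if it would skip it."""
    disabled = env.get("GPUSTACK_MODEL_SERVING_COMMAND_SCRIPT_DISABLED", "0")
    if disabled.strip().lower() in ("1", "true"):
        return None  # the whole serving command script is disabled
    packages = env.get("PYPI_PACKAGES_INSTALL", "").split()  # space-separated list
    if not packages:
        return None  # nothing to install
    # Prefer `uv pip install` when uv is on PATH, otherwise fall back to `pip install`.
    cmd = ["uv", "pip", "install"] if which("uv") else ["pip", "install"]
    # Assumption: custom indices are forwarded as explicit flags.
    if env.get("PIP_INDEX_URL"):
        cmd += ["--index-url", env["PIP_INDEX_URL"]]
    for url in env.get("PIP_EXTRA_INDEX_URL", "").split():  # pip allows several, space-separated
        cmd += ["--extra-index-url", url]
    return cmd + packages


cmd = build_install_command(
    {"PYPI_PACKAGES_INSTALL": "transformers==4.30.0"},
    which=lambda name: None,  # pretend uv is not installed
)
print(cmd)  # ['pip', 'install', 'transformers==4.30.0']
```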
GPUStack Runtime Environment Variables
These environment variables are used by the GPUStack runtime, commonly to adjust the behavior of inference backends running in Docker or Kubernetes.
They take effect only on workers, so set them in the worker containers.
Global Variables
| Variable | Description | Default |
|---|---|---|
| `GPUSTACK_RUNTIME_LOG_LEVEL` | Log level. | INFO |
| `GPUSTACK_RUNTIME_LOG_TO_FILE` | File path to write logs to instead of stdout. | (empty) |
| `GPUSTACK_RUNTIME_LOG_WARNING` | Enable logging of warnings. | 0 |
| `GPUSTACK_RUNTIME_LOG_EXCEPTION` | Enable logging of exceptions. | 1 |
Detector Variables
| Variable | Description | Default |
|---|---|---|
| `GPUSTACK_RUNTIME_DETECT` | Detector to use. Options: `Auto`, `AMD`, `ASCEND`, `CAMBRICON`, `HYGON`, `ILUVATAR`, `METAX`, `MTHREADS`, `NVIDIA`. | Auto |
| `GPUSTACK_RUNTIME_DETECT_NO_PCI_CHECK` | Skip the PCI check during detection. Useful in WSL environments. | (empty) |
| `GPUSTACK_RUNTIME_DETECT_BACKEND_MAP_RESOURCE_KEY` | Mapping from backends to resource keys. | Vendor-specific defaults |
| `GPUSTACK_RUNTIME_DETECT_PHYSICAL_INDEX_PRIORITY` | Prioritize physical device indexes when detecting devices. | 1 |
Deployer Variables
| Variable | Description | Default |
|---|---|---|
| `GPUSTACK_RUNTIME_DEPLOY` | Deployer to use. Options: `Auto`, `Docker`, `Kubernetes`. | Auto |
| `GPUSTACK_RUNTIME_DEPLOY_DEFAULT_REGISTRY` | Default container registry from which the deployer pulls images. | (empty) |
| `GPUSTACK_RUNTIME_DEPLOY_API_CALL_ERROR_DETAIL` | Include details of API call errors during deployment. | 1 |
| `GPUSTACK_RUNTIME_DEPLOY_ASYNC` | Enable asynchronous deployment. | 1 |
| `GPUSTACK_RUNTIME_DEPLOY_ASYNC_THREADS` | Number of threads in the thread pool. | (empty) |
| `GPUSTACK_RUNTIME_DEPLOY_MIRRORED_DEPLOYMENT` | Enable mirrored deployment mode. | 0 |
| `GPUSTACK_RUNTIME_DEPLOY_MIRRORED_NAME` | Name of the deployer. | (empty) |
| `GPUSTACK_RUNTIME_DEPLOY_MIRRORED_DEPLOYMENT_IGNORE_ENVIRONMENTS` | Environment variable names to ignore during mirrored deployment. | (empty) |
| `GPUSTACK_RUNTIME_DEPLOY_MIRRORED_DEPLOYMENT_IGNORE_VOLUMES` | Volume mount destinations to ignore during mirrored deployment. | (empty) |
| `GPUSTACK_RUNTIME_DEPLOY_CORRECT_RUNNER_IMAGE` | Correct the gpustack-runner image by rendering it according to the host's device detection. | 1 |
| `GPUSTACK_RUNTIME_DEPLOY_LABEL_PREFIX` | Label prefix used by the deployer. | runtime.gpustack.ai |
| `GPUSTACK_RUNTIME_DEPLOY_AUTOMAP_RESOURCE_KEY` | Resource key used to automatically map the container backend's visible-devices environment variables. | gpustack.ai/devices |
| `GPUSTACK_RUNTIME_DEPLOY_RESOURCE_KEY_MAP_RUNTIME_VISIBLE_DEVICES` | Manual mapping of runtime visible-devices environment variables. | Vendor-specific defaults |
| `GPUSTACK_RUNTIME_DEPLOY_RESOURCE_KEY_MAP_BACKEND_VISIBLE_DEVICES` | Manual mapping of backend visible-devices environment variables. | Vendor-specific defaults |
| `GPUSTACK_RUNTIME_DEPLOY_RUNTIME_VISIBLE_DEVICES_VALUE_UUID` | Use UUIDs for the given runtime visible-devices environment variables. | (empty) |
| `GPUSTACK_RUNTIME_DEPLOY_BACKEND_VISIBLE_DEVICES_VALUE_ALIGNMENT` | Enable value alignment for the given backend visible-devices environment variables. | `ASCEND_RT_VISIBLE_DEVICES,NPU_VISIBLE_DEVICES` |
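Many runtime variables are `1`/`0` flags, and list-valued settings such as the value-alignment default are comma-separated. The sketch below reads a few deployer variables with their documented defaults to show how such values might be interpreted. It is illustrative only: `deployer_settings` and its helpers are hypothetical, and it assumes the deployer option is matched exactly as documented and that `true` is accepted alongside `1`.

```python
import os
from typing import List, Mapping

DEPLOYERS = ("Auto", "Docker", "Kubernetes")


def env_flag(env: Mapping[str, str], name: str, default: str) -> bool:
    """Interpret a 1/0 style flag; accepting "true" as well is an assumption."""
    return env.get(name, default).strip().lower() in ("1", "true")


def env_list(env: Mapping[str, str], name: str, default: str) -> List[str]:
    """Split a comma-separated value, dropping blank items."""
    return [item.strip() for item in env.get(name, default).split(",") if item.strip()]


def deployer_settings(env: Mapping[str, str] = os.environ) -> dict:
    deployer = env.get("GPUSTACK_RUNTIME_DEPLOY", "Auto")
    if deployer not in DEPLOYERS:
        raise ValueError(f"unsupported deployer {deployer!r}; expected one of {DEPLOYERS}")
    return {
        "deployer": deployer,
        "async": env_flag(env, "GPUSTACK_RUNTIME_DEPLOY_ASYNC", "1"),
        "mirrored": env_flag(env, "GPUSTACK_RUNTIME_DEPLOY_MIRRORED_DEPLOYMENT", "0"),
        "alignment_vars": env_list(
            env,
            "GPUSTACK_RUNTIME_DEPLOY_BACKEND_VISIBLE_DEVICES_VALUE_ALIGNMENT",
            "ASCEND_RT_VISIBLE_DEVICES,NPU_VISIBLE_DEVICES",
        ),
    }


print(deployer_settings({"GPUSTACK_RUNTIME_DEPLOY": "Docker"}))
```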
Docker Variables
| Variable | Description | Default |
|---|---|---|
| `GPUSTACK_RUNTIME_DOCKER_MIRRORED_NAME_FILTER_LABELS` | Filter labels for selecting the mirrored deployer container in Docker. | (empty) |
| `GPUSTACK_RUNTIME_DOCKER_PAUSE_IMAGE` | Docker image used for the pause container. | gpustack/runtime:pause |
| `GPUSTACK_RUNTIME_DOCKER_UNHEALTHY_RESTART_IMAGE` | Docker image used for the unhealthy-restart container. | gpustack/runtime:health |
| `GPUSTACK_RUNTIME_DOCKER_EPHEMERAL_FILES_DIR` | Directory for storing Docker ephemeral files. | ~/.cache/gpustack-runtime |
| `GPUSTACK_RUNTIME_DOCKER_MUTE_ORIGINAL_HEALTHCHECK` | Mute the container's original health check. | 1 |
Kubernetes Variables
| Variable | Description | Default |
|---|---|---|
| `GPUSTACK_RUNTIME_KUBERNETES_NODE_NAME` | Name of the Kubernetes node to deploy workloads to. | (empty) |
| `GPUSTACK_RUNTIME_KUBERNETES_NAMESPACE` | Kubernetes namespace to deploy workloads to. | default |
| `GPUSTACK_RUNTIME_KUBERNETES_DOMAIN_SUFFIX` | Domain suffix for Kubernetes services. | cluster.local |
| `GPUSTACK_RUNTIME_KUBERNETES_SERVICE_TYPE` | Service type for Kubernetes services. Options: `ClusterIP`, `NodePort`, `LoadBalancer`. | ClusterIP |
| `GPUSTACK_RUNTIME_KUBERNETES_QUORUM_READ` | Whether to use quorum reads for Kubernetes services. | 0 |
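The namespace and domain suffix combine in the standard Kubernetes Service DNS name, `<service>.<namespace>.svc.<cluster-domain>`. The sketch below builds that name from the two variables and their defaults. The service name `my-model` is hypothetical, and how the runtime actually names the Services it creates is not covered here.

```python
import os
from typing import Mapping


def service_fqdn(service: str, env: Mapping[str, str] = os.environ) -> str:
    """Build the in-cluster DNS name of a Service from the runtime settings (hypothetical helper)."""
    namespace = env.get("GPUSTACK_RUNTIME_KUBERNETES_NAMESPACE", "default")
    suffix = env.get("GPUSTACK_RUNTIME_KUBERNETES_DOMAIN_SUFFIX", "cluster.local")
    # Standard Kubernetes Service DNS: <service>.<namespace>.svc.<cluster-domain>
    return f"{service}.{namespace}.svc.{suffix}"


print(service_fqdn("my-model", {}))  # my-model.default.svc.cluster.local
```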
ROCm Detector Variables
| Variable | Description | Default |
|---|---|---|
| `ROCM_SMI_LIB_PATH` | ROCm SMI library path. | (empty) |
| `ROCM_HOME` | ROCm home directory. | (empty) |
| `ROCM_PATH` | ROCm path. | /opt/rocm |
| `ROCM_CORE_LIB_PATH` | ROCm core library path. | (empty) |