# Pinned Backend Versions
Inference engines in the generative AI domain are evolving rapidly to enhance performance and unlock new capabilities. This constant evolution provides exciting opportunities but also presents challenges for maintaining model compatibility and deployment stability.
GPUStack allows you to pin inference backend versions to specific releases, offering a balance between staying up-to-date with the latest advancements and ensuring a reliable runtime environment. This feature is particularly beneficial in the following scenarios:
- Leveraging the newest backend features without waiting for a GPUStack update.
- Locking in a specific backend version to maintain compatibility with existing models.
- Assigning different backend versions to models with varying requirements.
By pinning backend versions, you gain full control over your inference environment, enabling both flexibility and predictability in deployment.
## Automatic Installation of Pinned Backend Versions
To simplify deployment, GPUStack supports the automatic installation of pinned backend versions when feasible. The process depends on the type of backend:
- **Prebuilt Binaries**

  For backends like `llama-box`, GPUStack downloads the specified version using the same mechanism as in GPUStack bootstrapping.

  **Tip**

  You can customize the download source using the `--tools-download-base-url` configuration option.
- **Python-based Backends**

  For backends like `vLLM` and `vox-box`, GPUStack uses `pipx` to install the specified version in an isolated Python environment.

  **Tip**

  - Ensure that `pipx` is installed on the worker nodes.
  - If `pipx` is not in the system PATH, specify its location with the `--pipx-path` configuration option.
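As a sketch, the two tips above might translate into worker startup flags like the following. The mirror URL and the pipx path are placeholders for illustration, not real endpoints; substitute values from your own environment.

```shell
# Hypothetical example: start GPUStack with a custom download source for
# prebuilt binaries and an explicit pipx location (both values are placeholders).
gpustack start \
  --tools-download-base-url https://mirror.example.com/gpustack-tools \
  --pipx-path /usr/local/bin/pipx
```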
This automation reduces manual intervention, allowing you to focus on deploying and using your models.
## Manual Installation of Pinned Backend Versions
When automatic installation is not feasible or preferred, GPUStack provides a straightforward way to manually install specific versions of inference backends. Follow these steps:
- **Prepare the Executable**

  Install the backend executable, or a link to it, under the GPUStack bin directory. The default locations are:

  - Linux/macOS: `/var/lib/gpustack/bin`
  - Windows: `$env:AppData\gpustack\bin`

  **Tip**

  You can customize the bin directory using the `--bin-dir` configuration option.
- **Name the Executable**

  Ensure the executable is named in the following format:

  - Linux/macOS: `<backend>_<version>`
  - Windows: `<backend>_<version>.exe`

  For example, the vLLM executable for version v0.6.4 should be named `vllm_v0.6.4` on Linux.
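The steps above can be sketched as follows on Linux. A scratch directory and a placeholder script stand in for the real bin directory (`/var/lib/gpustack/bin`) and a real vLLM v0.6.4 binary, so the example only demonstrates the placement and naming convention.

```shell
# Scratch directory as a stand-in for /var/lib/gpustack/bin (see --bin-dir).
BIN_DIR=$(mktemp -d)/bin
mkdir -p "$BIN_DIR"

# Placeholder script standing in for a manually installed vLLM v0.6.4 binary.
# Note the required <backend>_<version> name: vllm_v0.6.4
printf '#!/bin/sh\nexec echo "vllm placeholder"\n' > "$BIN_DIR/vllm_v0.6.4"
chmod +x "$BIN_DIR/vllm_v0.6.4"

ls "$BIN_DIR"   # vllm_v0.6.4
```

With a real installation you would instead symlink the installed binary into place, e.g. `ln -s <path-to-vllm> "$BIN_DIR/vllm_v0.6.4"`.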
By following these steps, you can maintain full control over the backend installation process, ensuring that the correct version is used for your deployment.