Managing Model Routes
This guide introduces how to use model routes, covering several common use cases and their configuration methods.
Create a Route When Deploying a Model
When deploying a model, the `Enable Model Route` checkbox is enabled by default. It creates a model route for the deployment with the same name, so users can access the model by that name immediately after deployment.
Model Upgrade with Model Alias
When a new version of a model is released, the administrator may want to upgrade the model while keeping the same model name. In this case, the administrator can deploy a new version of the model with the model route disabled, and switch traffic by editing the existing model route.
- Add enough GPU resources for the new version of the model in GPUStack.
- Deploy the new version of the model, unchecking the `Enable Model Route` option in the model deploy drawer.
- Locate the model route that targets the old version of the model on the `Routes` page.
- Edit this route and replace the target with the new version's deployment.
- New requests to this model route will be routed to the new version of the model.
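Conceptually, the steps above swap the route's target while its public name stays fixed. The toy sketch below is not GPUStack code; the route and deployment names are made up for illustration.

```python
# A route maps one stable public model name to a replaceable backend target.
routes = {"llama": "llama-v1-deployment"}

def resolve(model_name: str) -> str:
    """Look up the deployment currently serving a model route."""
    return routes[model_name]

assert resolve("llama") == "llama-v1-deployment"

# Editing the route points it at the new deployment...
routes["llama"] = "llama-v2-deployment"

# ...and new requests to the unchanged name now reach the new version.
assert resolve("llama") == "llama-v2-deployment"
```

Clients never see the deployment names, so nothing changes on their side during the upgrade.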
Serve a Model with Both Self-Hosted and Public MaaS Targets
When the request volume for a self-hosted model increases, latency may rise. If no resources are available for scaling up, introducing a Public MaaS is an effective solution. By configuring both the deployment target and the provider's model target in the model route and assigning weights, you can offload part of the model's traffic to the Public MaaS service.
- Go to the `Providers` page.
- Add a provider as needed and select the models to use from it.
- Go to the `Routes` page and locate the model route to edit.
- Edit the route and add route targets to it.
- Both models from GPUStack `Deployments` and models from `Providers` can be selected as targets in the same route.
- The weight (default 100) of each target determines its share of the route's traffic. For example, if target A has a weight of 100 and target B a weight of 200, about 33% of requests will be routed to A and 67% to B.
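The weighted split works like the sketch below. `pick_target` is a hypothetical helper, not part of GPUStack; it only mirrors how weights translate into traffic shares.

```python
import random

def pick_target(targets, rng=random):
    """Pick a route target with probability proportional to its weight.

    `targets` maps target name -> weight, e.g. {"A": 100, "B": 200}.
    """
    names = list(targets)
    weights = [targets[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

# With weights 100 and 200, target A receives 100 / (100 + 200) = 1/3
# of the traffic and target B the remaining 2/3.
counts = {"A": 0, "B": 0}
rng = random.Random(0)  # fixed seed so the simulation is reproducible
for _ in range(30_000):
    counts[pick_target({"A": 100, "B": 200}, rng)] += 1

share_a = counts["A"] / 30_000
print(f"A: {share_a:.1%}, B: {1 - share_a:.1%}")
```

Weights are relative, not percentages: `{"A": 1, "B": 2}` would produce the same split as `{"A": 100, "B": 200}`.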
Model Route Fallback
Although assigning a Public MaaS model target to a model route is convenient, it can incur significant costs. The traffic distribution rules are always in effect, so even when the self-hosted model is not under heavy load, traffic is still forwarded to the Public MaaS according to the configured weights. In such cases, the Model Route Fallback feature can be very effective.
- Go to the `Routes` page and locate the model route you want to set a fallback for.
- Edit this route and configure the `Fallback Route Target`. Like other route targets, it can be a model from GPUStack `Deployments` or from `Providers`.
- The fallback target is mutually exclusive with the traffic distribution strategy, so by design you cannot configure a weight for it.
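A fallback target behaves roughly like the sketch below: the fallback receives traffic only when the primary target fails. The function names are illustrative stand-ins, not GPUStack APIs.

```python
def route_with_fallback(primary, fallback, request):
    """Send `request` to the primary target; use the fallback only on failure.

    `primary` and `fallback` are callables standing in for route targets
    (a GPUStack deployment or a provider model).
    """
    try:
        return primary(request)
    except Exception:
        # Unlike weighted targets, the fallback gets traffic only when
        # the primary target errors out, so it costs nothing otherwise.
        return fallback(request)

def overloaded_deployment(req):
    raise RuntimeError("self-hosted model unavailable")

def maas_provider(req):
    return f"handled by MaaS: {req}"

print(route_with_fallback(overloaded_deployment, maas_provider, "hello"))
```

This is why fallback and weighted distribution are mutually exclusive: a weight describes a share of normal traffic, while a fallback is reached only on failure.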
Proxy an OpenAI-Compatible Inference Service via Model Route
If you want an already running inference service (such as Ollama or LM Studio) to go through GPUStack for proxying, access control, and token usage statistics, you can create an OpenAI Model Provider with a custom base URL to host it.
- Go to the `Providers` page and click the `Add Provider` button.
- Select `OpenAI` as the type and set the `Custom Base URL` to `http://<ip>:<port>/v1` for your OpenAI-compatible inference server. Set the name, API key, and description as needed.
- Add models for this provider. If the inference server supports the `/v1/models` API, the available models will be listed for selection.
- Click the `Save` button.
- Click `Add Route` in the `Operations` column for this provider.
- The first model of the provider is used to pre-configure the route. Adjust the route configuration as needed.
- Click the `Save` button to apply the model route. Your model is now proxied by GPUStack.
- Authorize access to this route using `Access Setting` in the `Operations` column.
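Once the route is saved, clients send OpenAI-style requests to GPUStack instead of the backend server directly. The sketch below only builds such a request; the host, API key, and route name are placeholders you would replace with your own values.

```python
import json
import urllib.request

GPUSTACK_URL = "http://gpustack.example.com/v1/chat/completions"  # placeholder host
API_KEY = "<your-gpustack-api-key>"   # placeholder key from GPUStack
ROUTE_NAME = "my-proxied-model"       # placeholder model route name

def build_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request addressed to the route."""
    body = json.dumps({
        # Clients address the model route name, not the backend server.
        "model": ROUTE_NAME,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        GPUSTACK_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )

req = build_request("Hello!")
# urllib.request.urlopen(req) would send it; omitted here because the
# endpoint above is a placeholder.
```

Because the backend sits behind GPUStack, the route's access settings and token usage statistics apply to these requests even though the model itself runs elsewhere.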