# Using Image Generation Models
GPUStack supports deploying and running Image Generation Models. In this guide, you deploy and use the Qwen-Image text-to-image model via the SGLang backend, then generate images from textual prompts in the GPUStack UI.
## Prerequisites
Before you begin, ensure that you have the following:
- A GPU with at least 55 GB of VRAM.
- Access to Hugging Face or ModelScope to download the `Qwen/Qwen-Image` repository.
- GPUStack installed and running. If not, refer to the Quickstart Guide.
## Step 1: Deploy the Qwen-Image Model
Follow these steps to deploy the model from Hugging Face:
- Navigate to the `Deployments` page in the GPUStack UI.
- Click the `Deploy Model` button.
- In the dropdown, select `Hugging Face` as the source for your model.
- Use the search bar in the top-left to search for the repository `Qwen/Qwen-Image`.
- Keep the default settings and click the `Save` button to deploy. GPUStack will start the SGLang backend for Qwen-Image and download the required files.
After deployment, you can monitor the deployment's status on the `Deployments` page.
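You can also confirm the deployment from a script. The sketch below builds a request against the OpenAI-compatible model list endpoint that GPUStack exposes; the server address, port, and API key are assumptions, so adjust them to match your installation.

```python
# Hypothetical sketch: verify the deployment via GPUStack's
# OpenAI-compatible API. The base URL and API key are assumptions --
# adjust them to match your GPUStack installation.
import json
import urllib.request

GPUSTACK_URL = "http://localhost"    # assumed server address
API_KEY = "YOUR_GPUSTACK_API_KEY"    # created under API Keys in the UI

def list_models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the OpenAI-compatible model list endpoint."""
    return urllib.request.Request(
        f"{base_url}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )

req = list_models_request(GPUSTACK_URL, API_KEY)

# To run against a live server, uncomment:
# with urllib.request.urlopen(req) as resp:
#     models = json.load(resp)["data"]
#     print([m["id"] for m in models])  # the deployed model should be listed
```

Once the model finishes downloading and the backend starts, the deployed `qwen-image` model should appear in the response.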
## Step 2: Use the Model for Image Generation
- Navigate to the `Playground` > `Image` page in the GPUStack UI.
- Verify that the deployed `qwen-image` model is selected in the top-right `Model` dropdown.
- Enter a prompt describing the image you want to generate. For example:
a female character with long, flowing hair that appears to be made of ethereal, swirling patterns resembling the Northern Lights or Aurora Borealis. The background is dominated by deep blues and purples, creating a mysterious and dramatic atmosphere. The character's face is serene, with pale skin and striking features. She wears a dark-colored outfit with subtle patterns. The overall style of the artwork is reminiscent of fantasy or supernatural genres.
- If your UI shows a `Sampler` dropdown and `Sample Steps`, keep the defaults. Qwen-Image uses a Diffusers-based flow matching scheduler under SGLang; GPUStack maps UI settings to appropriate parameters.
- Click the `Submit` button to create the image.
The generated image will be displayed in the UI. Results vary with randomness, seeds, and prompt details.
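Beyond the Playground, images can be generated programmatically through the OpenAI-compatible images endpoint. The sketch below only builds the request; the server address, API key, model name, and `size` value are assumptions, so check them against your own deployment before sending.

```python
# Hypothetical sketch: generate an image through the OpenAI-compatible
# images endpoint. URL, API key, and size are assumptions -- verify them
# against your GPUStack installation.
import json
import urllib.request

GPUSTACK_URL = "http://localhost"    # assumed server address
API_KEY = "YOUR_GPUSTACK_API_KEY"

def image_request(prompt: str) -> urllib.request.Request:
    """Build a POST request for /v1/images/generations."""
    payload = {
        "model": "qwen-image",   # the deployment name from Step 1
        "prompt": prompt,
        "n": 1,
        "size": "1024x1024",     # assumed resolution
    }
    return urllib.request.Request(
        f"{GPUSTACK_URL}/v1/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = image_request("a serene female character with aurora-like hair")

# To run against a live server, uncomment:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
#     # result["data"][0] holds the generated image (URL or base64,
#     # depending on server configuration)
```

This mirrors what the Playground does under the hood: a prompt plus generation parameters posted to the model's endpoint.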

After completing an image generation task, memory usage typically increases from ~45 GB (weights only) to ~54 GB, because diffusion models consume additional VRAM during inference. If additional models such as LoRA adapters are used, VRAM consumption increases further.
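A quick back-of-envelope check of these figures shows why the prerequisite calls for at least 55 GB of VRAM:

```python
# Back-of-envelope VRAM budget using the figures from this guide (in GB).
weights = 45           # model weights loaded at startup
after_generation = 54  # typical usage after one image generation
diffusion_overhead = after_generation - weights  # extra VRAM for inference
headroom = 55 - after_generation  # what the 55 GB prerequisite leaves spare

print(diffusion_overhead)  # → 9
print(headroom)            # → 1
```

With only ~1 GB of headroom at the recommended minimum, adding LoRA adapters or raising resolution may require a larger GPU.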
## Conclusion
With this setup, you can generate unique and visually compelling images from textual prompts using Qwen-Image served by SGLang. Experiment with different prompts and settings to push the boundaries of what’s possible.


