Using Audio Models
GPUStack supports running both speech-to-text and text-to-speech models. Speech-to-text models convert audio inputs in various languages into written text, while text-to-speech models transform written text into natural and expressive speech.
In this tutorial, we will walk you through deploying and using speech-to-text and text-to-speech models in GPUStack.
Prerequisites
Before you begin, ensure that you have the following:
- A Linux system with AMD64 architecture or macOS.
- Access to Hugging Face for downloading the model files.
- A running GPUStack installation. If you do not have one, refer to the Quickstart Guide.
Running Speech-to-Text Model
Step 1: Deploy Speech-to-Text Model
Follow these steps to deploy the model from Hugging Face:
- Navigate to the Models page in the GPUStack UI.
- Click the Deploy Model button.
- In the dropdown, select Hugging Face as the source for your model.
- Use the search bar in the top left to search for the model name Systran/faster-whisper-medium.
- Leave everything as default and click the Save button to deploy the model.
After deployment, you can monitor the model's status on the Models page.
Step 2: Interact with Speech-to-Text Model
- Navigate to the Playground > Audio page in the GPUStack UI.
- Select the Speech to Text tab.
- Select the deployed model from the top-right dropdown.
- Click the Upload button to upload an audio file, or click the Microphone button to record audio.
- Click the Generate Text Content button to generate the text.
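The same transcription can also be requested from a script. The following is a minimal sketch, not a definitive client: it assumes the GPUStack server exposes an OpenAI-compatible transcription endpoint under the /v1-openai prefix, that you have created an API key, and that the response is JSON with a "text" field. The function names, the server URL, and the file name are hypothetical; check the API documentation of your GPUStack version before relying on any of them.

```python
import json
import urllib.request
import uuid


def build_transcription_request(base_url, api_key, model, filename, audio_bytes):
    """Build (but do not send) a multipart POST for an OpenAI-style
    transcription endpoint. The /v1-openai prefix is an assumption."""
    boundary = uuid.uuid4().hex
    # Two form parts: the model name, then the raw audio file.
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1-openai/audio/transcriptions",
        data=head + audio_bytes + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(request):
    """Send the request and return the transcript.
    Assumes an OpenAI-style JSON response containing a "text" field."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]


# Example usage (placeholder server URL, key, and file name):
# with open("sample.wav", "rb") as f:
#     req = build_transcription_request(
#         "http://your-gpustack-server", "your-api-key",
#         "faster-whisper-medium", "sample.wav", f.read())
# print(transcribe(req))
```

Building the request separately from sending it makes it easy to inspect the URL and body before any audio leaves your machine. The model name passed in should match the name shown on the Models page.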
Running Text-to-Speech Model
Step 1: Deploy Text-to-Speech Model
Follow these steps to deploy the model from Hugging Face:
- Navigate to the Models page in the GPUStack UI.
- Click the Deploy Model button.
- In the dropdown, select Hugging Face as the source for your model.
- Use the search bar in the top left to search for the model name FunAudioLLM/CosyVoice-300M.
- Leave everything as default and click the Save button to deploy the model.
After deployment, you can monitor the model's status on the Models page.
Step 2: Interact with Text-to-Speech Model
- Navigate to the Playground > Audio page in the GPUStack UI.
- Select the Text to Speech tab.
- Choose the deployed model from the dropdown menu in the top-right corner. Then, configure the voice and output audio format.
- Enter the text to convert to speech.
- Click the Submit button to generate the speech.
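Speech generation can be scripted in a similar way. The sketch below assumes an OpenAI-compatible speech endpoint under the /v1-openai prefix that accepts a JSON body with model, input, voice, and response_format fields and returns raw audio bytes. The function names, server URL, and voice value are hypothetical; the voices actually available depend on the deployed model, so use one listed in the Playground.

```python
import json
import urllib.request


def build_speech_request(base_url, api_key, model, text, voice, audio_format="mp3"):
    """Build (but do not send) a JSON POST for an OpenAI-style speech
    endpoint. The /v1-openai prefix and field names are assumptions."""
    payload = {
        "model": model,
        "input": text,
        "voice": voice,  # must be a voice the deployed model offers
        "response_format": audio_format,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1-openai/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def synthesize_to_file(request, output_path):
    """Send the request and write the returned audio bytes to output_path."""
    with urllib.request.urlopen(request) as response, open(output_path, "wb") as out:
        out.write(response.read())


# Example usage (placeholder server URL, key, and voice):
# req = build_speech_request(
#     "http://your-gpustack-server", "your-api-key",
#     "cosyvoice-300m", "Hello from GPUStack.", "a-voice-from-the-playground", "wav")
# synthesize_to_file(req, "hello.wav")
```

The voice and audio_format arguments correspond to the voice and output audio format settings configured in the Playground.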





