Using Audio Models
GPUStack supports running both speech-to-text and text-to-speech models. Speech-to-text models convert audio inputs in various languages into written text, while text-to-speech models transform written text into natural and expressive speech.
In this tutorial, we will walk you through deploying and using speech-to-text and text-to-speech models in GPUStack.
Prerequisites
Before you begin, ensure that you have the following:
- A Linux system with AMD64 (x86-64) architecture, or macOS.
- Access to Hugging Face for downloading the model files.
- GPUStack installed and running. If it is not, refer to the Quickstart Guide.
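To confirm the platform requirement from a script before continuing, you can use the following minimal Python sketch. It checks the values reported by the standard `platform` module. The function name `meets_platform_prerequisite` is hypothetical. Accepting macOS on any CPU is an assumption, because the list above names no macOS architecture.

```python
import platform


def meets_platform_prerequisite(system: str, machine: str) -> bool:
    """Return True for Linux on x86-64 (AMD64), or for macOS.

    Assumption: macOS is accepted regardless of CPU architecture.
    """
    if system == "Darwin":  # platform.system() reports macOS as "Darwin"
        return True
    if system == "Linux":
        # x86-64 is usually reported as "x86_64"; "AMD64" is accepted too.
        return machine.lower() in {"x86_64", "amd64"}
    return False


if __name__ == "__main__":
    ok = meets_platform_prerequisite(platform.system(), platform.machine())
    print("platform OK" if ok else "platform not covered by this tutorial")
```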
Running a Speech-to-Text Model
Step 1: Deploy the Speech-to-Text Model
Follow these steps to deploy the model from Hugging Face:
- Navigate to the Models page in the GPUStack UI.
- Click the Deploy Model button.
- In the dropdown, select Hugging Face as the source for your model.
- Use the search bar in the top left to search for the model name Systran/faster-whisper-medium.
- Leave all other settings at their defaults and click the Save button to deploy the model.
After deployment, you can monitor the model's status on the Models page.
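If you prefer to check from a script, the minimal sketch below looks for a deployment name in an OpenAI-style model listing. It assumes that GPUStack serves OpenAI-compatible endpoints under `/v1-openai` on your server and that you have created an API key in the UI. The environment variable names, the placeholder host, and the deployment name `faster-whisper-medium` are all hypothetical.

```python
import json
import os
import urllib.request

# Assumptions: GPUStack serves OpenAI-compatible endpoints under /v1-openai,
# and you created an API key in the GPUStack UI. Both environment variable
# names below are hypothetical.
BASE_URL = os.environ.get("GPUSTACK_BASE_URL", "http://your-gpustack-server/v1-openai")
API_KEY = os.environ.get("GPUSTACK_API_KEY")


def model_ids(list_response: dict) -> list:
    """Return the "id" of every entry in an OpenAI-style model listing."""
    return [entry["id"] for entry in list_response.get("data", [])]


def fetch_models(base_url: str, api_key: str) -> dict:
    """GET <base_url>/models with a bearer token and decode the JSON body."""
    req = urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    if not API_KEY:
        print("Set GPUSTACK_API_KEY to query the server.")
    else:
        ids = model_ids(fetch_models(BASE_URL, API_KEY))
        # "faster-whisper-medium" is a hypothetical deployment name.
        print("faster-whisper-medium" in ids)
```

Appearing in the listing does not by itself confirm that a model instance is running. Treat the status shown on the Models page as authoritative.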
Step 2: Interact with the Speech-to-Text Model
- Navigate to the Playground > Audio page in the GPUStack UI.
- Select the Speech to Text tab.
- Select the deployed model from the dropdown in the top-right corner.
- Click the Upload button to upload an audio file, or click the Microphone button to record audio.
- Click the Generate Text Content button to generate the text.
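This tutorial uses the Playground. To send the same transcription from code, you can use the sketch below, which relies only on the standard library. It builds a request in the shape of OpenAI's transcription API: a multipart/form-data POST with `model` and `file` fields. It assumes that GPUStack serves this OpenAI-compatible endpoint at `/v1-openai/audio/transcriptions`. The environment variable names, the file name `sample.wav`, and the model name are hypothetical, so substitute the values from your own deployment.

```python
import json
import os
import urllib.request
import uuid

# Assumptions: an OpenAI-compatible transcription endpoint under /v1-openai;
# the environment variable names are hypothetical.
BASE_URL = os.environ.get("GPUSTACK_BASE_URL", "http://your-gpustack-server/v1-openai")
API_KEY = os.environ.get("GPUSTACK_API_KEY")


def build_multipart(fields: dict, file_field: str, filename: str,
                    file_bytes: bytes, boundary: str) -> bytes:
    """Encode plain form fields plus one file part as multipart/form-data."""
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'.encode()
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())  # closing delimiter
    return b"".join(parts)


def transcription_request(base_url: str, api_key: str, model: str,
                          filename: str, audio: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST to <base_url>/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    body = build_multipart({"model": model}, "file", filename, audio, boundary)
    return urllib.request.Request(
        f"{base_url}/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


if __name__ == "__main__":
    if not API_KEY:
        print("Set GPUSTACK_API_KEY to call the server.")
    else:
        # "sample.wav" and "faster-whisper-medium" are hypothetical names.
        with open("sample.wav", "rb") as f:
            req = transcription_request(BASE_URL, API_KEY, "faster-whisper-medium",
                                        "sample.wav", f.read())
        with urllib.request.urlopen(req) as resp:
            print(json.load(resp)["text"])  # OpenAI-style responses carry "text"
```

OpenAI-style transcription responses return JSON with a `text` field, and the script prints that field. If your server responds differently, check the API reference for your GPUStack version.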
Running a Text-to-Speech Model
Step 1: Deploy the Text-to-Speech Model
Follow these steps to deploy the model from Hugging Face:
- Navigate to the Models page in the GPUStack UI.
- Click the Deploy Model button.
- In the dropdown, select Hugging Face as the source for your model.
- Use the search bar in the top left to search for the model name FunAudioLLM/CosyVoice-300M.
- Leave all other settings at their defaults and click the Save button to deploy the model.
After deployment, you can monitor the model's status on the Models page.
Step 2: Interact with the Text-to-Speech Model
- Navigate to the Playground > Audio page in the GPUStack UI.
- Select the Text to Speech tab.
- Choose the deployed model from the dropdown menu in the top-right corner, then configure the voice and output audio format.
- Enter the text to convert to speech.
- Click the Submit button to generate the speech.
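You can also make the same text-to-speech call outside the Playground. This minimal sketch assumes an OpenAI-compatible `/v1-openai/audio/speech` endpoint. Following OpenAI's speech API, it sends a JSON body with `model`, `input`, `voice`, and `response_format`. The model name `cosyvoice-300m`, the voice value, and the environment variable names are hypothetical placeholders. Use the deployment name shown on the Models page and a voice listed in the Playground dropdown.

```python
import json
import os
import urllib.request

# Assumptions: an OpenAI-compatible speech endpoint under /v1-openai;
# the environment variable names are hypothetical.
BASE_URL = os.environ.get("GPUSTACK_BASE_URL", "http://your-gpustack-server/v1-openai")
API_KEY = os.environ.get("GPUSTACK_API_KEY")


def speech_request(base_url: str, api_key: str, model: str, text: str,
                   voice: str, response_format: str = "mp3") -> urllib.request.Request:
    """Build (but do not send) a POST to <base_url>/audio/speech with a JSON body."""
    payload = {"model": model, "input": text, "voice": voice,
               "response_format": response_format}
    return urllib.request.Request(
        f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )


if __name__ == "__main__":
    if not API_KEY:
        print("Set GPUSTACK_API_KEY to call the server.")
    else:
        # "cosyvoice-300m" and "your-voice-name" are hypothetical placeholders.
        req = speech_request(BASE_URL, API_KEY, "cosyvoice-300m",
                             "Hello from GPUStack.", voice="your-voice-name")
        with urllib.request.urlopen(req) as resp, open("speech.mp3", "wb") as out:
            out.write(resp.read())  # the body is the encoded audio itself
```

In an OpenAI-style speech API, the response body is the raw audio in the requested format. The script therefore writes it directly to a file instead of parsing JSON.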