# gpustack chat

Chat with a large language model.

```bash
gpustack chat model [prompt]
```
## Positional Arguments

| Name | Description |
| ---- | ----------- |
| `model` | The model to use for chat. |
| `prompt` | The prompt to send to the model. [Optional] |
## One-time Chat with a Prompt

If a prompt is provided, the command performs a one-time inference. For example:

```bash
gpustack chat llama3 "tell me a joke."
```

Example output:

```
Why couldn't the bicycle stand up by itself?

Because it was two-tired!
```
## Interactive Chat

If the `prompt` argument is not provided, you can chat with the large language model interactively. For example:

```bash
gpustack chat llama3
```
Example output:

```
> tell me a joke.
Here's one:

Why couldn't the bicycle stand up by itself?

(wait for it...)

Because it was two-tired!

Hope that made you smile!

> Do you have a better one?
Here's another one:

Why did the scarecrow win an award?

(think about it for a sec...)

Because he was outstanding in his field!

Hope that one stuck with you!

Do you want to hear another one?

> \quit
```
## Interactive Commands

The following commands are available in interactive chat:

```
Commands:
  \q or \quit        - Quit the chat
  \c or \clear       - Clear chat context in prompt
  \? or \h or \help  - Print this help message
```
## Connect to External GPUStack Server

If you are not running `gpustack chat` on the server node, or if you are serving on a custom host or port, you should provide the following environment variables:

| Name | Description |
| ---- | ----------- |
| `GPUSTACK_SERVER_URL` | URL of the GPUStack server, e.g., `http://myserver`. |
| `GPUSTACK_API_KEY` | GPUStack API key. |
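As a sketch, assuming a server reachable at `http://myserver` and a placeholder API key (both values are illustrative, not real credentials), a remote chat session could be started like this:

```shell
# Point the CLI at a remote GPUStack server.
# Replace the URL and key with your server's actual values.
export GPUSTACK_SERVER_URL=http://myserver
export GPUSTACK_API_KEY=your-api-key

# Start a one-time chat against the remote server
# (guarded so the snippet is safe to run where gpustack is not installed).
if command -v gpustack >/dev/null 2>&1; then
  gpustack chat llama3 "tell me a joke."
fi
```

Both variables are read by the CLI at startup, so they can also be prefixed inline on a single command, e.g. `GPUSTACK_SERVER_URL=http://myserver GPUSTACK_API_KEY=your-api-key gpustack chat llama3`.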