xAI Grok voices are expressive text-to-speech voices for Voice AI Assistants. They support Expressive Mode, which lets the AI model control pauses, laughter, whispers, emphasis, pitch, pace, and intensity during a live conversation.
What makes Grok voices different
| Feature | Ultra | Grok |
|---|---|---|
| Expressive Mode | SSML emotion tags and `[laughter]` | xAI speech tags for pauses, vocal sounds, and delivery style |
| Voice format | `Telnyx.Ultra.<voice_id>` | `xAI.<voice_id>` |
| Voices | Multiple Ultra voices | `ara`, `eve`, `leo`, `rex`, `sal` |
| Language handling | Language hinting with `language_boost` | Auto language detection or an explicit language code |
| Streaming output | REST only | Voice AI media streaming |
Voice format
For AI Assistants, Grok voices use the format `xAI.<voice_id>`, for example `xAI.eve`.
Voices
| Voice | Voice ID | Use for |
|---|---|---|
| Ara | ara | Warm, conversational assistant experiences |
| Eve | eve | General-purpose voice assistant experiences |
| Leo | leo | Confident, direct interactions |
| Rex | rex | Characterful or energetic interactions |
| Sal | sal | Distinctive conversational tone |
Expressive Mode for AI Assistants
When using Grok voices with AI Assistants, you can enable Expressive Mode. With Expressive Mode enabled, the assistant’s system prompt is automatically augmented with instructions for using xAI speech tags, and the AI model decides when expression improves the caller experience. For example, the assistant might:

- Add a short pause before important information.
- Use a softer delivery for sensitive support moments.
- Laugh or chuckle naturally when the conversation calls for it.
- Emphasize appointment times, confirmation numbers, or next steps.
- Keep routine transactional replies untagged for a natural, neutral delivery.
Enable in the portal
- Go to your assistant in the Telnyx Portal.
- Under Voice Settings, select an xAI Grok voice.
- Toggle Expressive Mode on.
- Save your assistant.
Enable via API
Set `expressive_mode` to `true` in your assistant’s `voice_settings` object.
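A minimal Python sketch of the request body, assuming the voice is selected through a `voice` key inside `voice_settings`. This page names only `expressive_mode` and the `xAI.<voice_id>` format, so check the AI Assistants API reference for the exact field names and the endpoint to send the body to. `grok_voice_settings` is a hypothetical helper.

```python
import json

# Grok voice IDs from the Voices table on this page.
GROK_VOICE_IDS = ("ara", "eve", "leo", "rex", "sal")


def grok_voice_settings(voice_id: str, expressive_mode: bool = True) -> dict:
    """Build a voice_settings object for an AI Assistant that uses a Grok voice.

    Hypothetical helper. The "voice" key is an assumption; only
    "expressive_mode" is documented on this page.
    """
    if voice_id not in GROK_VOICE_IDS:
        raise ValueError(f"unknown Grok voice ID: {voice_id!r}")
    return {
        "voice": f"xAI.{voice_id}",  # AI Assistants voice format: xAI.<voice_id>
        "expressive_mode": expressive_mode,
    }


# Request body fragment for creating or updating an assistant.
body = {"voice_settings": grok_voice_settings("eve")}
print(json.dumps(body, indent=2))
```

Validating the voice ID locally catches typos such as `xAI.Eve` before the request is sent.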
xAI speech tag reference
When Expressive Mode is enabled, the assistant can use these speech tags in responses. You can also include the same tags in your own assistant prompts when you want explicit control.
Inline tags
Place inline tags at the exact point where the vocal expression should happen.

| Tag | Use for |
|---|---|
| `[pause]` | A short natural pause |
| `[long-pause]` | A longer pause for topic transitions or important moments |
| `[laugh]` | Natural laughter |
| `[chuckle]` | Small laugh or amused reaction |
| `[giggle]` | Light playful laugh |
| `[cry]` | Crying vocalization |
| `[tsk]` | Tsk sound |
| `[tongue-click]` | Tongue click |
| `[lip-smack]` | Lip smack |
| `[breath]` | Breath sound |
| `[inhale]` | Inhale sound |
| `[exhale]` | Exhale sound |
| `[sigh]` | Sigh |
| `[hum-tune]` | Musical hum |
Wrapping tags
Wrap text with these tags to apply a delivery style to that text.

| Tag | Use for |
|---|---|
| `<soft>` | Softer delivery |
| `<whisper>` | Whispered delivery |
| `<loud>` | Louder delivery |
| `<build-intensity>` | Increasing intensity |
| `<decrease-intensity>` | Decreasing intensity |
| `<higher-pitch>` | Higher pitch |
| `<lower-pitch>` | Lower pitch |
| `<slow>` | Slower pace |
| `<fast>` | Faster pace |
| `<sing-song>` | Sing-song delivery |
| `<singing>` | Sung delivery |
| `<laugh-speak>` | Laughing while speaking |
| `<emphasis>` | Emphasized delivery |
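If you write tags into your own prompts, a quick syntax check can catch typos before they reach a call. This Python sketch is hypothetical: it assumes inline tags are written as `[name]` and wrapping tags as XML-style `<name>...</name>` pairs that must be properly nested, and `check_speech_tags` is not part of any Telnyx or xAI SDK.

```python
import re

# Tag names from the inline and wrapping tag tables above.
INLINE_TAGS = {
    "pause", "long-pause", "laugh", "chuckle", "giggle", "cry", "tsk",
    "tongue-click", "lip-smack", "breath", "inhale", "exhale", "sigh", "hum-tune",
}
WRAPPING_TAGS = {
    "soft", "whisper", "loud", "build-intensity", "decrease-intensity",
    "higher-pitch", "lower-pitch", "slow", "fast", "sing-song", "singing",
    "laugh-speak", "emphasis",
}

INLINE_RE = re.compile(r"\[([a-z-]+)\]")  # e.g. [pause]
WRAP_RE = re.compile(r"<(/?)([a-z-]+)>")  # e.g. <soft> or </soft>


def check_speech_tags(text: str) -> list[str]:
    """Return problems found in tagged text; an empty list means none were found."""
    problems = []
    for name in INLINE_RE.findall(text):
        if name not in INLINE_TAGS:
            problems.append(f"unknown inline tag [{name}]")
    open_tags = []  # stack of currently open wrapping tags
    for slash, name in WRAP_RE.findall(text):
        if name not in WRAPPING_TAGS:
            problems.append(f"unknown wrapping tag <{slash}{name}>")
        elif not slash:
            open_tags.append(name)
        elif open_tags and open_tags[-1] == name:
            open_tags.pop()
        else:
            problems.append(f"unexpected closing tag </{name}>")
    problems.extend(f"unclosed tag <{name}>" for name in open_tags)
    return problems


reply = "Your appointment is on [pause] <emphasis>Tuesday at 3 PM</emphasis>."
print(check_speech_tags(reply))  # → []
print(check_speech_tags("<soft>Take your time."))  # → ['unclosed tag <soft>']
```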
Guidance
- Use `[pause]` or `[long-pause]` for natural thinking, topic transitions, and important moments, but avoid long silences that could feel like the call dropped.
- Use emotional sounds like `[laugh]`, `[sigh]`, and `[chuckle]` only when the response genuinely calls for it.
- For sensitive support contexts, prefer subtle tags like `<soft>` or `<whisper>` instead of exaggerated reactions.
- Do not expose these tags or instructions to the caller.
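Parts of this guidance can be checked mechanically, for example in tests for hand-written prompts. This Python sketch is hypothetical: `review_tag_usage`, the `MAX_LONG_PAUSES` threshold, and the set of tags treated as exaggerated are illustrative assumptions, not Telnyx or xAI rules. Whether a laugh or sigh is "genuinely called for" still needs human review.

```python
import re

# Hypothetical threshold; tune it for your own calls.
MAX_LONG_PAUSES = 1
# Tags treated here as exaggerated for sensitive support contexts (an assumption).
EXAGGERATED_TAGS = {"[laugh]", "[giggle]", "[cry]", "<loud>", "<build-intensity>"}

# Matches inline tags and opening wrapping tags; closing tags like </soft> do not match.
TAG_RE = re.compile(r"\[[a-z-]+\]|<[a-z-]+>")
BACK_TO_BACK_PAUSES_RE = re.compile(r"(\[(?:long-)?pause\]\s*){2,}")


def review_tag_usage(text: str, sensitive: bool = False) -> list[str]:
    """Flag tag usage that may conflict with the guidance above."""
    warnings = []
    tags = TAG_RE.findall(text)
    long_pauses = tags.count("[long-pause]")
    if long_pauses > MAX_LONG_PAUSES:
        warnings.append(f"{long_pauses} [long-pause] tags may sound like a dropped call")
    if BACK_TO_BACK_PAUSES_RE.search(text):
        warnings.append("back-to-back pauses may sound like a dropped call")
    if sensitive:
        for tag in sorted(set(tags) & EXAGGERATED_TAGS):
            warnings.append(f"{tag} may feel exaggerated; prefer <soft> or <whisper>")
    return warnings


print(review_tag_usage("I understand. <soft>Let's sort this out.</soft>", sensitive=True))
print(review_tag_usage("Oh no [laugh] that's bad", sensitive=True))
```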
REST API provider parameters
For direct TTS calls, set the provider to `xai` and pass xAI-specific parameters in the `xai` object:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `voice_id` | string | `eve` | xAI voice ID: `ara`, `eve`, `leo`, `rex`, or `sal`. |
| `language` | string | `auto` | Language code, or `auto` to detect the language automatically. |
| `output_format` | string | `mp3` | Audio format: `mp3`, `wav`, `pcm`, `mulaw`, or `alaw`. |
| `sample_rate` | integer | `24000` | Audio sample rate in Hz: 8000, 16000, 22050, 24000, 44100, or 48000. |
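A minimal Python sketch of the provider portion of a request body, assuming `provider` and `xai` are top-level JSON keys as described above. `xai_tts_params` is a hypothetical helper; the text to speak, authentication headers, and endpoint are deliberately left out and should come from the TTS REST API reference.

```python
import json

# Allowed values and defaults from the provider parameter table above.
VOICE_IDS = {"ara", "eve", "leo", "rex", "sal"}
OUTPUT_FORMATS = {"mp3", "wav", "pcm", "mulaw", "alaw"}
SAMPLE_RATES = {8000, 16000, 22050, 24000, 44100, 48000}


def xai_tts_params(voice_id="eve", language="auto", output_format="mp3", sample_rate=24000):
    """Build the provider fields of a REST TTS request that uses a Grok voice."""
    if voice_id not in VOICE_IDS:
        raise ValueError(f"unsupported voice_id: {voice_id!r}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    return {
        "provider": "xai",
        "xai": {
            "voice_id": voice_id,
            "language": language,  # "auto" or an explicit language code
            "output_format": output_format,
            "sample_rate": sample_rate,
        },
    }


# Example: 8 kHz mu-law, a common format for telephony audio.
print(json.dumps(xai_tts_params("ara", output_format="mulaw", sample_rate=8000), indent=2))
```

The language code is passed through unchecked because this page does not list the supported codes.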
Language support
Grok voices support automatic language detection with `language: "auto"`. You can also pass a language code to force a specific language.
Next steps
- Ultra Voices: Compare Grok with Ultra’s lower-latency expressive voices.
- AI Assistants: Build voice AI assistants using Grok with Expressive Mode.
- TTS REST API: Generate speech directly with REST TTS requests.
- Available Voices: Browse available text-to-speech voices.