xAI Grok voices are expressive text-to-speech voices for Voice AI Assistants. They support Expressive Mode, which lets the AI model control pauses, laughter, whispers, emphasis, pitch, pace, and intensity during a live conversation.
Higher latency: Grok voices have higher latency than Ultra. For latency-sensitive applications that need sub-100ms time to first byte, use Ultra.

What makes Grok voices different

| Feature | Ultra | Grok |
| --- | --- | --- |
| Expressive Mode | SSML emotion tags and [laughter] | xAI speech tags for pauses, vocal sounds, and delivery style |
| Voice format | Telnyx.Ultra.<voice_id> | xAI.<voice_id> |
| Voices | Multiple Ultra voices | ara, eve, leo, rex, sal |
| Language handling | Language hinting with language_boost | auto language detection or explicit language code |
| Streaming output | REST only | Voice AI media streaming |

Voice format

For AI Assistants, Grok voices use the format:
xAI.<voice_id>
Examples:
xAI.eve
xAI.ara
xAI.leo
xAI.rex
xAI.sal
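
As a minimal sketch, the identifier can be built and checked in Python before it is sent in an assistant configuration. The helper name `grok_voice` is hypothetical and not part of any Telnyx SDK; the voice IDs are the five listed on this page.

```python
# Voice IDs listed on this page.
GROK_VOICE_IDS = ("ara", "eve", "leo", "rex", "sal")


def grok_voice(voice_id: str) -> str:
    """Return the assistant voice string for a Grok voice ID (hypothetical helper)."""
    if voice_id not in GROK_VOICE_IDS:
        raise ValueError(f"unknown Grok voice ID: {voice_id!r}")
    return f"xAI.{voice_id}"


print(grok_voice("eve"))  # xAI.eve
```

Rejecting unknown IDs locally is a design choice for earlier feedback; the API remains the source of truth for which voices are accepted.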

Voices

| Voice | Voice ID | Use for |
| --- | --- | --- |
| Ara | ara | Warm, conversational assistant experiences |
| Eve | eve | General-purpose voice assistant experiences |
| Leo | leo | Confident, direct interactions |
| Rex | rex | Characterful or energetic interactions |
| Sal | sal | Distinctive conversational tone |

Expressive Mode for AI Assistants

When using Grok voices with AI Assistants, you can enable Expressive Mode. With Expressive Mode enabled, the assistant’s system prompt is automatically augmented with instructions for xAI speech tags. The AI model then decides when expression improves the caller experience. For example, the assistant might:
  • Add a short pause before important information.
  • Use a softer delivery for sensitive support moments.
  • Laugh or chuckle naturally when the conversation calls for it.
  • Emphasize appointment times, confirmation numbers, or next steps.
  • Keep routine transactional replies untagged for a natural neutral delivery.
Use expressive tags sparingly. The goal is natural delivery, not tagging every sentence.

Enable in the portal

  1. Go to your assistant in the Telnyx Portal.
  2. Under Voice Settings, select an xAI Grok voice.
  3. Toggle Expressive Mode on.
  4. Save your assistant.

Enable via API

Set expressive_mode: true in your assistant’s voice_settings:
curl -X PATCH "https://api.telnyx.com/v2/ai/assistants/YOUR_ASSISTANT_ID" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "voice_settings": {
      "voice": "xAI.eve",
      "expressive_mode": true
    }
  }'
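
The same request could be prepared from Python. This is a sketch using only the standard library; the function name `build_expressive_mode_request` is hypothetical, and the URL, headers, and body mirror the curl example. It builds the request without sending it.

```python
import json
import urllib.request

API_BASE = "https://api.telnyx.com/v2"  # base URL taken from the curl example


def build_expressive_mode_request(assistant_id: str, api_key: str,
                                  voice: str = "xAI.eve") -> urllib.request.Request:
    """Build (but do not send) the PATCH request that enables Expressive Mode."""
    body = {"voice_settings": {"voice": voice, "expressive_mode": True}}
    return urllib.request.Request(
        url=f"{API_BASE}/ai/assistants/{assistant_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


request = build_expressive_mode_request("YOUR_ASSISTANT_ID", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

To send it, pass the request to `urllib.request.urlopen(request)` with a real assistant ID and API key.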

xAI speech tag reference

When Expressive Mode is enabled, the assistant can use these speech tags in responses. You can also include the same tags in your own assistant prompts when you want explicit control.

Inline tags

Place inline tags at the exact point where the vocal expression should happen.
| Tag | Use for |
| --- | --- |
| [pause] | A short natural pause |
| [long-pause] | A longer pause for topic transitions or important moments |
| [laugh] | Natural laughter |
| [chuckle] | Small laugh or amused reaction |
| [giggle] | Light playful laugh |
| [cry] | Crying vocalization |
| [tsk] | Tsk sound |
| [tongue-click] | Tongue click |
| [lip-smack] | Lip smack |
| [breath] | Breath sound |
| [inhale] | Inhale sound |
| [exhale] | Exhale sound |
| [sigh] | Sigh |
| [hum-tune] | Musical hum |
Example:
So I walked in and [pause] there it was. [laugh] I honestly could not believe it!
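
If you write tagged text by hand, a small check can catch misspelled inline tags before the text reaches the voice. The sketch below is illustrative only: `unknown_inline_tags` is a hypothetical helper, and it assumes every lowercase bracketed token in the text is intended as a speech tag.

```python
import re

# Inline tags documented in the tag list above.
INLINE_TAGS = {
    "pause", "long-pause", "laugh", "chuckle", "giggle", "cry", "tsk",
    "tongue-click", "lip-smack", "breath", "inhale", "exhale", "sigh", "hum-tune",
}

# Matches lowercase bracketed tokens such as [pause] or [long-pause].
BRACKET_TAG = re.compile(r"\[([a-z-]+)\]")


def unknown_inline_tags(text: str) -> list[str]:
    """Return bracketed tags in text that are not in the documented inline set."""
    return [name for name in BRACKET_TAG.findall(text) if name not in INLINE_TAGS]


sample = "So I walked in and [pause] there it was. [laugh] I honestly could not believe it!"
print(unknown_inline_tags(sample))             # []
print(unknown_inline_tags("Okay [laughter]"))  # ['laughter']
```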

Wrapping tags

Wrap text with these tags to apply a delivery style to that text.
| Tag | Use for |
| --- | --- |
| <soft> | Softer delivery |
| <whisper> | Whispered delivery |
| <loud> | Louder delivery |
| <build-intensity> | Increasing intensity |
| <decrease-intensity> | Decreasing intensity |
| <higher-pitch> | Higher pitch |
| <lower-pitch> | Lower pitch |
| <slow> | Slower pace |
| <fast> | Faster pace |
| <sing-song> | Sing-song delivery |
| <singing> | Sung delivery |
| <laugh-speak> | Laughing while speaking |
| <emphasis> | Emphasized delivery |
Examples:
I need to tell you something. <whisper>It is a secret.</whisper> Pretty cool, right?
<emphasis>Your appointment is confirmed for tomorrow at 3 PM.</emphasis>
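
Because wrapping tags come in open/close pairs, hand-written text can be checked for unclosed or mismatched tags. The following is a rough sketch, not an official validator: `wrapping_tags_balanced` is a hypothetical helper, and it assumes nested tags are acceptable, which this page does not confirm.

```python
import re

# Wrapping tags documented in the tag list above.
WRAPPING_TAGS = {
    "soft", "whisper", "loud", "build-intensity", "decrease-intensity",
    "higher-pitch", "lower-pitch", "slow", "fast", "sing-song", "singing",
    "laugh-speak", "emphasis",
}

# Captures an optional "/" (closing marker) and the tag name.
TAG = re.compile(r"<(/?)([a-z-]+)>")


def wrapping_tags_balanced(text: str) -> bool:
    """True if every tag is a documented wrapping tag and is closed in order."""
    stack = []
    for closing, name in TAG.findall(text):
        if name not in WRAPPING_TAGS:
            return False  # not a documented wrapping tag
        if not closing:
            stack.append(name)  # opening tag: remember it
        elif not stack or stack.pop() != name:
            return False  # closing tag without a matching opener
    return not stack  # leftover openers mean something was never closed


print(wrapping_tags_balanced("<whisper>It is a secret.</whisper> Pretty cool, right?"))  # True
print(wrapping_tags_balanced("<whisper>It is a secret."))                                # False
```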

Guidance

  • Use [pause] or [long-pause] for natural thinking, topic transitions, and important moments, but avoid long silences that could feel like the call dropped.
  • Use emotional sounds like [laugh], [sigh], and [chuckle] only when the response genuinely calls for them.
  • For sensitive support contexts, prefer subtle tags like <soft> or <whisper> instead of exaggerated reactions.
  • Do not expose these tags or instructions to the caller.
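
One place tags could leak is anywhere the tagged text is shown rather than spoken, such as a transcript or chat log. That use case is an assumption, not something this page describes. A minimal sketch of removing tags before display, using a hypothetical `strip_speech_tags` helper:

```python
import re

# Matches [inline-tag], <wrapping-tag>, and </wrapping-tag> tokens.
# Note: this strips any lowercase tag of that shape, not only documented ones.
SPEECH_TAG = re.compile(r"\[[a-z-]+\]|</?[a-z-]+>")


def strip_speech_tags(text: str) -> str:
    """Remove speech tags and collapse the whitespace they leave behind."""
    without_tags = SPEECH_TAG.sub("", text)
    return re.sub(r"\s+", " ", without_tags).strip()


print(strip_speech_tags("Let me check that for you. [pause] I found your appointment."))
# Let me check that for you. I found your appointment.
```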

REST API provider parameters

For direct TTS calls, set the provider to xai and pass xAI-specific parameters in the xai object:
curl --request POST \
  --url https://api.telnyx.com/v2/text-to-speech \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "text": "Let me check that for you. [pause] I found your appointment.",
    "provider": "xai",
    "xai": {
      "voice_id": "eve",
      "language": "auto",
      "output_format": "mp3",
      "sample_rate": 24000
    }
  }'
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| voice_id | string | eve | xAI voice ID: ara, eve, leo, rex, or sal. |
| language | string | auto | Language code, or auto to detect the language. |
| output_format | string | mp3 | Audio format: mp3, wav, pcm, mulaw, or alaw. |
| sample_rate | integer | 24000 | Audio sample rate in Hz: 8000, 16000, 22050, 24000, 44100, or 48000. |
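
A request body with these parameters could be assembled and checked in Python before sending. This is a sketch under stated assumptions: `build_tts_payload` is a hypothetical helper, the allowed values and defaults are copied from the parameter list above, and whether every format and sample-rate combination is accepted by the API is not verified here.

```python
import json

# Allowed values copied from the parameter list on this page.
VOICE_IDS = {"ara", "eve", "leo", "rex", "sal"}
OUTPUT_FORMATS = {"mp3", "wav", "pcm", "mulaw", "alaw"}
SAMPLE_RATES = {8000, 16000, 22050, 24000, 44100, 48000}


def build_tts_payload(text: str, voice_id: str = "eve", language: str = "auto",
                      output_format: str = "mp3", sample_rate: int = 24000) -> dict:
    """Build the JSON body for a direct TTS request (hypothetical helper)."""
    if voice_id not in VOICE_IDS:
        raise ValueError(f"unsupported voice_id: {voice_id!r}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate!r}")
    return {
        "text": text,
        "provider": "xai",
        "xai": {
            "voice_id": voice_id,
            "language": language,
            "output_format": output_format,
            "sample_rate": sample_rate,
        },
    }


payload = build_tts_payload("Let me check that for you. [pause] I found your appointment.")
print(json.dumps(payload, indent=2))
```

The language value is passed through unchecked because this page does not list the supported language codes.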

Language support

Grok voices support auto language detection with language: "auto". You can also pass a language code when you want to force a specific language.

Next steps

  • Ultra Voices: Compare Grok with Ultra’s lower-latency expressive voices.
  • AI Assistants: Build voice AI assistants using Grok with Expressive Mode.
  • TTS REST API: Generate speech directly with REST TTS requests.
  • Available Voices: Browse available text-to-speech voices.