Telnyx AI Assistants support multiple speech-to-text (STT) models for transcribing caller audio. The model you choose affects transcription accuracy, supported languages, and response latency. You can also tune provider-specific transcription behavior, such as end-of-turn detection, formatting, keyterm boosting, and Azure region selection.

Available models

| Model | Engine | Best for |
| --- | --- | --- |
| deepgram/flux | Deepgram | Conversational AI, optimized for turn-taking, English only |
| deepgram/nova-3 | Deepgram | Fast multilingual transcription, recommended for multilingual assistants |
| deepgram/nova-2 | Deepgram | Fast multilingual transcription on Deepgram’s previous-generation model |
| azure/fast | Azure | Fast multilingual transcription with optional Azure region and API key configuration |
| assemblyai/universal-streaming | AssemblyAI | Conversational, multilingual streaming transcription with configurable turn detection |
| xai/grok-stt | xAI | Multilingual transcription using Grok STT |

deepgram/flux is English-only. For multilingual assistants, use deepgram/nova-3, deepgram/nova-2, azure/fast, assemblyai/universal-streaming, or xai/grok-stt.

Selecting a model

Portal

In the AI Assistants tab, edit your assistant and navigate to the Voice tab. Select your preferred STT model from the Transcription Model dropdown.

When you change models in the Portal, related settings are reset to the defaults for that provider. For example:
  • deepgram/flux sets language to English and applies Flux end-of-turn defaults.
  • Other Deepgram models enable smart_format and numerals by default.
  • assemblyai/universal-streaming applies AssemblyAI turn detection defaults.
  • azure/fast defaults the Azure region to latency, which auto-selects the closest supported Telnyx-managed region.
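
The Portal reset behavior listed above can be mirrored in client code when you build payloads yourself. The following is a minimal sketch, assuming the default values documented on this page; the function name `portal_defaults` and the dictionary layout are illustrative and not part of the Telnyx API.

```python
# Sketch only: mirrors the Portal defaults described on this page.
# portal_defaults is a hypothetical helper, not a Telnyx SDK function.

def portal_defaults(model):
    """Return a transcription object pre-filled with the Portal's defaults."""
    if model == "deepgram/flux":
        # Flux is English-only and gets the end-of-turn defaults.
        return {
            "model": model,
            "language": "en",
            "settings": {
                "eot_threshold": 0.8,
                "eot_timeout_ms": 5000,
                "eager_eot_threshold": 0.4,
            },
        }
    if model.startswith("deepgram/"):
        # Other Deepgram models enable smart_format and numerals.
        return {"model": model, "settings": {"smart_format": True, "numerals": True}}
    if model == "assemblyai/universal-streaming":
        return {
            "model": model,
            "settings": {
                "end_of_turn_confidence_threshold": 0.4,
                "min_turn_silence": 400,
                "max_turn_silence": 1280,
            },
        }
    if model == "azure/fast":
        # "latency" auto-selects the closest supported Telnyx-managed region.
        return {"model": model, "region": "latency"}
    # No provider-specific defaults are documented for other models.
    return {"model": model}
```

Starting from these values and overriding only what you need keeps API-created assistants consistent with ones configured in the Portal.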

API

Set the transcription.model field when creating or updating an assistant:
curl -X POST https://api.telnyx.com/v2/ai/assistants \
  -H "Authorization: Bearer $TELNYX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My Assistant",
    "model": "anthropic/claude-haiku-4-5",
    "instructions": "You are a helpful voice assistant.",
    "transcription": {
      "model": "deepgram/flux"
    }
  }'
You can also set the transcription language explicitly. If omitted or set to auto, supported models auto-detect the language:
"transcription": {
  "model": "deepgram/nova-3",
  "language": "es"
}
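
For reference, the same request can be assembled in Python with only the standard library. This is a sketch under stated assumptions: the helper name `build_create_request` is hypothetical, the body fields are copied from the curl example above, and the network call is left commented out so the snippet runs offline.

```python
import json
import os
import urllib.request

API_URL = "https://api.telnyx.com/v2/ai/assistants"

def build_create_request(api_key, stt_model, language=None):
    """Build (but do not send) the POST request that creates an assistant."""
    transcription = {"model": stt_model}
    if language is not None:
        # Leaving language out lets supported models auto-detect it.
        transcription["language"] = language
    body = {
        "name": "My Assistant",
        "model": "anthropic/claude-haiku-4-5",
        "instructions": "You are a helpful voice assistant.",
        "transcription": transcription,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

if __name__ == "__main__":
    request = build_create_request(
        os.environ.get("TELNYX_API_KEY", ""), "deepgram/nova-3", language="es"
    )
    print(request.get_method(), request.full_url)
    # Uncomment to send the request:
    # with urllib.request.urlopen(request) as response:
    #     print(json.load(response))
```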

Languages

Supported language options depend on the selected model.
| Model | Language behavior |
| --- | --- |
| deepgram/flux | English only |
| deepgram/nova-3 | Auto-detect plus supported Deepgram Nova 3 language codes |
| deepgram/nova-2 | Auto-detect plus supported Deepgram Nova 2 language codes |
| azure/fast | Explicit Azure locale codes, such as en-US, es-MX, or fr-FR |
| assemblyai/universal-streaming | auto for multilingual detection or en for English |
| xai/grok-stt | auto plus supported Grok STT language codes |

If your assistant has a language filter set elsewhere in the Voice tab, the Portal only shows transcription models and language choices that are compatible with that language.
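
A client can catch obvious model and language mismatches before calling the API. The sketch below is a coarse pre-check based only on the table above: the function name `language_looks_valid` is hypothetical, the Azure pattern checks the shape of a locale code rather than a real locale list, and per-model language code lists are not verified because this page does not enumerate them.

```python
import re

# Coarse shape check for locale codes such as "en-US" or "es-MX".
# Assumption: this does not confirm that Azure actually supports the locale.
AZURE_LOCALE_SHAPE = re.compile(r"^[a-z]{2,3}(-[A-Za-z0-9]+)+$")

def language_looks_valid(model, language):
    """Return True if the language value is plausible for the given model."""
    if model == "deepgram/flux":
        return language == "en"
    if model == "assemblyai/universal-streaming":
        return language in ("auto", "en")
    if model == "azure/fast":
        # Azure expects an explicit locale, so "auto" is rejected here.
        return bool(AZURE_LOCALE_SHAPE.match(language))
    # Nova and Grok models accept "auto" or a model-specific code; the
    # code lists are not reproduced here, so any non-empty value passes.
    return bool(language)
```

The API remains the source of truth; a check like this only avoids a round trip for clearly invalid combinations.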

Deepgram settings

Deepgram Flux end-of-turn detection

deepgram/flux is optimized for live voice agents. It provides end-of-turn detection so the assistant can start responding as soon as the caller finishes speaking. It also supports eager end-of-turn, which starts large language model (LLM) processing before the caller fully stops speaking to reduce perceived response latency. When you select deepgram/flux in the Portal, these settings are applied by default:
| Field | Type | Range | Portal default | Description |
| --- | --- | --- | --- | --- |
| eot_threshold | number | 0.5-0.9 | 0.8 | Confidence required to trigger a final end of turn. Higher values require more confidence and may add latency. |
| eot_timeout_ms | integer | 500-10000 | 5000 | Maximum silence duration, in milliseconds, before forcing an end of turn. |
| eager_eot_threshold | number | 0.3-0.9 | 0.4 | Confidence required to start speculative LLM processing before final end-of-turn confirmation. Lower values trigger earlier. |

eager_eot_threshold must be less than or equal to eot_threshold. Setting both thresholds to the same value effectively disables eager end-of-turn behavior because the system waits for final end-of-turn confirmation before starting LLM processing.

The eager_eot_threshold field is controlled by the FE-eager-eot-threshold Portal feature flag. When that flag is disabled, the Portal hides the field, but API payloads can still include it if your account supports the setting.

When using Flux, the Portal may also prompt you to lower the assistant’s start speaking plan timings. Flux works best with low start speaking delays, such as 0.1 seconds for wait time and endpointing plan thresholds.
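
The Flux ranges and the ordering rule between the two thresholds can be checked locally before a payload is sent. This is a minimal sketch, assuming the ranges documented on this page; `validate_flux_settings` is a hypothetical helper, and the API may apply further checks.

```python
def validate_flux_settings(settings):
    """Return a list of problems found in a deepgram/flux settings dict."""
    errors = []
    eot = settings.get("eot_threshold")
    timeout = settings.get("eot_timeout_ms")
    eager = settings.get("eager_eot_threshold")

    if eot is not None and not 0.5 <= eot <= 0.9:
        errors.append("eot_threshold must be between 0.5 and 0.9")
    if timeout is not None and not 500 <= timeout <= 10000:
        errors.append("eot_timeout_ms must be between 500 and 10000")
    if eager is not None and not 0.3 <= eager <= 0.9:
        errors.append("eager_eot_threshold must be between 0.3 and 0.9")
    # The eager threshold may equal the final threshold (which effectively
    # disables eager end-of-turn) but must never exceed it.
    if eot is not None and eager is not None and eager > eot:
        errors.append("eager_eot_threshold must be <= eot_threshold")
    return errors
```

An empty list means the settings satisfy the documented constraints; fields that are omitted are not checked.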

Keyterm Boost

deepgram/flux and deepgram/nova-3 support keyterm, a comma-separated list of terms to boost during recognition. Use it for product names, customer names, acronyms, or domain-specific vocabulary.

Keyterm Boost also supports dynamic variables. Use variables when the boosted terms are caller-specific, such as customer, participant, account, or product names passed into the assistant at conversation start.
"transcription": {
  "model": "deepgram/nova-3",
  "settings": {
    "keyterm": "Telnyx,VoIP,SIP,{{customer_name}},{{product_name}}"
  }
}
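
Because keyterm is a single comma-separated string, it can be convenient to assemble it from lists. The sketch below assumes the `{{variable}}` placeholder syntax shown in the example above; `build_keyterm` is a hypothetical helper, and the substitution of variables is performed by the platform, not by this code.

```python
def build_keyterm(terms, variables=()):
    """Join static terms and dynamic variable names into a keyterm string."""
    parts = []
    for term in terms:
        term = term.strip()
        if not term:
            continue
        if "," in term:
            # A comma inside a term would be read as a separator.
            raise ValueError(f"term must not contain a comma: {term!r}")
        parts.append(term)
    # Dynamic variables are written as {{name}} placeholders.
    parts.extend("{{" + name + "}}" for name in variables)
    # Remove duplicates while keeping the original order.
    return ",".join(dict.fromkeys(parts))
```

For example, `build_keyterm(["Telnyx", "VoIP", "SIP"], ["customer_name", "product_name"])` produces the string used in the example above.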

Smart Format and Numerals

For Deepgram models other than Flux, the Portal exposes these settings and enables both by default when you select the model:
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| smart_format | boolean | true | Automatically formats transcripts for readability, including punctuation and casing. |
| numerals | boolean | true | Converts spoken numbers to digits, for example “five hundred” to “500”. |

AssemblyAI settings

assemblyai/universal-streaming supports configurable turn detection. When you select it in the Portal, these defaults are applied:
| Field | Type | Range | Portal default | Description |
| --- | --- | --- | --- | --- |
| end_of_turn_confidence_threshold | number | 0-1 | 0.4 | Confidence required to trigger an end of turn. Higher values require more certainty before ending a turn. |
| min_turn_silence | integer | 100-5000 | 400 | Minimum silence duration, in milliseconds, before a turn can end. |
| max_turn_silence | integer | 100-5000 | 1280 | Maximum silence duration, in milliseconds, before forcing an end of turn. |

min_turn_silence must be less than or equal to max_turn_silence.
"transcription": {
  "model": "assemblyai/universal-streaming",
  "language": "auto",
  "settings": {
    "end_of_turn_confidence_threshold": 0.4,
    "min_turn_silence": 400,
    "max_turn_silence": 1280
  }
}
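
The AssemblyAI ranges and the silence ordering rule lend themselves to a table-driven check. This is a sketch based on the ranges documented in this section; `check_assemblyai_settings` and `ASSEMBLYAI_RANGES` are hypothetical names, not part of any SDK.

```python
# Allowed (low, high) range per field, taken from the table in this section.
ASSEMBLYAI_RANGES = {
    "end_of_turn_confidence_threshold": (0, 1),
    "min_turn_silence": (100, 5000),
    "max_turn_silence": (100, 5000),
}

def check_assemblyai_settings(settings):
    """Return a list of problems found in an AssemblyAI settings dict."""
    problems = []
    for field, (low, high) in ASSEMBLYAI_RANGES.items():
        value = settings.get(field)
        if value is not None and not low <= value <= high:
            problems.append(f"{field} must be between {low} and {high}")
    min_silence = settings.get("min_turn_silence")
    max_silence = settings.get("max_turn_silence")
    # A turn cannot be forced to end before it is allowed to end.
    if min_silence is not None and max_silence is not None and min_silence > max_silence:
        problems.append("min_turn_silence must be <= max_turn_silence")
    return problems
```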

Azure settings

azure/fast supports region selection and an optional Azure API key reference.
| Field | Type | Description |
| --- | --- | --- |
| region | string | Azure transcription region. The Portal defaults to latency, which auto-selects the closest supported Telnyx-managed region. |
| api_key_ref | string | Optional integration secret reference for your Azure API key. When provided, the Portal only shows regions that support custom API keys. |

Common Telnyx-managed regions include latency, australiaeast, centralindia, eastus, northcentralus, westeurope, and westus2. Additional Azure regions are available when using your own Azure API key.
"transcription": {
  "model": "azure/fast",
  "language": "en-US",
  "region": "latency"
}
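
The region rule can be expressed as a small guard when building the Azure transcription object. This sketch is deliberately conservative: `MANAGED_REGIONS` contains only the regions named on this page, which may not be the complete list, and `azure_transcription` and its `uses_own_key` flag are hypothetical. The key reference itself is not added to the payload here.

```python
# Regions named on this page as Telnyx-managed; possibly not exhaustive.
MANAGED_REGIONS = {
    "latency",
    "australiaeast",
    "centralindia",
    "eastus",
    "northcentralus",
    "westeurope",
    "westus2",
}

def azure_transcription(language, region="latency", uses_own_key=False):
    """Build an azure/fast transcription object with a basic region check."""
    if not uses_own_key and region not in MANAGED_REGIONS:
        # Other regions are documented as requiring your own Azure API key.
        raise ValueError(
            f"region {region!r} is not a known Telnyx-managed region; "
            "configure your own Azure API key to use it"
        )
    return {"model": "azure/fast", "language": language, "region": region}
```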

Configure advanced settings via API

Create an assistant with tuned transcription settings:
curl -X POST https://api.telnyx.com/v2/ai/assistants \
  -H "Authorization: Bearer $TELNYX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Low Latency Assistant",
    "model": "anthropic/claude-haiku-4-5",
    "instructions": "You are a helpful voice assistant.",
    "transcription": {
      "model": "deepgram/flux",
      "language": "en",
      "settings": {
        "eot_threshold": 0.8,
        "eot_timeout_ms": 5000,
        "eager_eot_threshold": 0.4,
        "keyterm": "Telnyx,VoIP,SIP,{{customer_name}},{{product_name}}"
      }
    }
  }'