Speech-to-Text with Voice API and TeXML

Introduction

In this tutorial, we will cover how to get a speech-to-text transcription of your calls using Voice API and TeXML. Before starting, please ensure your Voice API or TeXML application is correctly configured.

Video Tutorial

Learn how to implement real-time Speech-to-Text recognition in your voice applications:

This video shows how to capture and process spoken input from callers using Telnyx’s Speech-to-Text API.

Supported engines

Telnyx offers several speech-to-text engines that can be used to process the audio from the call into a transcription:

Google (default) - Google speech-to-text engine that offers additional features like interim results.
Telnyx - In-house Telnyx speech-to-text engine with significantly better transcription accuracy and lower latency.
Deepgram - Deepgram speech-to-text engine with three models (nova-2, nova-3 and flux), selected using the transcription_model setting.
Azure - Azure speech-to-text engine with strong support for multiple languages and accents.
xAI - xAI Grok STT engine with the xai/grok-stt model.
AssemblyAI - AssemblyAI Universal-Streaming engine with the assemblyai/universal-streaming model.
Speechmatics - Speechmatics real-time engine with the speechmatics/standard model. High accuracy with multilingual and bilingual language packs.
Soniox - Soniox real-time engine with the soniox/stt-rt-v4 model. Automatic language detection with interim results and endpointing support.

Voice API

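Transcription can be enabled for Voice API calls with a dedicated endpoint, as in the following request:

Don’t forget to replace YOUR_API_KEY with your own API key and {call_control_id} with the ID of the call. The transcription_engine value in the request lists the available options; send only one of them.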

curl -i -X POST \
'https://api.telnyx.com/v2/calls/{call_control_id}/actions/transcription_start' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
    "language": "en",
    "client_state": "aGF2ZSBhIG5pY2UgZGF5ID1d",
    "command_id": "891510ac-f3e4-11e8-af5b-de00688a4901",
    "transcription_engine": "Google/Telnyx/Deepgram/Azure/xAI/AssemblyAI/Speechmatics/Soniox"
}'
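
The same request can also be assembled in Python. The following is a minimal sketch using only the standard library. The helper name build_transcription_start_request and the SUPPORTED_ENGINES tuple are illustrative and not part of any Telnyx SDK, and the engine names are simply the ones listed in this tutorial, assumed to be accepted as written. The function only builds the request; passing the result to urllib.request.urlopen would send it.

```python
import json
import urllib.request

# Engine names as listed in this tutorial (assumed to be accepted verbatim).
SUPPORTED_ENGINES = (
    "Google", "Telnyx", "Deepgram", "Azure",
    "xAI", "AssemblyAI", "Speechmatics", "Soniox",
)


def build_transcription_start_request(call_control_id, api_key,
                                      engine="Google", language="en"):
    """Build (but do not send) a transcription_start request for one call."""
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unknown transcription engine: {engine!r}")
    url = (
        "https://api.telnyx.com/v2/calls/"
        f"{call_control_id}/actions/transcription_start"
    )
    body = json.dumps({
        "language": language,
        "transcription_engine": engine,
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Validating the engine name before the request is built catches a typo locally instead of relying on an error response from the API.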

The results are delivered as a call.transcription event to the webhook URL defined for the Voice API application:

{
  "data": {
    "record_type": "event",
    "event_type": "call.transcription",
    "id": "0ccc7b54-4df3-4bca-a65a-3da1ecc777f0",
    "occurred_at": "2018-02-02T22:25:27.521992Z",
    "payload": {
      "call_control_id": "v2:7subYr8fLrXmaAXm8egeAMpoSJ72J3SGPUuome81-hQuaKRf9b7hKA",
      "call_leg_id": "5ca81340-5beb-11eb-ae45-02420a0f8b69",
      "call_session_id": "5ca81eee-5beb-11eb-ba6c-02420a0f8b69",
      "client_state": null,
      "connection_id": "1240401930086254526",
      "transcription_data": {
        "confidence": 0.977219,
        "is_final": true,
        "transcript": "hello this is a test speech"
      }
    }
  }
}
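
A webhook handler usually needs only the transcript fields from this event. The sketch below shows one way to pull them out in Python. It assumes the webhook body is a JSON object with a top-level "data" key, as in the sample above; the function name extract_transcript is hypothetical.

```python
import json


def extract_transcript(webhook_body):
    """Return (transcript, confidence, is_final) for a call.transcription
    event, or None for any other event type."""
    event = json.loads(webhook_body)["data"]
    if event.get("event_type") != "call.transcription":
        return None
    data = event["payload"]["transcription_data"]
    return data["transcript"], data["confidence"], data["is_final"]
```

Checking is_final lets the handler ignore interim results from engines that send them and act only on the finished transcript.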

TeXML

You can enable transcription on your TeXML calls by including a <Transcription> verb in the TeXML instructions:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription language="en" transcriptionCallback="/transcription" transcriptionEngine="Telnyx" />
  </Start>
</Response>
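
If the TeXML is generated by an application rather than served as a static file, the same document can be built programmatically. This is a minimal Python sketch using the standard library's xml.etree.ElementTree; the function name build_transcription_texml and its default values are illustrative only.

```python
import xml.etree.ElementTree as ET


def build_transcription_texml(callback_url, engine="Telnyx", language="en"):
    """Return a TeXML document (as a string) that starts transcription."""
    response = ET.Element("Response")
    start = ET.SubElement(response, "Start")
    ET.SubElement(start, "Transcription", {
        "language": language,
        "transcriptionCallback": callback_url,
        "transcriptionEngine": engine,
    })
    # Prepend the declaration by hand so the declared encoding is always UTF-8.
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(response, encoding="unicode")
```

Building the element tree rather than concatenating strings means attribute values are quoted and escaped by the library.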

The transcription results are sent in the callback in the following format:

{
    "AccountSid" : "6d547b4f-993a-4e87-b95c-2d9460b3824b",
    "CallSid" : "v3:xIscDTsILHoErg5d4BfFWITg7vHmTvTRm-4YEeOgrwESDQsDWQNxvw",
    "CallSidLegacy" : "v3:xIscDTsILHoErg5d4BfFWITg7vHmTvTRm-4YEeOgrwESDQsDWQNxvw",
    "Confidence" : "0.9822598695755005",
    "ConnectionId" : "1614262910593271041",
    "From" : "+18727726007",
    "IsFinal" : "true",
    "To" : "+48664087895",
    "Transcript" : "let's hear some music"
}
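
Note that Confidence and IsFinal arrive as strings in this callback, unlike the Voice API webhook. The sketch below converts them to native types. It assumes the callback has already been decoded by your web framework into a dictionary of strings with the keys shown above; the function name parse_texml_transcription is hypothetical.

```python
def parse_texml_transcription(params):
    """Convert the string fields of a TeXML transcription callback
    into native Python types."""
    return {
        "call_sid": params["CallSid"],
        "transcript": params["Transcript"],
        "confidence": float(params["Confidence"]),
        "is_final": params["IsFinal"].lower() == "true",
    }
```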
