# Text-to-Speech WebSocket messages

> Reference for the WebSocket frame types used in the Telnyx Text-to-Speech streaming API: client-to-server text messages and server-to-client audio frames.

## Client → Server

All client messages are JSON text frames.

### Text Frame

<ParamField body="text" type="string" required>
  Text to synthesize. Send `" "` (a single space) as the handshake and `""` (an empty string) to signal end-of-sequence.
</ParamField>

<ParamField body="voice_settings" type="object">
  Provider-specific voice configuration. Only used in the handshake frame (`{"text": " "}`). See [Voice Settings](/docs/voice/tts/websocket-streaming/parameters/voice-settings).
</ParamField>

<ParamField body="flush" type="boolean">
  When `true`, immediately synthesizes all buffered text without waiting for a sentence boundary. Default: `false`.
</ParamField>

<ParamField body="force" type="boolean">
  When `true`, stops the current synthesis worker and starts a new one. The original handshake is replayed automatically. Use for barge-in/interruption.
</ParamField>

### Message Sequence

**1. Handshake** (required first message):

```json theme={null}
{"text": " "}
```

With optional voice settings:

```json theme={null}
{
  "text": " ",
  "voice_settings": {
    "voice_speed": 1.2
  }
}
```

**2. Text** (one or more):

```json theme={null}
{"text": "Hello, welcome to Telnyx."}
```

**3. Flush** (optional — force synthesis of buffered partial sentences):

```json theme={null}
{"text": "incomplete fragment", "flush": true}
```

**4. Interrupt** (optional — restart synthesis):

```json theme={null}
{"force": true}
```

**5. End of sequence**:

```json theme={null}
{"text": ""}
```
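
The sequence above can be sketched as a handful of small builders that return the JSON text to send. This is a minimal illustration, not an official client: the function names (`handshake`, `text_frame`, `interrupt`, `end_of_sequence`, `build_session`) are hypothetical, and actually sending the frames over a WebSocket connection is left to whichever client library you use.

```python
import json
from typing import Any, Optional


def handshake(voice_settings: Optional[dict[str, Any]] = None) -> str:
    """Required first frame: a single space, optionally with voice settings."""
    msg: dict[str, Any] = {"text": " "}
    if voice_settings is not None:
        msg["voice_settings"] = voice_settings
    return json.dumps(msg)


def text_frame(text: str, flush: bool = False) -> str:
    """Text to synthesize; flush=True requests synthesis of buffered text."""
    msg: dict[str, Any] = {"text": text}
    if flush:
        msg["flush"] = True
    return json.dumps(msg)


def interrupt() -> str:
    """Interrupt frame, as shown in step 4 of the sequence."""
    return json.dumps({"force": True})


def end_of_sequence() -> str:
    """Empty string marks the end of the sequence."""
    return json.dumps({"text": ""})


def build_session(sentences: list[str],
                  voice_settings: Optional[dict[str, Any]] = None) -> list[str]:
    """Hypothetical helper: handshake, one text frame per sentence, then end."""
    frames = [handshake(voice_settings)]
    frames += [text_frame(s) for s in sentences]
    frames.append(end_of_sequence())
    return frames
```

For example, `build_session(["Hello, welcome to Telnyx."], {"voice_speed": 1.2})` yields three JSON strings in the order handshake, text, end of sequence.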

***

## Server → Client

All server messages are JSON text frames.

### Audio Chunk

Returned when synthesis produces audio for a complete sentence.

```json theme={null}
{
  "audio": "<base64-encoded-audio>",
  "text": "Hello, welcome to Telnyx.",
  "isFinal": false,
  "cached": false,
  "timeToFirstAudioFrameMs": 245
}
```

<ParamField body="audio" type="string | null">
  Base64-encoded audio data. `null` when the provider uses streamed delivery — audio arrives in separate streamed chunk frames instead. See note below.
</ParamField>

<ParamField body="text" type="string | null">
  The text segment this audio corresponds to. `null` for streamed audio chunks.
</ParamField>

<ParamField body="isFinal" type="boolean">
  `false` for audio chunks; `true` only on the final frame.
</ParamField>

<ParamField body="cached" type="boolean">
  `true` if audio was served from cache.
</ParamField>

<ParamField body="timeToFirstAudioFrameMs" type="integer">
  Time in milliseconds from speech request to first audio frame. Only present on the first chunk of each synthesis.
</ParamField>
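
As an illustration of the fields above, the following sketch parses one audio chunk into a typed record and decodes the base64 payload. The `AudioChunk` class and `parse_audio_chunk` function are hypothetical names chosen for this example. The sketch also assumes that a missing `cached` or `isFinal` key can be treated as `false`, which this reference does not state.

```python
import base64
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioChunk:
    audio: Optional[bytes]   # decoded bytes, or None when `audio` was null
    text: Optional[str]      # None for streamed audio chunks
    is_final: bool
    cached: bool
    ttfa_ms: Optional[int]   # present only on the first chunk of a synthesis


def parse_audio_chunk(raw: str) -> AudioChunk:
    """Parse a server JSON text frame into an AudioChunk."""
    data = json.loads(raw)
    encoded = data.get("audio")
    return AudioChunk(
        audio=base64.b64decode(encoded) if encoded is not None else None,
        text=data.get("text"),
        # Assumption: absent flags default to False.
        is_final=bool(data.get("isFinal", False)),
        cached=bool(data.get("cached", False)),
        ttfa_ms=data.get("timeToFirstAudioFrameMs"),
    )
```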

### Streamed Audio Chunk

For providers that stream audio incrementally (Telnyx Natural, NaturalHD, Qwen3TTS, Rime, Minimax, Resemble, Inworld), audio arrives in separate frames:

```json theme={null}
{
  "audio": "<base64-encoded-audio>",
  "text": null,
  "isFinal": false,
  "cached": false
}
```

These frames carry only base64-encoded audio (`text` is always `null`). For these providers, the text-bearing audio chunk has `audio: null`; only the streamed chunks carry audio bytes.

<Info>
  For AWS Polly and Azure, audio is returned in the `audio` field of the regular audio chunk frame. For all other providers, ignore the `audio` field on the text-bearing chunk and collect audio from the streamed frames.
</Info>
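
One way to apply this rule is sketched below. The `collect_audio` function and its `streamed` flag are hypothetical: the caller is assumed to know which delivery mode its provider uses. Joining the decoded chunks by simple concatenation is also an assumption, and whether that yields playable audio depends on the audio format.

```python
import base64
import json
from typing import Iterable


def collect_audio(raw_frames: Iterable[str], streamed: bool) -> bytes:
    """Concatenate decoded audio from the frames that carry it.

    streamed=True  -> take audio only from frames whose `text` is null.
    streamed=False -> take audio only from text-bearing audio chunks.
    """
    out = bytearray()
    for raw in raw_frames:
        frame = json.loads(raw)
        encoded = frame.get("audio")
        if encoded is None:
            continue  # final frames, error frames, and audio-less chunks
        is_streamed_chunk = frame.get("text") is None
        if is_streamed_chunk == streamed:
            out += base64.b64decode(encoded)
    return bytes(out)
```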

### Final Frame

Signals that synthesis is complete for the current text input:

```json theme={null}
{
  "audio": null,
  "text": "",
  "isFinal": true
}
```

The connection remains open after a final frame, so the client can send more text or close the connection.
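
Because one connection can carry several syntheses, a client may want to split the incoming frames at each final frame. The sketch below shows one way to do that. The name `group_by_synthesis` is hypothetical, and the input is assumed to be the JSON text frames in arrival order.

```python
import json
from typing import Iterable, Iterator


def group_by_synthesis(raw_frames: Iterable[str]) -> Iterator[list[dict]]:
    """Yield the frames of each synthesis, ending with its final frame."""
    current: list[dict] = []
    for raw in raw_frames:
        frame = json.loads(raw)
        current.append(frame)
        if frame.get("isFinal") is True:
            yield current
            current = []
    if current:
        # Trailing frames with no final frame yet (synthesis still in progress).
        yield current
```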

### Error Frame

```json theme={null}
{
  "error": "Provider error message"
}
```

The connection closes shortly after an error frame.
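
The four server frame shapes in this section can be told apart from their keys alone. The following is a minimal sketch of such a dispatcher. The function name `classify_frame` and the returned labels are hypothetical, and checking `error` first is a design choice of this example rather than documented behavior.

```python
import json


def classify_frame(raw: str) -> str:
    """Return a label for a server JSON text frame."""
    frame = json.loads(raw)
    if "error" in frame:
        return "error"            # the connection closes shortly afterwards
    if frame.get("isFinal") is True:
        return "final"            # synthesis complete for the current input
    if frame.get("text") is None:
        return "streamed_audio"   # audio-only frame from a streaming provider
    return "audio_chunk"          # text-bearing chunk; `audio` may be null
```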
