Message Encoding

SMS messages are encoded into segments of 140 bytes each. You are billed per segment, so understanding encoding is key to controlling costs. The encoding determines how many characters fit in each segment:

Encoding	Bits per char	Single segment	Multi-part segment
GSM 7-bit	7	160 chars	153 chars
ASCII 7-bit	7	160 chars	153 chars
ASCII 8-bit	8	140 chars	134 chars
UTF-16	16	70 chars	67 chars

A single non-GSM-7 character (like an emoji or curly quote) switches the entire message to UTF-16, cutting capacity from 160 to 70 characters per segment. This can more than double your costs.

Segment calculator

Use this interactive tool to check how your message will be encoded and segmented:

How segments work

Every SMS message is transmitted in units of 140 bytes. When a message exceeds one segment, a 6-byte header (User Data Header, or UDH) is added to each segment for reassembly, reducing the usable space.

Single segment:   140 bytes available → 160 GSM-7 chars or 70 UTF-16 chars
Multi-part:       134 bytes per segment → 153 GSM-7 chars or 67 UTF-16 chars
Maximum:          10 segments per message
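The capacities above follow from arithmetic on the payload size. The sketch below reproduces them; the function name `chars_per_segment` is illustrative and not part of any Telnyx SDK.

```python
SEGMENT_BYTES = 140  # payload size of one SMS segment
UDH_BYTES = 6        # header carried by each part of a multi-part message


def chars_per_segment(bits_per_char: int, multipart: bool = False) -> int:
    """Return how many characters of the given width fit in one segment."""
    usable_bytes = SEGMENT_BYTES - (UDH_BYTES if multipart else 0)
    # Convert the usable bytes to bits, then keep only whole characters.
    return (usable_bytes * 8) // bits_per_char


for name, bits in [("GSM-7", 7), ("UTF-16", 16)]:
    print(name, chars_per_segment(bits), chars_per_segment(bits, multipart=True))
# GSM-7 160 153
# UTF-16 70 67
```

The GSM-7 multi-part figure rounds down: 134 bytes hold 1072 bits, which is 153 whole 7-bit characters with one bit left over.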

Segment calculation formula

To calculate the number of segments for a message:

GSM-7:

Characters ≤ 160  →  1 segment
Characters > 160  →  ⌈characters / 153⌉ segments

Examples:
- 100 chars = 1 segment
- 160 chars = 1 segment
- 161 chars = 2 segments (153 + 8)
- 306 chars = 2 segments (153 + 153)
- 307 chars = 3 segments (153 + 153 + 1)
- 1530 chars = 10 segments (maximum)

UTF-16:

Characters ≤ 70   →  1 segment
Characters > 70   →  ⌈characters / 67⌉ segments

Examples:
- 50 chars = 1 segment
- 70 chars = 1 segment
- 71 chars = 2 segments (67 + 4)
- 134 chars = 2 segments (67 + 67)
- 135 chars = 3 segments (67 + 67 + 1)
- 670 chars = 10 segments (maximum)
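The breakdowns in parentheses above (for example, 153 + 8) can be reproduced with a small helper. This is a minimal sketch: `segment_sizes` is a hypothetical name, and it assumes the length is already expressed in character slots for the chosen encoding.

```python
from typing import List


def segment_sizes(length: int, single_limit: int, multipart_limit: int) -> List[int]:
    """Return the number of character slots carried by each segment."""
    if length <= single_limit:
        return [length]  # fits in one segment, no header needed
    full, rest = divmod(length, multipart_limit)
    return [multipart_limit] * full + ([rest] if rest else [])


print(segment_sizes(161, 160, 153))  # [153, 8]
print(segment_sizes(307, 160, 153))  # [153, 153, 1]
print(segment_sizes(135, 70, 67))    # [67, 67, 1]
```

The length of the returned list is the segment count, so it can be compared against the 10-segment maximum before sending.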

Cost impact example

Consider a 200-character message:

Scenario	Encoding	Segments	Relative cost
All GSM-7 characters	GSM-7	2	2×
Contains one emoji 😀	UTF-16	3	3×
Contains one curly quote “	UTF-16	3	3×
With smart encoding enabled	GSM-7	2	2×

Enable smart encoding to automatically replace common Unicode characters (like curly quotes and em dashes) with GSM-7 equivalents, reducing segment counts.

Encoding by sender type

Sender type	Default encoding	Fallback
Long Code	GSM 7-bit	UTF-16
Toll-Free	GSM 7-bit	UTF-16
Short Code	ASCII 7-bit	UTF-16
Alphanumeric	GSM 7-bit	UTF-16

If your message contains characters outside the default encoding’s character set, the fallback encoding is used automatically for the entire message.

MMS and RCS messages use UTF-8 encoding by default and are not affected by these limits.

GSM 7-bit character set

Telnyx uses a GSM 7-bit encoding optimized for maximum carrier compatibility. Only characters in this set will keep your message in the efficient GSM-7 encoding.

Standard characters (1 character each)

Letters:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z

Digits:

0 1 2 3 4 5 6 7 8 9

Symbols and punctuation:

! " # $ % & ' ( ) * + , - . / : ; < = > ? @

Special characters:

Character	Description
`space`	Space
`\n`	Line feed
`\r`	Carriage return
`_`	Underscore
`£`	Pound sign
`¥`	Yen sign
`è`	e grave
`é`	e acute
`ù`	u grave
`ì`	i grave
`ò`	o grave
`Ø`	O with stroke
`ø`	o with stroke
`Å`	A with ring
`å`	a with ring
`Æ`	AE ligature
`æ`	ae ligature
`ß`	Sharp s
`É`	E acute
`¡`	Inverted exclamation
`Ä`	A umlaut
`Ö`	O umlaut
`Ñ`	N tilde
`Ü`	U umlaut
`§`	Section sign
`¿`	Inverted question
`ä`	a umlaut
`ö`	o umlaut
`ñ`	n tilde
`ü`	u umlaut
`à`	a grave

Extended characters (2 characters each)

These characters require an escape sequence and count as 2 characters in segment calculations:

Character	Description	Character count
`~`	Tilde	2
`^`	Circumflex	2
`|`	Pipe / vertical bar	2
`\`	Backslash	2
`{`	Left curly bracket	2
`}`	Right curly bracket	2
`[`	Left square bracket	2
`]`	Right square bracket	2
`€`	Euro sign	2

Extended characters are easy to overlook when estimating segment counts. A message with 155 standard characters and 3 pipe characters (|) uses 155 + (3 × 2) = 161 character slots, requiring 2 segments instead of 1.
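The arithmetic in that example can be checked in a few lines. This is a sketch only: `gsm7_slots` is an illustrative name, and it assumes the text has already been confirmed to contain only GSM-7 characters.

```python
# Extended GSM-7 characters listed in the table above; each costs 2 slots.
GSM7_EXTENDED = set("^{}\\[~]|\u20ac")  # \u20ac is the euro sign


def gsm7_slots(text: str) -> int:
    """Count GSM-7 character slots, charging 2 for each extended character."""
    return sum(2 if ch in GSM7_EXTENDED else 1 for ch in text)


message = "a" * 155 + "|||"
print(len(message), gsm7_slots(message))  # 158 161
```

The message is only 158 characters long by `len()`, but it occupies 161 slots, which is just over the 160-slot single-segment limit.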

Detecting encoding in your application

Before sending, you can check whether a message will use GSM-7 or UTF-16 encoding to estimate costs. Here is a Python helper function:

# GSM-7 basic character set
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ ÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "ÄÖÑܧ¿abcdefghijklmnopqrstuvwxyz"
    "äöñüà"
)
GSM7_EXTENDED = set("^{}\\[~]|€")

def calculate_segments(text: str) -> dict:
    """Calculate encoding and segment count for an SMS message."""
    is_gsm7 = all(c in GSM7_BASIC or c in GSM7_EXTENDED for c in text)

    if is_gsm7:
        # Count extended chars as 2
        char_count = sum(2 if c in GSM7_EXTENDED else 1 for c in text)
        if char_count <= 160:
            segments = 1
        else:
            segments = -(-char_count // 153)  # ceiling division
        return {"encoding": "GSM-7", "char_count": char_count, "segments": segments}
    else:
        # UTF-16: emojis count as 2 chars (surrogate pairs)
        char_count = 0
        for c in text:
            char_count += 2 if ord(c) > 0xFFFF else 1
        if char_count <= 70:
            segments = 1
        else:
            segments = -(-char_count // 67)
        return {"encoding": "UTF-16", "char_count": char_count, "segments": segments}

# Example usage
result = calculate_segments("Hello, world!")
print(f"Encoding: {result['encoding']}, Segments: {result['segments']}")
# Output: Encoding: GSM-7, Segments: 1

result = calculate_segments("Hello 😀")
print(f"Encoding: {result['encoding']}, Segments: {result['segments']}")
# Output: Encoding: UTF-16, Segments: 1

Common encoding issues

Message unexpectedly uses UTF-16 (too many segments)

Symptom: Your message uses more segments than expected.

Cause: A non-GSM-7 character is present, forcing the entire message to UTF-16. Common culprits:

Character	Source	GSM-7?
`“` `”` (curly quotes)	Word processors, mobile keyboards	❌
`‘` `’` (curly apostrophes)	Auto-correct, CMS platforms	❌
`—` (em dash)	Word processors	❌
`…` (ellipsis)	Mobile keyboards	❌
`€` (euro sign)	Manual entry	✅ (extended, costs 2 chars)

Fix:

- Enable smart encoding to auto-replace these characters
- Or manually replace them with GSM-7 equivalents before sending
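When a message unexpectedly falls back to UTF-16, it helps to see which characters caused it. The sketch below reports them. `find_non_gsm7` is a hypothetical helper, and the character set is copied from the Python helper earlier in this guide, so treat it as an approximation of what the platform actually accepts.

```python
from typing import List, Tuple

# Basic plus extended GSM-7 characters, as used by the helper earlier in this guide.
GSM7_CHARS = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ ÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿"
    "abcdefghijklmnopqrstuvwxyzäöñüà"
    "^{}\\[~]|\u20ac"
)


def find_non_gsm7(text: str) -> List[Tuple[int, str]]:
    """Return (index, character) pairs for characters outside GSM-7."""
    return [(i, ch) for i, ch in enumerate(text) if ch not in GSM7_CHARS]


# Curly apostrophe, em dash and curly quotes written as escapes for clarity.
sample = "It\u2019s ready \u2014 \u201ctoday\u201d"
for index, ch in find_non_gsm7(sample):
    print(index, f"U+{ord(ch):04X}")
# 2 U+2019
# 11 U+2014
# 13 U+201C
# 19 U+201D
```

An empty result means the message can stay in GSM-7 under this character set.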

Emojis dramatically increase segment count

Symptom: Adding a single emoji doubles or triples the number of segments.

Cause: Emojis force UTF-16 encoding (70 chars/segment instead of 160). Additionally, most emojis use surrogate pairs and count as 2 UTF-16 characters.

Example:

"Thanks for your order!"        → GSM-7, 1 segment (22 chars)
"Thanks for your order! 🎉"     → UTF-16, 1 segment (25 chars)
"Thanks for your order! ... 🎉" → UTF-16, 2 segments (71+ chars)

Fix: If cost is a concern, avoid emojis in SMS. Use emojis freely in MMS/RCS where encoding isn’t a factor.
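The character counts in the example above are UTF-16 code units, not visible characters. One way to count them in Python is to encode the text and divide the byte length by two. This is a sketch, and `utf16_units` is an illustrative name.

```python
def utf16_units(text: str) -> int:
    """Count UTF-16 code units; characters above U+FFFF take two."""
    return len(text.encode("utf-16-le")) // 2


print(utf16_units("Thanks for your order!"))             # 22
print(utf16_units("Thanks for your order! \U0001F389"))  # 25 (party popper = 2 units)
```

Emoji built from several code points cost more still: a family emoji joined with zero-width joiners (`"\U0001F468\u200D\U0001F469\u200D\U0001F467"`) occupies 8 code units.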

Extended GSM-7 characters cause unexpected segment splits

Symptom: A 155-character message that looks like it should fit in one segment actually requires two.

Cause: Characters like [, ], {, }, |, \, ^, ~, and € are in the GSM-7 extended set and count as 2 characters each.

Example:

"Price: $100 [USD]" → 17 visible chars but 19 GSM-7 chars ([ and ] each cost 2)

Fix: Account for extended characters when calculating message length. Use the segment calculator above or the helper function in this guide.

Copy-pasted text from Word/Google Docs causes issues

Symptom: Text that looks like normal ASCII actually contains Unicode characters.

Cause: Word processors often replace straight quotes with curly quotes, hyphens with em dashes, and three periods with an ellipsis character. These differences are easy to miss by eye, but they force UTF-16.

Fix:

- Enable smart encoding, which handles the most common substitutions automatically
- Sanitize text before sending by replacing known problem characters
- Set the encoding parameter to gsm7 to get a 400 error if non-GSM-7 characters are present (fail-fast approach)
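The manual sanitizing option can be sketched as a translation table. The mapping below covers only the characters named in this section, it is an assumption for illustration and not the substitution table that smart encoding uses, and `sanitize_for_gsm7` is a hypothetical helper name.

```python
# Map common word-processor substitutions back to GSM-7 characters.
REPLACEMENTS = {
    "\u201c": '"',    # left curly double quote
    "\u201d": '"',    # right curly double quote
    "\u2018": "'",    # left curly single quote
    "\u2019": "'",    # right curly single quote / apostrophe
    "\u2014": "-",    # em dash
    "\u2026": "...",  # ellipsis character
}
_TABLE = str.maketrans(REPLACEMENTS)


def sanitize_for_gsm7(text: str) -> str:
    """Replace known problem characters; everything else is left untouched."""
    return text.translate(_TABLE)


pasted = "\u201cHi\u201d \u2014 it\u2019s here\u2026"
print(sanitize_for_gsm7(pasted))  # "Hi" - it's here...
```

Replacing an ellipsis with three periods makes the text two characters longer, so recount the message length after sanitizing.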

Messages truncated or split incorrectly on recipient's phone

Symptom: The recipient sees a message split in unexpected places, or parts arrive out of order.

Cause: Multi-part messages are reassembled by the recipient’s device using the UDH (User Data Header). Some older devices or carriers may not support reassembly for messages over a certain number of segments.

Fix:

- Keep messages under 3-4 segments for maximum compatibility
- Telnyx supports up to 10 segments, but recipient device support varies
- Consider using MMS for longer content

Non-Latin scripts (Chinese, Arabic, Cyrillic) use too many segments

Symptom: Messages in non-Latin scripts use significantly more segments than English messages of similar visible length.

Cause: Non-Latin characters have no GSM-7 equivalents, so the entire message uses UTF-16 encoding (70 characters per segment). Smart encoding cannot help here.

Fix:

- This is expected behavior, so plan for higher segment counts when messaging in non-Latin scripts
- Keep messages concise
- Consider MMS for longer non-Latin content

Best practices

Enable smart encoding

Turn on smart encoding on your messaging profile to automatically handle Unicode-to-GSM-7 substitutions. This is the single biggest cost-saving measure.

Validate before sending

Use the encoding detection helpers above to check segment counts before sending. Alert your application when messages will be unexpectedly expensive.

Sanitize input text

If you accept user-generated content, sanitize it before sending. Strip or replace invisible Unicode characters, curly quotes, and other common problem characters.

Keep messages concise

Stay under 160 characters (GSM-7) or 70 characters (UTF-16) to avoid multi-part message overhead. Once a message is split, the UDH costs every segment 7 GSM-7 characters (160 → 153) or 3 UTF-16 characters (70 → 67).

Use the right channel

For messages that need emojis, rich formatting, or non-Latin scripts, consider MMS or RCS instead of SMS.

Related resources

Smart Encoding

Automatically replace Unicode characters with GSM-7 equivalents to reduce costs.

Send Your First Message

Get started with the Telnyx Messaging API.

Messages API Reference

API reference for sending messages with encoding options.

Messaging Profiles

Configure smart encoding and other profile settings.