Smart encoding applies to SMS only. MMS and RCS messages use UTF-8 encoding by default and are not affected.
Why use smart encoding
SMS messages using GSM-7 encoding fit 160 characters per segment. When a message contains even one Unicode character outside GSM-7, the entire message switches to UTF-16 encoding, which fits only 70 characters per segment. A single smart quote (”) or em dash (—) can more than double your messaging costs.
Example:
| Message | Encoding | Segments | Cost impact |
|---|---|---|---|
| Hello, how are you? (150 chars) | GSM-7 | 1 | Base cost |
| Hello, how are you? (150 chars with “smart quotes”) | UTF-16 | 3 | 3× cost |
| Hello, how are you? (same, smart encoding ON) | GSM-7 | 1 | Base cost |
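The segment counts in the table can be reproduced with a short Python sketch. The 160- and 70-character single-segment limits come from the text above; the 153 and 67 limits for multi-part messages are the standard values once each part carries a concatenation header. The function names are illustrative and not part of the Telnyx API, and the sketch ignores the corner case of an extended character falling on a segment boundary.

```python
# GSM 03.38 basic character set (one septet each).
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table characters (escape + character, so two septets each).
GSM7_EXTENDED = set("^{}\\[~]|€\f")


def detect_encoding(text: str) -> str:
    """Return "gsm7" if every character is representable in GSM-7, else "ucs2"."""
    if all(ch in GSM7_BASIC or ch in GSM7_EXTENDED for ch in text):
        return "gsm7"
    return "ucs2"


def count_segments(text: str) -> int:
    """Estimate how many SMS segments `text` occupies."""
    if detect_encoding(text) == "gsm7":
        units = sum(2 if ch in GSM7_EXTENDED else 1 for ch in text)
        single, multi = 160, 153
    else:
        # UTF-16 code units: characters outside the BMP (most emoji) count twice.
        units = len(text.encode("utf-16-be")) // 2
        single, multi = 70, 67
    if units <= single:
        return 1
    return -(-units // multi)  # ceiling division


plain = "a" * 150
curly = "\u201c" + "a" * 148 + "\u201d"  # 150 chars including two smart quotes
print(count_segments(plain), count_segments(curly))  # 1 3
```

A 150-character message drops from one segment to three as soon as two curly quotes appear, which matches the cost impact in the table.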
How it works
When smart encoding is enabled:
- Your message text is scanned for Unicode characters that have GSM-7 equivalents.
- Matching characters are automatically replaced (e.g., “ → ", — → -, … → ...).
- The final encoding (GSM-7 or UTF-16) is determined after all substitutions.
- The segment count is recalculated based on the transformed text.
- The API response includes metadata about the transformation.
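The scan-and-replace steps above can be sketched as follows. This is a minimal illustration, not the actual implementation: the mapping holds only a handful of the substitutions documented on this page, and smart_encode is a hypothetical helper name.

```python
# A tiny, illustrative subset of the substitution table (not the full 200+ list).
SUBSTITUTIONS = {
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u2014": "-",    # em dash
    "\u2013": "-",    # en dash
    "\u2026": "...",  # horizontal ellipsis
}


def smart_encode(text: str) -> tuple[str, int]:
    """Return the transformed text and the number of replacements made."""
    out = []
    replaced = 0
    for ch in text:
        if ch in SUBSTITUTIONS:
            out.append(SUBSTITUTIONS[ch])
            replaced += 1
        else:
            out.append(ch)
    return "".join(out), replaced


original = "\u201cHello\u201d \u2014 see you soon\u2026"
transformed, replaced = smart_encode(original)
print(transformed)  # "Hello" - see you soon...
print(replaced)     # 4
```

The encoding and segment count would then be computed from the transformed string rather than the original.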
Webhooks return the original text. The text field in delivery webhooks contains your original message, not the smart-encoded version. This ensures your application’s message tracking stays consistent.
Enable smart encoding
You can enable smart encoding at two levels: on a messaging profile (applies to all messages) or on a per-request basis.
On a messaging profile
Enable smart encoding as a default for all messages sent through a profile, using either the API or the Portal.
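As a rough sketch of the API route, the snippet below builds (but does not send) a request that sets smart_encoding on a profile. The smart_encoding field name comes from this page; the base URL, the PATCH method, the /messaging_profiles/{id} path, and the placeholder IDs are assumptions to verify against the Messaging Profiles API reference.

```python
import json
import urllib.request

API_BASE = "https://api.telnyx.com/v2"  # assumed base URL


def build_profile_update(profile_id: str, api_key: str, enabled: bool) -> urllib.request.Request:
    """Build (but do not send) a request that sets smart_encoding on a profile."""
    body = json.dumps({"smart_encoding": enabled}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/messaging_profiles/{profile_id}",  # assumed path
        data=body,
        method="PATCH",  # assumed method
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_profile_update("YOUR_PROFILE_ID", "YOUR_API_KEY", enabled=True)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```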
Per-request control
Override the profile setting on individual messages using the encoding parameter:
| Value | Behavior |
|---|---|
| auto | Follow the profile’s smart_encoding setting (default). |
| gsm7 | Force GSM-7 encoding. Smart encoding is applied. Returns 400 if the message contains characters that cannot be converted to GSM-7 (e.g., emoji). |
| ucs2 | Force UCS-2 encoding. Skips smart encoding entirely. |
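A small helper can guard against typos in the encoding value before a request is sent. The three allowed values come from the table above; the from, to, and text field names and the helper itself are assumptions for illustration, so check the Messages API reference for the exact request body.

```python
ALLOWED_ENCODINGS = {"auto", "gsm7", "ucs2"}


def build_message_payload(from_number: str, to_number: str, text: str,
                          encoding: str = "auto") -> dict:
    """Return a request body dict with a validated per-request encoding."""
    if encoding not in ALLOWED_ENCODINGS:
        raise ValueError(
            f"encoding must be one of {sorted(ALLOWED_ENCODINGS)}, got {encoding!r}"
        )
    return {
        "from": from_number,   # assumed field name
        "to": to_number,       # assumed field name
        "text": text,          # assumed field name
        "encoding": encoding,
    }


payload = build_message_payload("+15550100", "+15550101", "Hi there", encoding="gsm7")
print(payload["encoding"])  # gsm7
```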
The request-level encoding parameter takes precedence over the messaging profile’s smart_encoding setting.
Response metadata
When smart encoding is applied, the API response includes detailed metadata:
| Field | Description |
|---|---|
| smart_encoding_applied | Whether any characters were replaced. |
| final_encoding | The encoding used after transformation (gsm7 or ucs2). |
| segment_count | Number of segments after smart encoding. |
| character_count | Message length after transformation. |
| replaced_character_count | Number of unique characters that were substituted. |
| length_change | Difference in length (positive means message grew, e.g., … → ...). |
The parts field in the top-level response reflects the segment count after smart encoding, so you always see the actual billing impact.
Checking the response
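One way to inspect the metadata is sketched below. The field names are those listed in the Response metadata table; where they sit inside the response body is not specified here, so the function takes them as a plain dict, and the sample values are made up for illustration.

```python
def summarize_smart_encoding(metadata: dict) -> str:
    """Turn the smart encoding metadata fields into a one-line summary."""
    if not metadata.get("smart_encoding_applied"):
        return "no substitutions applied"
    return (
        f"{metadata['replaced_character_count']} character(s) replaced, "
        f"length change {metadata['length_change']:+d}, "
        f"{metadata['segment_count']} segment(s) as {metadata['final_encoding']}"
    )


# Hypothetical values, shaped after the fields in the table above.
sample = {
    "smart_encoding_applied": True,
    "final_encoding": "gsm7",
    "segment_count": 1,
    "character_count": 26,
    "replaced_character_count": 3,
    "length_change": 2,
}
print(summarize_smart_encoding(sample))
# 3 character(s) replaced, length change +2, 1 segment(s) as gsm7
```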
Precedence rules
Smart encoding behavior is determined by a combination of your messaging profile setting and the per-request encoding parameter:
| Profile smart_encoding | Request encoding | Behavior |
|---|---|---|
| true | (not set) | Smart encoding applied |
| false | (not set) | Smart encoding not applied |
| true | auto | Smart encoding applied |
| false | auto | Smart encoding not applied |
| true or false | gsm7 | Smart encoding applied, must result in GSM-7 or returns 400 |
| true or false | ucs2 | Smart encoding skipped, forced UCS-2 |
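The precedence table can be expressed as a small decision function. This is only a restatement of the rows above in Python; resolve_behavior is a hypothetical name and the returned strings are descriptive labels, not API values.

```python
from typing import Optional


def resolve_behavior(profile_smart_encoding: bool,
                     request_encoding: Optional[str] = None) -> str:
    """Mirror the precedence table: the request value wins unless unset or auto."""
    if request_encoding in (None, "auto"):
        return "applied" if profile_smart_encoding else "not applied"
    if request_encoding == "gsm7":
        return "applied; must result in GSM-7 or the request returns 400"
    if request_encoding == "ucs2":
        return "skipped; forced UCS-2"
    raise ValueError(f"unknown encoding: {request_encoding!r}")


print(resolve_behavior(True))            # applied
print(resolve_behavior(False, "auto"))   # not applied
print(resolve_behavior(False, "gsm7"))   # applied; must result in GSM-7 or the request returns 400
print(resolve_behavior(True, "ucs2"))    # skipped; forced UCS-2
```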
The request-level encoding parameter always takes precedence over the messaging profile setting.
Character substitutions
Smart encoding replaces 200+ Unicode characters with GSM-7 equivalents. The tables below show all supported substitutions grouped by category.
Quotation marks
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+00AB | « | Left-pointing double angle quotation mark | " |
| U+00BB | » | Right-pointing double angle quotation mark | " |
| U+201C | “ | Left double quotation mark | " |
| U+201D | ” | Right double quotation mark | " |
| U+02BA | ʺ | Modifier letter double prime | " |
| U+02EE | ˮ | Modifier letter double apostrophe | " |
| U+201F | ‟ | Double high-reversed-9 quotation mark | " |
| U+275D | ❝ | Heavy double turned comma quotation mark ornament | " |
| U+275E | ❞ | Heavy double comma quotation mark ornament | " |
| U+301D | 〝 | Reversed double prime quotation mark | " |
| U+301E | 〞 | Double prime quotation mark | " |
| U+FF02 | ＂ | Fullwidth quotation mark | " |
| U+201E | „ | Double low quotation mark | " |
Apostrophes and single quotes
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+2018 | ‘ | Left single quotation mark | ' |
| U+2019 | ’ | Right single quotation mark | ' |
| U+02BB | ʻ | Modifier letter turned comma | ' |
| U+02C8 | ˈ | Modifier letter vertical line | ' |
| U+02BC | ʼ | Modifier letter apostrophe | ' |
| U+02BD | ʽ | Modifier letter reversed comma | ' |
| U+02B9 | ʹ | Modifier letter prime | ' |
| U+201B | ‛ | Single high-reversed-9 quotation mark | ' |
| U+FF07 | ＇ | Fullwidth apostrophe | ' |
| U+00B4 | ´ | Acute accent | ' |
| U+02CA | ˊ | Modifier letter acute accent | ' |
| U+0060 | ` | Grave accent | ' |
| U+02CB | ˋ | Modifier letter grave accent | ' |
| U+275B | ❛ | Heavy single turned comma quotation mark ornament | ' |
| U+275C | ❜ | Heavy single comma quotation mark ornament | ' |
| U+0313 | ̓ | Combining comma above | ' |
| U+0314 | ̔ | Combining reversed comma above | ' |
| U+FE10 | ︐ | Presentation form for vertical comma | ' |
| U+FE11 | ︑ | Presentation form for vertical ideographic comma | ' |
Dashes and hyphens
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+2014 | — | Em dash | - |
| U+2013 | – | En dash | - |
| U+23BC | ⎼ | Horizontal scan line-7 | - |
| U+23BD | ⎽ | Horizontal scan line-9 | - |
| U+2015 | ― | Horizontal bar | - |
| U+FE63 | ﹣ | Small hyphen-minus | - |
| U+FF0D | － | Fullwidth hyphen-minus | - |
| U+2010 | ‐ | Hyphen | - |
| U+2022 | • | Bullet | - |
| U+2043 | ⁃ | Hyphen bullet | - |
Slashes and division
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+00F7 | ÷ | Division sign | / |
| U+00BC | ¼ | Vulgar fraction one quarter | 1/4 |
| U+00BD | ½ | Vulgar fraction one half | 1/2 |
| U+00BE | ¾ | Vulgar fraction three quarters | 3/4 |
| U+29F8 | ⧸ | Big solidus | / |
| U+0337 | ̷ | Combining short solidus overlay | / |
| U+0338 | ̸ | Combining long solidus overlay | / |
| U+2044 | ⁄ | Fraction slash | / |
| U+2215 | ∕ | Division slash | / |
| U+FF0F | ／ | Fullwidth solidus | / |
Backslashes
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+29F9 | ⧹ | Big reverse solidus | \ |
| U+29F5 | ⧵ | Reverse solidus operator | \ |
| U+20E5 | ⃥ | Combining reverse solidus overlay | \ |
| U+FE68 | ﹨ | Small reverse solidus | \ |
| U+FF3C | ＼ | Fullwidth reverse solidus | \ |
Underscores and vertical lines
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+0332 | ̲ | Combining low line | _ |
| U+FF3F | ＿ | Fullwidth low line | _ |
| U+2017 | ‗ | Double low line | _ |
| U+20D2 | ⃒ | Combining long vertical line overlay | \| |
| U+20D3 | ⃓ | Combining short vertical line overlay | \| |
| U+2223 | ∣ | Divides | \| |
| U+FF5C | ｜ | Fullwidth vertical line | \| |
| U+23B8 | ⎸ | Left vertical box line | \| |
| U+23B9 | ⎹ | Right vertical box line | \| |
| U+23D0 | ⏐ | Vertical line extension | \| |
| U+239C | ⎜ | Left parenthesis extension | \| |
| U+239F | ⎟ | Right parenthesis extension | \| |
Symbols and punctuation
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+FE6B | ﹫ | Small commercial at sign | @ |
| U+FF20 | ＠ | Fullwidth commercial at sign | @ |
| U+FE69 | ﹩ | Small dollar sign | $ |
| U+FF04 | ＄ | Fullwidth dollar sign | $ |
| U+01C3 | ǃ | Latin letter retroflex click | ! |
| U+FE15 | ︕ | Presentation form for vertical exclamation mark | ! |
| U+FE57 | ﹗ | Small exclamation mark | ! |
| U+FF01 | ！ | Fullwidth exclamation mark | ! |
| U+203C | ‼ | Double exclamation mark | !! |
| U+FE5F | ﹟ | Small number sign | # |
| U+FF03 | ＃ | Fullwidth number sign | # |
| U+FE6A | ﹪ | Small percent sign | % |
| U+FF05 | ％ | Fullwidth percent sign | % |
| U+FE60 | ﹠ | Small ampersand | & |
| U+FF06 | ＆ | Fullwidth ampersand | & |
| U+2026 | … | Horizontal ellipsis | ... |
Commas
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+201A | ‚ | Single low-9 quotation mark | , |
| U+0326 | ̦ | Combining comma below | , |
| U+FE50 | ﹐ | Small comma | , |
| U+3001 | 、 | Ideographic comma | , |
| U+FE51 | ﹑ | Small ideographic comma | , |
| U+FF0C | ， | Fullwidth comma | , |
| U+FF64 | ､ | Halfwidth ideographic comma | , |
Parentheses and brackets
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+2768 | ❨ | Medium left parenthesis ornament | ( |
| U+276A | ❪ | Medium flattened left parenthesis ornament | ( |
| U+FE59 | ﹙ | Small left parenthesis | ( |
| U+FF08 | （ | Fullwidth left parenthesis | ( |
| U+27EE | ⟮ | Mathematical left flattened parenthesis | ( |
| U+2985 | ⦅ | Left white parenthesis | ( |
| U+2769 | ❩ | Medium right parenthesis ornament | ) |
| U+276B | ❫ | Medium flattened right parenthesis ornament | ) |
| U+FE5A | ﹚ | Small right parenthesis | ) |
| U+FF09 | ） | Fullwidth right parenthesis | ) |
| U+27EF | ⟯ | Mathematical right flattened parenthesis | ) |
| U+2986 | ⦆ | Right white parenthesis | ) |
| U+2774 | ❴ | Medium left curly bracket ornament | { |
| U+FE5B | ﹛ | Small left curly bracket | { |
| U+FF5B | ｛ | Fullwidth left curly bracket | { |
| U+2775 | ❵ | Medium right curly bracket ornament | } |
| U+FE5C | ﹜ | Small right curly bracket | } |
| U+FF5D | ｝ | Fullwidth right curly bracket | } |
| U+FF3B | ［ | Fullwidth left square bracket | [ |
| U+FF3D | ］ | Fullwidth right square bracket | ] |
Asterisks
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+204E | ⁎ | Low asterisk | * |
| U+2217 | ∗ | Asterisk operator | * |
| U+229B | ⊛ | Circled asterisk operator | * |
| U+2722 | ✢ | Four teardrop-spoked asterisk | * |
| U+2723 | ✣ | Four balloon-spoked asterisk | * |
| U+2724 | ✤ | Heavy four balloon-spoked asterisk | * |
| U+2725 | ✥ | Four club-spoked asterisk | * |
| U+2731 | ✱ | Heavy asterisk | * |
| U+2732 | ✲ | Open center asterisk | * |
| U+2733 | ✳ | Eight spoked asterisk | * |
| U+273A | ✺ | Sixteen pointed asterisk | * |
| U+273B | ✻ | Teardrop-spoked asterisk | * |
| U+273C | ✼ | Open center teardrop-spoked asterisk | * |
| U+273D | ✽ | Heavy teardrop-spoked asterisk | * |
| U+2743 | ❃ | Heavy teardrop-spoked pinwheel asterisk | * |
| U+2749 | ❉ | Balloon-spoked asterisk | * |
| U+274A | ❊ | Eight teardrop-spoked propeller asterisk | * |
| U+274B | ❋ | Heavy eight teardrop-spoked propeller asterisk | * |
| U+29C6 | ⧆ | Squared asterisk | * |
| U+FE61 | ﹡ | Small asterisk | * |
| U+FF0A | ＊ | Fullwidth asterisk | * |
Math, comparison, periods, and colons
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+02D6 | ˖ | Modifier letter plus sign | + |
| U+FE62 | ﹢ | Small plus sign | + |
| U+FF0B | ＋ | Fullwidth plus sign | + |
| U+FE64 | ﹤ | Small less-than sign | < |
| U+FF1C | ＜ | Fullwidth less-than sign | < |
| U+0347 | ͇ | Combining equals sign below | = |
| U+A78A | ꞊ | Modifier letter short equals sign | = |
| U+FE66 | ﹦ | Small equals sign | = |
| U+FF1D | ＝ | Fullwidth equals sign | = |
| U+FE65 | ﹥ | Small greater-than sign | > |
| U+FF1E | ＞ | Fullwidth greater-than sign | > |
| U+2039 | ‹ | Single left-pointing angle quotation mark | < |
| U+203A | › | Single right-pointing angle quotation mark | > |
| U+3002 | 。 | Ideographic full stop | . |
| U+FE52 | ﹒ | Small full stop | . |
| U+FF0E | ． | Fullwidth full stop | . |
| U+FF61 | ｡ | Halfwidth ideographic full stop | . |
| U+02D0 | ː | Modifier letter triangular colon | : |
| U+02F8 | ˸ | Modifier letter raised colon | : |
| U+2982 | ⦂ | Z notation type colon | : |
| U+A789 | ꞉ | Modifier letter colon | : |
| U+FE13 | ︓ | Presentation form for vertical colon | : |
| U+FF1A | ： | Fullwidth colon | : |
| U+204F | ⁏ | Reversed semicolon | ; |
| U+FE14 | ︔ | Presentation form for vertical semicolon | ; |
| U+FE54 | ﹔ | Small semicolon | ; |
| U+FF1B | ； | Fullwidth semicolon | ; |
| U+FE16 | ︖ | Presentation form for vertical question mark | ? |
| U+FE56 | ﹖ | Small question mark | ? |
| U+FF1F | ？ | Fullwidth question mark | ? |
Fullwidth digits
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+FF10 | ０ | Fullwidth digit zero | 0 |
| U+FF11 | １ | Fullwidth digit one | 1 |
| U+FF12 | ２ | Fullwidth digit two | 2 |
| U+FF13 | ３ | Fullwidth digit three | 3 |
| U+FF14 | ４ | Fullwidth digit four | 4 |
| U+FF15 | ５ | Fullwidth digit five | 5 |
| U+FF16 | ６ | Fullwidth digit six | 6 |
| U+FF17 | ７ | Fullwidth digit seven | 7 |
| U+FF18 | ８ | Fullwidth digit eight | 8 |
| U+FF19 | ９ | Fullwidth digit nine | 9 |
Fullwidth and small capital letters
Fullwidth uppercase (U+FF21–U+FF3A) → A–Z
Fullwidth lowercase (U+FF41–U+FF5A) → a–z
Small capital letters:
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+1D00 | ᴀ | Latin letter small capital A | A |
| U+0299 | ʙ | Latin letter small capital B | B |
| U+1D04 | ᴄ | Latin letter small capital C | C |
| U+1D05 | ᴅ | Latin letter small capital D | D |
| U+1D07 | ᴇ | Latin letter small capital E | E |
| U+A730 | ꜰ | Latin letter small capital F | F |
| U+0262 | ɢ | Latin letter small capital G | G |
| U+029C | ʜ | Latin letter small capital H | H |
| U+026A | ɪ | Latin letter small capital I | I |
| U+1D0A | ᴊ | Latin letter small capital J | J |
| U+1D0B | ᴋ | Latin letter small capital K | K |
| U+029F | ʟ | Latin letter small capital L | L |
| U+1D0D | ᴍ | Latin letter small capital M | M |
| U+0274 | ɴ | Latin letter small capital N | N |
| U+1D0F | ᴏ | Latin letter small capital O | O |
| U+1D18 | ᴘ | Latin letter small capital P | P |
| U+0280 | ʀ | Latin letter small capital R | R |
| U+A731 | ꜱ | Latin letter small capital S | S |
| U+1D1B | ᴛ | Latin letter small capital T | T |
| U+1D1C | ᴜ | Latin letter small capital U | U |
| U+1D20 | ᴠ | Latin letter small capital V | V |
| U+1D21 | ᴡ | Latin letter small capital W | W |
| U+028F | ʏ | Latin letter small capital Y | Y |
| U+1D22 | ᴢ | Latin letter small capital Z | Z |
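The fullwidth ranges listed at the top of this section follow a fixed pattern: the fullwidth block U+FF01 to U+FF5E mirrors ASCII U+0021 to U+007E at a constant offset of 0xFEE0. A minimal sketch of folding them back to ASCII, assuming nothing about how the service implements it (fold_fullwidth is a hypothetical name):

```python
FULLWIDTH_START = 0xFF01  # fullwidth exclamation mark
FULLWIDTH_END = 0xFF5E    # fullwidth tilde
OFFSET = 0xFEE0           # distance between a fullwidth form and its ASCII twin


def fold_fullwidth(text: str) -> str:
    """Map fullwidth letters, digits, and punctuation to their ASCII equivalents."""
    return "".join(
        chr(ord(ch) - OFFSET) if FULLWIDTH_START <= ord(ch) <= FULLWIDTH_END else ch
        for ch in text
    )


# Fullwidth "ABc123" written with escapes so the source stays unambiguous.
print(fold_fullwidth("\uff21\uff22\uff43\uff11\uff12\uff13"))  # ABc123
```

Small capitals and the Greek lookalikes have no such arithmetic relationship, so they need an explicit lookup table like the ones shown here.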
Greek letters
Greek capital letters that visually resemble Latin letters are substituted:
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+0391 | Α | Greek capital letter Alpha | A |
| U+0392 | Β | Greek capital letter Beta | B |
| U+0395 | Ε | Greek capital letter Epsilon | E |
| U+0397 | Η | Greek capital letter Eta | H |
| U+0399 | Ι | Greek capital letter Iota | I |
| U+039A | Κ | Greek capital letter Kappa | K |
| U+039C | Μ | Greek capital letter Mu | M |
| U+039D | Ν | Greek capital letter Nu | N |
| U+039F | Ο | Greek capital letter Omicron | O |
| U+03A1 | Ρ | Greek capital letter Rho | P |
| U+03A4 | Τ | Greek capital letter Tau | T |
| U+03A7 | Χ | Greek capital letter Chi | X |
| U+03A5 | Υ | Greek capital letter Upsilon | Y |
| U+0396 | Ζ | Greek capital letter Zeta | Z |
Tildes and circumflex
| Unicode | Glyph | Description | Replacement |
|---|---|---|---|
| U+02C6 | ˆ | Modifier letter circumflex accent | ^ |
| U+0302 | ̂ | Combining circumflex accent | ^ |
| U+FF3E | ＾ | Fullwidth circumflex accent | ^ |
| U+1DCD | ᷍ | Combining double circumflex above | ^ |
| U+02DC | ˜ | Small tilde | ~ |
| U+02F7 | ˷ | Modifier letter low tilde | ~ |
| U+0303 | ̃ | Combining tilde | ~ |
| U+0330 | ̰ | Combining tilde below | ~ |
| U+0334 | ̴ | Combining tilde overlay | ~ |
| U+223C | ∼ | Tilde operator | ~ |
| U+FF5E | ～ | Fullwidth tilde | ~ |
Whitespace characters
These characters are replaced with a standard space or removed:
| Unicode | Description | Replacement |
|---|---|---|
| U+00A0 | No-break space | (space) |
| U+2000 | En quad | (space) |
| U+2001 | Em quad | (space) |
| U+2002 | En space | (space) |
| U+2003 | Em space | (space) |
| U+2004 | Three-per-em space | (space) |
| U+2005 | Four-per-em space | (space) |
| U+2006 | Six-per-em space | (space) |
| U+2007 | Figure space | (space) |
| U+2008 | Punctuation space | (space) |
| U+2009 | Thin space | (space) |
| U+200A | Hair space | (space) |
| U+200B | Zero width space | (removed) |
| U+202F | Narrow no-break space | (space) |
| U+205F | Medium mathematical space | (space) |
| U+3000 | Ideographic space | (space) |
| U+FEFF | Zero width no-break space | (removed) |
| U+2028 | Line separator | (removed) |
| U+2029 | Paragraph separator | (removed) |
| U+2060 | Word joiner | (removed) |
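The whitespace table maps naturally onto a translation table. The sketch below uses Python's str.translate with the code points listed above; it is an illustration of the documented behavior, not the service's implementation, and normalize_whitespace is a hypothetical name.

```python
# Code points replaced with a regular space, taken from the table above.
SPACE_LIKE = (
    "\u00a0"
    + "".join(chr(cp) for cp in range(0x2000, 0x200B))  # U+2000 through U+200A
    + "\u202f\u205f\u3000"
)
# Code points removed outright.
REMOVED = "\u200b\ufeff\u2028\u2029\u2060"

WHITESPACE_TABLE = str.maketrans(
    {**{ch: " " for ch in SPACE_LIKE}, **{ch: None for ch in REMOVED}}
)


def normalize_whitespace(text: str) -> str:
    """Replace exotic spaces with U+0020 and drop zero-width characters."""
    return text.translate(WHITESPACE_TABLE)


print(repr(normalize_whitespace("10\u00a0km\u200b away")))  # '10 km away'
```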
Control characters
These control characters are removed or transformed:
| Unicode | Description | Replacement |
|---|---|---|
| U+0009 | Tab | 7 spaces |
| U+0000 | Null | (removed) |
| U+0003 | End of text | (removed) |
| U+0004 | End of transmission | (removed) |
| U+0010 | Data link escape | (removed) |
| U+0011 | Device control one | (removed) |
| U+0012 | Device control two | (removed) |
| U+0013 | Device control three | (removed) |
| U+0014 | Device control four | (removed) |
| U+0017 | End of transmission block | (removed) |
| U+0019 | End of medium | (removed) |
| U+0080 | C1 control codes | (removed) |
| U+008D | Reverse line feed | (removed) |
| U+0090 | Device control string | (removed) |
| U+009B | Control sequence introducer | (removed) |
| U+009F | Application program command | (removed) |
Edge cases
Message length increases
Some substitutions increase message length. For example:
- Horizontal ellipsis (…) becomes three periods (...), adding 2 characters
- Tab (U+0009) becomes 7 spaces, adding 6 characters
- Vulgar fractions like ½ become 1/2, adding 2 characters
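The growth from these substitutions is simple to estimate ahead of time. The sketch below covers only the three examples above; length_change here is a hypothetical local helper that happens to share its name with the response field, and it does not call the API.

```python
# Substitutions from the examples above whose output is longer than the input.
EXPANDING = {
    "\u2026": "...",   # horizontal ellipsis
    "\t": " " * 7,     # tab
    "\u00bd": "1/2",   # vulgar fraction one half
}


def length_change(text: str) -> int:
    """Return how many characters the message grows by after substitution."""
    return sum(len(EXPANDING[ch]) - 1 for ch in text if ch in EXPANDING)


print(length_change("Wait\u2026"))        # 2
print(length_change("a\tb"))              # 6
print(length_change("\u00bd cup\u2026"))  # 4
```

A message close to the 160-character limit can spill into a second segment after substitution, so it is worth checking the reported segment count for long messages.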
Mixed replaceable and non-replaceable characters
If your message contains both replaceable Unicode characters and non-replaceable ones (like emojis), smart encoding still applies all possible substitutions. However, the non-replaceable characters will keep the message in UTF-16 encoding.
This is still beneficial: fewer Unicode characters means a shorter UTF-16 message and potentially fewer segments.
Extended GSM-7 characters
The characters ~, ^, |, \, {, }, [, ] are part of the GSM-7 extended set and count as 2 characters each when calculating segment length. Smart encoding accounts for this when determining the final segment count.
Zero-width characters and empty messages
Zero-width characters (like U+200B zero-width space) are removed entirely. If your message consists entirely of zero-width or control characters that all get removed, the API returns a 400 error: messages cannot be empty after transformation.
encoding=gsm7 with non-convertible characters
If you set encoding=gsm7 on a request but the message contains characters that cannot be represented in GSM-7 (e.g., emoji), the API returns a 400 error rather than silently dropping characters.
Limitations
- SMS only: MMS and RCS use UTF-8 encoding by default and are not affected by smart encoding.
- Not all characters convert: Emojis and non-Latin scripts (e.g., Chinese, Arabic, Cyrillic) have no GSM-7 equivalents and will still trigger UTF-16 encoding.
- Visual differences: Substitutions may slightly alter the appearance of your message. Review the character tables above to understand what changes will occur.
- Length may increase: Some substitutions produce longer output (e.g., … → ...). Always check the response metadata for the actual segment count.
Related resources
Message Encoding
Learn about GSM-7, UTF-16, and segment calculations.
Send Your First Message
Get started with the Telnyx Messaging API.
Messaging Profiles API
API reference for updating messaging profile settings.
Messages API
API reference for sending messages with encoding options.