Text to Speech vs Prerecorded Messages in IVR

March 18, 2019 3 min
Aivis Olsteins


Text to Speech (TTS) and prerecorded messages are two ways to provide voice response, an essential part of any IVR. Each approach has its pros and cons.

Prerecorded messages are the traditional way of building an IVR. The operator lays out the plan for the IVR, compiles a list of the words and phrases needed, and either hires a professional studio to record them or records them in-house.

Text to Speech. While there are several good standalone Text to Speech applications, the real power of TTS comes from large cloud-based platforms such as Amazon AWS, Google GCP and similar. With access to very large data sets, they can produce synthesized speech that is very close to a live person's speech. They offer feature-rich, easy-to-implement APIs that operators can get working in a very short time.

Pros of Text to Speech:

  1. Quick to deploy. It is usually a matter of downloading a readily available SDK from the TTS vendor, configuring it, and calling it in a few lines of code. We tested some of these libraries in popular programming languages, and a non-expert could get a first synthesized message within half an hour. In contrast, prerecording all possible messages takes a lot of work.
  2. Takes care of compound numbers. Suppose you need to speak multi-digit amounts, such as sums of money. With TTS this usually works automatically, with no special configuration. With prerecorded messages, you have to handle number composition yourself. For example, to say $1259.99 you need to assemble a sequence of sounds: one, thousand, two, hundred, fifty, nine, dollars, ninety, nine, cents. That requires some programming logic. Things get really complicated when you need this in several languages: in German, for example, the ones come before the tens, so 82 is pronounced zweiundachtzig (two and eighty). Many other languages have their own quirks.
  3. No need to have recordings for all possible use cases. For IVR systems where the words and phrases are not all known in advance, TTS is the only choice.
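
To give a feel for the programming logic that number composition requires with prerecorded messages, here is a minimal Python sketch. It assumes one recorded clip per word (the clip names such as "one" or "thousand" are hypothetical) and only covers amounts below 10,000 dollars. The German helper handles just 20 to 99, to show the ones-before-tens order.

```python
from decimal import Decimal

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty",
        6: "sixty", 7: "seventy", 8: "eighty", 9: "ninety"}

# German word stems as used inside compound numbers ("ein", not "eins").
DE_ONES = ["", "ein", "zwei", "drei", "vier", "fünf", "sechs", "sieben",
           "acht", "neun"]
DE_TENS = {2: "zwanzig", 3: "dreißig", 4: "vierzig", 5: "fünfzig",
           6: "sechzig", 7: "siebzig", 8: "achtzig", 9: "neunzig"}


def clips_below_100(n):
    """English clip names for 0..99: tens first, then ones."""
    if n < 20:
        return [ONES[n]]
    tens, ones = divmod(n, 10)
    return [TENS[tens]] + ([ONES[ones]] if ones else [])


def clips_below_10000(n):
    """English clip names for 0..9999."""
    if not 0 <= n <= 9999:
        raise ValueError("this sketch only covers 0..9999")
    thousands, rest = divmod(n, 1000)
    hundreds, below_100 = divmod(rest, 100)
    clips = []
    if thousands:
        clips += [ONES[thousands], "thousand"]
    if hundreds:
        clips += [ONES[hundreds], "hundred"]
    if below_100 or not clips:  # "not clips" makes 0 come out as "zero"
        clips += clips_below_100(below_100)
    return clips


def amount_to_clips(amount):
    """Turn a decimal string such as '1259.99' into an ordered clip list."""
    total_cents = int(Decimal(amount) * 100)  # string input avoids float rounding
    dollars, cents = divmod(total_cents, 100)
    return (clips_below_10000(dollars)
            + ["dollar" if dollars == 1 else "dollars"]
            + clips_below_100(cents)
            + ["cent" if cents == 1 else "cents"])


def german_clips_20_to_99(n):
    """German clip names for 20..99: ones first, then 'und', then tens."""
    if not 20 <= n <= 99:
        raise ValueError("this sketch only covers 20..99")
    tens, ones = divmod(n, 10)
    return ([DE_ONES[ones], "und"] if ones else []) + [DE_TENS[tens]]


print(amount_to_clips("1259.99"))
print(german_clips_20_to_99(82))  # ['zwei', 'und', 'achtzig']
```

Even this small sketch needs separate rules per language, and a production version would also have to deal with larger numbers, other currencies and grammatical agreement.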

Pros of prerecorded messages:

  1. No running costs. This advantage applies when the alternative is a platform (API-based) TTS, which is typically billed per word or character spoken. Those costs can, however, be reduced by caching, i.e. storing frequently repeated words and phrases locally.
  2. More languages. TTS systems typically support only the most popular languages. For example, at the time of writing Amazon Polly is available in only 19 languages (not counting variants and dialects). For Google Speech Synthesis this number is slightly higher, with more dialects. But most of the world's smaller languages are not covered.
  3. More user friendly. Synthesized speech sometimes sounds less human and may be less appealing to the listener.
  4. Can speak special words such as company names. Text to Speech does not always correctly pronounce rarely used words, such as company names or names of foreign origin.
  5. Availability. Since most TTS services are platform based, network connectivity might be an issue.
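
The caching idea mentioned under running costs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `synthesize` stands for whatever call your TTS vendor's SDK provides (here it is a hypothetical function taking text and a voice name and returning audio bytes), and the cache is a plain directory of files keyed by a hash of the voice and text.

```python
import hashlib
import tempfile
from pathlib import Path


class CachedTTS:
    """Wraps a TTS call so each (voice, text) pair is synthesized only once."""

    def __init__(self, synthesize, cache_dir):
        # synthesize: hypothetical callable (text, voice) -> audio bytes
        self.synthesize = synthesize
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path_for(self, text, voice):
        key = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
        return self.cache_dir / f"{key}.audio"

    def audio_for(self, text, voice="default"):
        path = self._path_for(text, voice)
        if path.exists():                      # cache hit: no billed API call
            return path.read_bytes()
        audio = self.synthesize(text, voice)   # cache miss: billed API call
        path.write_bytes(audio)
        return audio


# Stand-in for a real vendor SDK call, used here only to count requests.
calls = []

def fake_synthesize(text, voice):
    calls.append(text)
    return f"<audio:{voice}:{text}>".encode("utf-8")


tts = CachedTTS(fake_synthesize, tempfile.mkdtemp())
tts.audio_for("Your balance is")
tts.audio_for("Your balance is")
print(len(calls))  # 1
```

A cache like this also softens the availability concern for phrases that have already been spoken once, since those no longer depend on the network.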

To sum up, each approach has its pros and cons. Choose an IVR system that can handle both TTS and prerecorded messages, so that each can be used where it fits best. Have a look at our IVR Builder, which supports both out of the box.
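
As a rough illustration of using each approach where it fits, the following Python sketch picks a recording for fixed prompts and falls back to TTS for dynamic text. It is not how any particular IVR product works: the prompt IDs, file names and the `synthesize` callable are all hypothetical.

```python
# Hypothetical catalogue of prerecorded prompts: prompt ID -> audio file.
RECORDED_PROMPTS = {
    "welcome": "welcome.wav",
    "try_again_later": "try_again_later.wav",
}


def resolve_prompt(prompt_id, dynamic_text, synthesize,
                   recordings=RECORDED_PROMPTS,
                   fallback_id="try_again_later"):
    """Return ("recording", file_name) or ("tts", audio_bytes).

    synthesize is a hypothetical callable: text -> audio bytes.
    """
    # Fixed, known-in-advance prompts: use the studio recording.
    if prompt_id in recordings:
        return ("recording", recordings[prompt_id])
    # Anything not known in advance: synthesize it.
    try:
        return ("tts", synthesize(dynamic_text))
    except OSError:
        # Network errors such as ConnectionError are OSError subclasses;
        # play a generic recording instead of leaving the caller in silence.
        return ("recording", recordings[fallback_id])


def working_tts(text):
    return f"<audio:{text}>".encode("utf-8")


def unreachable_tts(text):
    raise ConnectionError("TTS platform not reachable")


print(resolve_prompt("welcome", "", working_tts))
print(resolve_prompt("balance", "Your balance is 12 dollars", working_tts))
print(resolve_prompt("balance", "Your balance is 12 dollars", unreachable_tts))
```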

