Evaluating RealSpeak Voice Pronunciation
July 29, 2008 at 10:30 pm | Posted in RealSpeak, text to speech
While adding a pronunciation editor to Text2Go, I’ve discovered some of the strengths and weaknesses of the RealSpeak voices when it comes to pronunciation. Pronunciation errors are quite rare, which makes it hard to build up a large collection of mispronounced words. Text2Go’s new pronunciation editor makes collecting them very easy.
Now that I’ve identified an extensive list of mispronounced words, it’s possible to spot some trends and discover which voice is the most accurate.
Firstly, I’ve found that compound words can cause problems (e.g. afterword, longterm, screenshot). Most common compound words are fine, but brand names made up of two words run together are often mispronounced. It’s very easy to correct these mispronunciations: you just separate the two words with a hyphen or space (e.g. after-word, long term, screen-shot). This occurs often enough that I’ve added a way to identify compound words in the pronunciation editor. I’ve found that Samantha is significantly better at pronouncing compound words than all the other RealSpeak voices.
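To make the fix concrete, here is a minimal Python sketch of the idea: replace each known problem word with its hyphenated or spaced form before the text is handed to the voice. This is not how Text2Go’s pronunciation editor is implemented; the `CORRECTIONS` table and the `apply_corrections` function are hypothetical names used only for illustration, and the entries are just the examples above.

```python
import re

# Hypothetical correction table: written form -> respelling the voice reads correctly.
# The entries are the examples from this post; the table itself is an assumption.
CORRECTIONS = {
    "afterword": "after-word",
    "longterm": "long term",
    "screenshot": "screen-shot",
}

# Match any listed word as a whole word, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in CORRECTIONS) + r")\b",
    re.IGNORECASE,
)


def apply_corrections(text: str) -> str:
    """Replace each known mispronounced word with its corrected spelling."""
    return _PATTERN.sub(lambda m: CORRECTIONS[m.group(0).lower()], text)


print(apply_corrections("A screenshot of the afterword"))
```

The sketch only matches whole words, so a plural such as “screenshots” is left alone and would need its own entry. It also drops the original capitalisation, which doesn’t matter for speech but would for displayed text.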
A similar problem occurs with words having the prefix re-, for example reprogram, repurposed and rereleased. In these cases the voice does not recognise the leading re as the re- prefix. Again the solution is simple: just add a hyphen after the re (e.g. re-program, re-purposed, re-released). Once again, Samantha does a better job of pronouncing re- prefixed words.
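The re- fix can be sketched the same way in Python. Again, this is only an illustration under my own assumptions, not Text2Go’s code: `RE_PREFIXED` and `hyphenate_re_prefix` are hypothetical names. The sketch uses an explicit word list rather than a blanket rule, because hyphenating every word that starts with “re” would mangle words like “read” or “really”.

```python
import re

# Hypothetical list of words where the voice misses the re- prefix.
# These are the examples from this post.
RE_PREFIXED = {"reprogram", "repurposed", "rereleased"}


def hyphenate_re_prefix(text: str) -> str:
    """Insert a hyphen after 're' for listed words only; leave everything else alone."""

    def fix(match: re.Match) -> str:
        word = match.group(0)
        if word.lower() in RE_PREFIXED:
            # Keep the original capitalisation of the prefix.
            return word[:2] + "-" + word[2:]
        return word

    # Candidate words: 're' or 'Re' followed by lowercase letters.
    return re.sub(r"\b[Rr]e[a-z]+\b", fix, text)


print(hyphenate_re_prefix("They rereleased the record after I read the reviews"))
```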
In order to hear the differences for yourself, I’ve chosen 10 mispronounced words and 4 voices. The table below contains each voice’s attempt to speak the word without any correction applied. Note – I’ve used a dash to indicate a passable but not perfect pronunciation.
The Uncorrected row contains the voice’s uncorrected pronunciation attempt and the Corrected row contains the pronunciation after corrections have been applied. Notice that once corrections have been applied, all voices pronounce all words correctly.
One set of results that surprised me was Tom’s. When I started writing up this post I was sure that Samantha was way ahead of the other voices. However, these results show that Tom is also a worthy contender. I’m still sure that Samantha has the most accurate pronunciation, but the margin is not as great as I imagined.
My hunch is that Samantha is based on slightly newer technology, which would explain why the Samantha voice file is around 110MB in size whereas the others are around 70-90MB.
So does this mean that Samantha is the best voice and the one I should always use? What about regional differences?
Do voices from different regions pronounce words differently? Most definitely! Take the Australian voices Karen and Lee as examples. Not only do they have Australian accents, they also correctly pronounce local Australian place names, whereas the other English voices can be way off. Listen to the following Australian place names (of Aboriginal origin) spoken first by Samantha (US English) and then Karen (Australian English).
Pronunciation is only one criterion on which to choose a voice. I believe it’s more important to choose a voice you enjoy listening to. If you like the sound of Samantha then definitely choose her, but if you prefer the sound of one of the other voices, go with that one. Remember that pronunciation errors in normal day-to-day text are quite rare for any of the RealSpeak voices.