The other day I needed to splice some voice samples together for my post on RealSpeak Voice Pronunciation. I was using the free audio editing tool Audacity and happened to notice something disturbing about the waveform that had been generated. I was using the RealSpeak Samantha voice and it was quite clear that a certain amount of audio clipping had occurred.
You can see this in the regions I’ve highlighted in red, where the natural shape of the waveform looks to be cut off or clipped.
If we zoom right in so the individual waves are visible, you can clearly see that each peak has been chopped off.
Does this matter?
Yes. I’m no audio expert, but clipping throws away part of the signal, and this will produce some audio distortion.
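To make the idea concrete, here is a minimal Python sketch of what clipping does to a signal and one simple way to spot it. It is only an illustration: the sine wave stands in for real speech, the names `clip` and `clipped_fraction` are my own, and none of this is how RealSpeak or Audacity work internally.

```python
import math

def clip(samples, limit=1.0):
    """Hard-clip each sample to the range [-limit, limit]."""
    return [max(-limit, min(limit, s)) for s in samples]

def clipped_fraction(samples, limit=1.0):
    """Fraction of samples sitting at or beyond the full-scale limit."""
    if not samples:
        return 0.0
    hits = sum(1 for s in samples if abs(s) >= limit)
    return hits / len(samples)

# Assumed test signal: a 440 Hz sine sampled at 8 kHz, 1.5x too loud
# for the [-1, 1] range, so its peaks get chopped flat.
loud = [1.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
clipped = clip(loud)

print(max(clipped), min(clipped))   # the peaks are pinned at the limit
print(clipped_fraction(clipped))    # share of samples that were flattened
```

A healthy recording should report a fraction at or very near zero. Anything noticeably higher means the peaks have been flattened, which is the chopped-off shape visible in the zoomed-in waveform.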
Can it be fixed?
Yes. The fix is as simple as adjusting the volume of the voice (don’t confuse this with the volume on your PC). You can adjust the volume of an individual voice using the Text2Go Options page. By default, the volume of all voices is set to Normal. By lowering this a couple of notches, the output for Samantha will no longer be clipped.
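Why does turning the voice down help? If the signal is scaled down before it reaches the clipping stage, the peaks fit inside the available range again. The sketch below works out how much gain reduction a given peak needs. It is an assumption-laden illustration: I don't know how many decibels one Text2Go volume notch corresponds to, and the function names are hypothetical.

```python
import math

def gain_to_avoid_clipping(peak, limit=1.0):
    """Largest linear gain (never above 1.0) that keeps a signal
    with this peak amplitude within [-limit, limit]."""
    if peak <= 0:
        raise ValueError("peak must be positive")
    return min(1.0, limit / peak)

def to_decibels(gain):
    """Convert a linear amplitude gain to decibels."""
    return 20 * math.log10(gain)

# Assumed example: a voice whose peaks reach 1.5x full scale.
gain = gain_to_avoid_clipping(1.5)
print(gain, to_decibels(gain))
```

As a rule of thumb, halving the amplitude is a reduction of about 6 dB, so a voice that only just clips needs very little attenuation.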
Converting the same text to speech produced the following waveform.
You can see that the waveform is no longer clipped at the top or the bottom.
Similarly, when we zoom in, each peak is nicely rounded and no longer chopped off.
Do other RealSpeak Voices suffer the same problem?
Serena and Tom also suffer some clipping, so if you use these voices make sure you adjust the volume setting down one or two notches. The other RealSpeak voices are not clipped at the Normal volume setting and don’t need to be adjusted.
While adding a pronunciation editor to Text2Go, I’ve discovered some of the strengths and weaknesses of the RealSpeak voices when it comes to pronunciation. Pronunciation errors are quite rare, which makes it hard to build up a large collection of mispronounced words. Text2Go’s new pronunciation editor makes collecting them much easier.
Now that I’ve identified an extensive list of mispronounced words, it’s possible to spot some trends and discover which voice is the most accurate.
Firstly, I’ve found that compound words can cause problems (e.g. afterword, longterm, screenshot). Most common compound words are fine, but brand names made up of two words run together are often mispronounced. It’s very easy to correct these mispronunciations: you just separate the two words with a hyphen or space (e.g. after-word, long term, screen-shot). This occurs often enough that I’ve added a way to identify compound words in the pronunciation editor. I’ve found that Samantha is significantly better at pronouncing compound words than all the other RealSpeak voices.
A similar problem occurs with words that have the prefix re-, for example reprogram, repurposed and rereleased. In these cases the voice does not recognise the leading re as a prefix. Again the solution is simple: just add a hyphen after the re (e.g. re-program, re-purposed, re-released). Once again, Samantha does a better job of pronouncing re- prefixed words.
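The compound-word fix amounts to a lookup table applied to the text before it is sent to the voice. Here is a minimal Python sketch of that idea. It is not Text2Go’s actual implementation; the `CORRECTIONS` table and the `apply_corrections` function are hypothetical names, seeded with the three examples above.

```python
import re

# Hypothetical correction table: mispronounced word -> respelling
# that the voice handles better.
CORRECTIONS = {
    "afterword": "after-word",
    "longterm": "long term",
    "screenshot": "screen-shot",
}

def apply_corrections(text, corrections=CORRECTIONS):
    """Replace whole-word matches (case-insensitive) with their respelling.

    The replacement is always the lower-case table entry, which is fine
    for text that is only going to be spoken.
    """
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, corrections)) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: corrections[m.group(0).lower()], text)

print(apply_corrections("Take a screenshot of the afterword."))
```

Matching on word boundaries means a word such as "screenshots" is left alone unless it has its own entry, which keeps the table from changing text it was never meant to touch.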
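The re- fix can be sketched the same way. Blindly hyphenating every word that starts with "re" would wreck ordinary words such as "really" or "result", so this illustration only inserts the hyphen when what follows the "re" is a known base word. The `KNOWN_BASES` set and `hyphenate_re_prefix` are hypothetical names of my own, not part of Text2Go.

```python
import re

# Hypothetical list of base words that should be read as "re-" + word.
KNOWN_BASES = {"program", "purposed", "released"}

def hyphenate_re_prefix(text, bases=KNOWN_BASES):
    """Insert a hyphen after a leading 're' when the rest is a known base word."""
    def fix(match):
        word = match.group(0)
        rest = word[2:]
        if rest.lower() in bases:
            return word[:2] + "-" + rest
        return word  # leave words like "really" untouched
    return re.sub(r"\b[Rr]e[A-Za-z]+\b", fix, text)

print(hyphenate_re_prefix("They rereleased the reprogram tool, really."))
```

In practice the list of base words would grow as new mispronunciations turn up.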
In order to hear the differences for yourself, I’ve chosen 10 mispronounced words and 4 voices. The table below contains each voice’s attempt to speak the word without any correction applied. Note – I’ve used a dash to indicate a passable but not perfect pronunciation.
The Uncorrected row contains the voice’s uncorrected pronunciation attempt and the Corrected row contains the pronunciation after corrections have been applied. Notice that once corrections have been applied, all voices pronounce all words correctly.
One set of results that surprised me was Tom’s. When I started writing up this post I was sure that Samantha was way ahead of the other voices. However, this result shows that Tom is also a worthy contender. I’m still sure that Samantha has the most accurate pronunciation, but the margin is not as great as I imagined.
My hunch is that Samantha is based on slightly newer technology and it’s the reason why the Samantha voice file is around 110MB in size whereas the others are around 70-90MB.
So does this mean that Samantha is the best voice and the one I should always use? What about regional differences?
Do voices from different regions pronounce words differently? Most definitely! Take the Australian voices Karen and Lee as examples. Not only do they have Australian accents, they also correctly pronounce local Australian place names, whereas the other English voices can be way off. Listen to the following Australian place names (of Aboriginal origin) spoken first by Samantha (US English) and then Karen (Australian English).
Pronunciation is only one criterion on which to choose a voice. I believe it’s more important to choose a voice you enjoy listening to. If you like the sound of Samantha then definitely choose her, but if you prefer the sound of one of the other voices, go with them. Remember that pronunciation errors in normal day-to-day text are quite rare for any of the RealSpeak voices.