Microsoft Anna is the new English text-to-speech voice in Windows Vista. She replaces Microsoft Sam, the robot-like voice that ships with Windows XP. Microsoft Sam has probably done more than anything else to give text to speech a bad name. The only positive thing I can say about Microsoft Sam is that he reminds me of the Commodore Amiga, possibly the first personal computer with a built-in text-to-speech capability.
Does Microsoft Anna sound any better than Sam? Does she have a sensual voice that you could listen to all day or does she have you reaching for the mute button? How does she compare to a voice from a leading speech engine provider, such as one of Nuance®’s RealSpeak™ voices?
To answer these questions, I’ve conducted a simple ‘speak-off’ between Microsoft Sam, Microsoft Anna and RealSpeak™ Samantha. Each voice had to read three passages of text, ranging in difficulty from a simple news story to an abstract from a medical journal that most humans would have trouble pronouncing.
You can listen to the resulting samples using the following link.
Microsoft Anna certainly has a more natural and lively voice than Sam, but she has a number of problems. The pacing of her speech is very uneven: some words and syllables are spoken very quickly and others quite slowly. Her intonation also seems quite wayward at times, with some words unnaturally emphasised. Microsoft Sam mispronounced the word ‘co-repressor’ as ‘company repressor’, mistaking ‘co’ for an abbreviation of ‘company’. Microsoft Anna did not fall into the same trap. However, when you compare her with RealSpeak Samantha, you can hear that Microsoft Anna is a long way from the state of the art.
Overall, I believe that Microsoft Anna is a small step in the right direction. It’s just a shame that it wasn’t a larger step. She’s a long way from sounding like a sensual babe. At best, she could be thought of as a cyborg with a malfunctioning speech unit.
Microsoft Sam provided one little gem when trying to pronounce ‘transcriptional activity’ in the third sample, doing a great impersonation of a Cylon from Battlestar Galactica!
I’d like to hear your thoughts on Microsoft Anna.
Robot picture courtesy of peyri.
A text-to-speech product is only as good as the computerized voices it uses, so it’s with great excitement that I announce that the RealSpeak™ voices from Nuance Communications, Inc. have been integrated with Text2Go. RealSpeak™ voices are among the best available, covering 22 languages with over 30 voices.
A modern computerized voice is a very complex piece of technology. To create one, a professional voice actor is first employed to record a large set of phrases. These phrases are then sliced and diced into numerous small segments of speech, which are stored in a database. To give you some idea of how much data is involved, a typical RealSpeak™ voice uses around 100MB of hard disk space.
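To put the 100MB figure in perspective, here is a rough back-of-the-envelope calculation. It assumes, purely for illustration, that the segments are stored as uncompressed 16-bit mono PCM audio at 22.05 kHz. The actual RealSpeak™ storage format, and any compression it uses, is not stated here, and the database also holds index data, so treat the result as an order-of-magnitude estimate only. The function name audio_minutes is hypothetical.

```python
def audio_minutes(num_bytes: int, sample_rate_hz: int,
                  bits_per_sample: int, channels: int = 1) -> float:
    """Minutes of uncompressed PCM audio that fit in num_bytes."""
    bytes_per_second = sample_rate_hz * bits_per_sample // 8 * channels
    return num_bytes / bytes_per_second / 60


# Assumption: a ~100MB voice stored as 16-bit mono PCM at 22.05 kHz.
minutes = audio_minutes(100 * 1024 * 1024, 22_050, 16)
print(f"{minutes:.1f} minutes")  # prints "39.6 minutes"
```

Under these assumptions, 100MB works out to roughly 40 minutes of recorded speech; a compressed format would fit considerably more into the same space.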
During text-to-speech playback, the speech engine has to perform a number of steps. First, it must convert ordinary text into its phonetic representation. This is a complex operation that requires the engine to correctly handle special cases such as numbers, dates, addresses, acronyms, punctuation marks and pronunciation ambiguities (e.g. ‘read’ in ‘to read a book’ versus ‘the book had been read’).
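The text-to-phonetics step can be illustrated with a toy normalizer. This is a minimal sketch, not how RealSpeak™ or any other commercial engine actually works: the lookup tables, the helper names (number_to_words, read_pronunciation, normalize) and the one-word context rule for ‘read’ are all hypothetical, and the outputs ‘reed’ and ‘red’ simply stand in for the phonetic forms a real engine would produce. It spells out numbers, spells acronyms letter by letter and picks a pronunciation for ‘read’ from the preceding word.

```python
import re

# Hypothetical, tiny lookup tables for illustration only.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
# Hypothetical rule: after these words, "read" is taken as the past form.
PAST_MARKERS = {"been", "had", "has", "have", "was", "were"}


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def read_pronunciation(previous_word: str) -> str:
    """Toy disambiguation of the homograph 'read' from one word of context."""
    return "red" if previous_word.lower() in PAST_MARKERS else "reed"


def normalize(text: str) -> list[str]:
    """Turn raw text into a list of speakable words (punctuation is dropped)."""
    tokens = re.findall(r"[A-Za-z]+|\d+", text)
    words: list[str] = []
    for i, token in enumerate(tokens):
        if token.isdigit():
            n = int(token)
            if n < 100:
                words.append(number_to_words(n))
            else:
                # Fall back to reading longer numbers digit by digit.
                words.extend(ONES[int(d)] for d in token)
        elif token.isupper() and len(token) > 1:
            # Treat all-caps words as acronyms and spell them out.
            words.extend(token)
        elif token.lower() == "read":
            previous = tokens[i - 1] if i > 0 else ""
            words.append(read_pronunciation(previous))
        else:
            words.append(token.lower())
    return words
```

For example, normalize("to read a book") yields the ‘reed’ reading, while normalize("the book had been read") yields ‘red’. A real engine typically relies on much larger lexicons and on analysing the grammatical role of each word rather than on a single preceding word.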
Finally, it must choose the correct speech segments and concatenate them to produce the desired sequence of phonemes. During this process, the speech engine must also create the correct intonation in order to produce natural-sounding speech. This includes identifying which words to emphasize and the type of sentence (e.g. declarative versus interrogative, WH-questions versus yes/no-questions).
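The segment selection and intonation steps can be sketched in a few lines. This sketch assumes a hypothetical in-memory segment database that maps each unit name to several recorded candidates (plain lists of sample values), and it picks each unit greedily by minimizing a simple join cost: the mismatch in sample value at the boundary. That greedy choice is a simplification; full unit-selection engines generally search over whole sequences of candidates, weighing both how well each unit matches the target and how smoothly adjacent units join. The sentence-type rule is likewise a coarse heuristic based on the final question mark and the first word, and all names here are hypothetical.

```python
# Hypothetical segment database type: each unit name (e.g. a diphone
# label such as "h-e") maps to several recorded candidate segments,
# each a list of audio sample values.
SegmentDB = dict[str, list[list[float]]]

WH_WORDS = {"who", "what", "when", "where", "why", "which", "how"}


def sentence_type(sentence: str) -> str:
    """Classify a sentence as declarative, WH-question or yes/no-question."""
    words = sentence.strip().split()
    if not words or not words[-1].endswith("?"):
        return "declarative"
    return "wh-question" if words[0].lower() in WH_WORDS else "yes/no-question"


def final_pitch(kind: str) -> str:
    """Pick a coarse final pitch movement for the sentence type."""
    # Coarse generalization for English: yes/no-questions often end
    # with a rise, while statements and WH-questions usually fall.
    return "rising" if kind == "yes/no-question" else "falling"


def join_cost(previous: list[float], candidate: list[float]) -> float:
    """Mismatch between the end of the audio so far and the candidate's start."""
    if not previous:
        return 0.0
    return abs(previous[-1] - candidate[0])


def select_and_concatenate(units: list[str], db: SegmentDB) -> list[float]:
    """Greedily pick, for each unit, the candidate that joins most smoothly."""
    output: list[float] = []
    for unit in units:
        best = min(db[unit], key=lambda cand: join_cost(output, cand))
        output.extend(best)
    return output
```

With db = {"h-e": [[0.0, 0.5], [0.0, 0.2]], "e-l": [[0.9, 0.1], [0.25, 0.3]]}, select_and_concatenate(["h-e", "e-l"], db) keeps the first "h-e" candidate and then chooses the "e-l" candidate starting at 0.25, because it sits closest to the preceding sample value of 0.5.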
RealSpeak™ voices are not free and must be purchased on a voice-by-voice basis.
When Text2Go is released for sale, customers will be able to purchase Text2Go by itself or with one or more RealSpeak™ voices. I will be encouraging all customers to purchase at least one RealSpeak™ voice, as they are so much clearer and more natural than the free or built-in voices that are available. To make this easier, I will be offering discounted bundles that include one or more RealSpeak™ voices.
To listen to some samples of the RealSpeak™ voices for yourself, visit the Text2Go Voices page. Leave a comment here to let me know what you think of the voices.