Microsoft Anna – Sensual Babe or Just Another Robot?

July 22, 2007

Microsoft Anna is the new English text to speech voice of Windows Vista. She replaces Microsoft Sam, the robot-like voice that comes with Windows XP. Microsoft Sam has probably done more than anything else to give text to speech a bad name. The only positive thing I can say about Microsoft Sam is that he reminds me of the Commodore Amiga, possibly the first personal computer to have a built-in text to speech capability.

Does Microsoft Anna sound any better than Sam? Does she have a sensual voice that you could listen to all day or does she have you reaching for the mute button? How does she compare to a voice from a leading speech engine provider, such as one of Nuance®’s RealSpeak™ voices?

To answer these questions, I’ve conducted a simple ‘speak-off’ between Microsoft Sam, Microsoft Anna and RealSpeak™ Samantha. Each voice had to read three passages of text, ranging in difficulty from a simple news story to an abstract from a medical journal that most humans would have trouble pronouncing.

The resulting samples can be listened to using the links below.

Microsoft Sam

Microsoft Anna

RealSpeak Samantha

Sample Text

Sample 1

Click to listen to Microsoft Sam Sample 1
Click to listen to Microsoft Anna Sample 1
Click to listen to RealSpeak Samantha Sample 1

SCIENTISTS have explained mathematically why the famous silly walks of Monty Python’s John Cleese have never caught on in the long history of homo sapiens. The giant, leg-twirling strides of silly walks may enable an individual to leap around swiftly but are simply too expensive in metabolic energy compared with conventional locomotion, according to a paper published by Britain’s Royal Society.

Sample 2

Click to listen to Microsoft Sam Sample 2
Click to listen to Microsoft Anna Sample 2
Click to listen to RealSpeak Samantha Sample 2

The color of a vegetable means something more than what it looks like. The colors of vegetables are affected by the vitamins and minerals contained therein. Having a literally colourful diet is important for your health. A blueberry and a purple carrot both provide your body with antioxidants, blueberries are in anthocyanin pigments which are powerful antioxidants. And carrots contain carotenoid pigments, the darker the colour of a carrot means that this carrot contains a higher concentration of carotenoids than say, an orange carrot.

Sample 3

Click to listen to Microsoft Sam Sample 3
Click to listen to Microsoft Anna Sample 3
Click to listen to RealSpeak Samantha Sample 3

Similarities in physiological roles of LXR (liver X receptors) and co-repressor RIP140 (receptor-interacting protein 140) in regulating energy homoeostasis and lipid and glucose metabolism suggest that the effects of LXR could at least partly be mediated by recruitment of the co-repressor RIP140. In the present study, we have elucidated the molecular basis for regulation of LXR transcriptional activity by RIP140.

The verdict?

Microsoft Anna certainly has a more natural and lively voice than Sam, but there are a number of problems. The pacing of the speech is very uneven: some words and syllables are spoken very quickly and others quite slowly. The intonation also seems wayward at times, with some words unnaturally emphasised. Microsoft Sam mispronounced the word ‘co-repressor’ as ‘company repressor’, mistaking ‘co’ for an abbreviation of ‘company’. Microsoft Anna did not fall into the same trap. However, compare her to RealSpeak Samantha and you can hear that Microsoft Anna is a long way off the state of the art.

Overall, I believe that Microsoft Anna is a small step in the right direction. It’s just a shame that it wasn’t a larger step. She’s a long way off sounding like a sensual babe. At best, she could be thought of as a cyborg with a malfunctioning speech unit.

Microsoft Sam provided one little gem when trying to pronounce ‘transcriptional activity’ in the 3rd sample, doing a great impersonation of a Cylon from Battlestar Galactica!

I’d like to hear your thoughts on Microsoft Anna.

Robot picture courtesy of peyri.


Text2Go Gets a New Voice or 30

July 22, 2007

A text-to-speech product is only as good as the computerized voices it uses, so it’s with great excitement that I can announce that the RealSpeak™ voices from Nuance Communications, Inc. have been integrated with Text2Go. RealSpeak™ voices are some of the best available, with over 30 voices covering 22 languages.

A modern computerized voice is a very complex piece of technology. A computerized voice is first created by employing a professional voice actor to record a large set of phrases. These phrases are then sliced and diced into numerous small segments of speech and stored in a database. To give you some idea of how much data is used, a typical RealSpeak™ voice uses around 100MB of hard disk space.

During text to speech playback, the text to speech engine has to perform a number of steps. First, it must convert ordinary text into its phonetic representation. This is a complex operation that requires the speech engine to correctly handle special cases such as numbers, dates, addresses, acronyms, punctuation marks and pronunciation ambiguities (e.g. read: to read a book, the book had been read).
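To make that first step a little more concrete, here is a minimal Python sketch of text normalization. It is not how RealSpeak™ or any other real engine works internally. The function names, the word lists and the rule for ‘read’ are all my own assumptions, picked only to show the kind of special cases involved: small numbers are spelled out, and ‘read’ is respelled as ‘red’ when it follows a word like ‘had’ or ‘been’.

```python
import re

# Illustrative lookup tables (an assumption, not taken from any real engine).
ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "_ _ twenty thirty forty fifty sixty seventy eighty ninety".split()

# Hypothetical cue words that suggest the past-tense pronunciation of "read".
PAST_CUES = {"had", "has", "have", "been", "was", "were"}


def number_to_words(n):
    """Spell out an integer from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else TENS[tens] + " " + ONES[ones]


def normalize(text):
    """Lower-case the text, expand small numbers and respell 'read'."""
    words = re.findall(r"[a-z']+|\d+", text.lower())
    out = []
    for i, word in enumerate(words):
        if word.isdigit() and int(word) < 100:
            out.append(number_to_words(int(word)))
        elif word == "read":
            previous = words[i - 1] if i > 0 else ""
            # "red" stands in for the past-tense sound, "reed" for the present.
            out.append("red" if previous in PAST_CUES else "reed")
        else:
            out.append(word)  # larger numbers and all other words pass through
    return " ".join(out)


print(normalize("to read a book"))
print(normalize("The book had been read"))
```

A real engine needs far more context than the previous word to get these cases right, which is a big part of why good voices are hard to build.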

Finally, it must choose the correct speech segments and concatenate them to produce the desired phonemes. During this process, the speech engine must create the correct intonation in order to produce natural-sounding speech. This includes identifying which words to emphasize and the type of sentence (e.g. declarative versus interrogative, WH-questions versus yes/no-questions).
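Here is a rough Python sketch of those two ideas: classifying the sentence type, and picking one recorded segment per phoneme so that the joins are as smooth as possible. Everything in it is an assumption for illustration. The segment ids, the pitch values and the idea of scoring a join purely by the pitch difference between neighbouring segments are mine, and real engines weigh many more acoustic features than this.

```python
import re

WH_WORDS = {"who", "whom", "whose", "what", "when", "where", "why", "which", "how"}


def sentence_type(sentence):
    """Return 'declarative', 'wh-question' or 'yes/no-question'."""
    s = sentence.strip()
    if not s.endswith("?"):
        return "declarative"
    first = re.match(r"[A-Za-z]+", s)
    word = first.group(0).lower() if first else ""
    return "wh-question" if word in WH_WORDS else "yes/no-question"


def select_segments(phonemes, database):
    """Pick one candidate per phoneme, minimising the total pitch jump at joins.

    `database` maps a phoneme to a list of (segment_id, pitch_hz) candidates.
    Dynamic programming is used so the overall cheapest sequence is found,
    not just the cheapest choice at each individual join.
    """
    if not phonemes:
        return []
    candidates = [database[p] for p in phonemes]
    costs = [[0.0] * len(candidates[0])]  # cheapest cost of a path ending at each candidate
    back = []                             # back-pointers to the best previous candidate
    for prev, cur in zip(candidates, candidates[1:]):
        row, ptr = [], []
        for _, pitch in cur:
            options = [costs[-1][j] + abs(pitch - prev[j][1]) for j in range(len(prev))]
            best = min(range(len(prev)), key=options.__getitem__)
            row.append(options[best])
            ptr.append(best)
        costs.append(row)
        back.append(ptr)
    k = min(range(len(costs[-1])), key=costs[-1].__getitem__)
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return [candidates[i][k][0] for i, k in enumerate(path)]


# Hypothetical two-phoneme database for the word "hi".
SEGMENTS = {
    "HH": [("hh_1", 110), ("hh_2", 140)],
    "AY": [("ay_1", 150), ("ay_2", 115)],
}

print(sentence_type("Where is the book?"))
print(select_segments(["HH", "AY"], SEGMENTS))
```

The sentence type matters because it would feed into the intonation: in English, a yes/no-question typically ends on a rising pitch, while a declarative sentence or a WH-question more often falls.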

RealSpeak™ voices are not free and must be purchased on a voice by voice basis.

When Text2Go is released for sale, customers will be able to purchase Text2Go by itself or with one or more RealSpeak™ voices. I will be encouraging all customers to purchase at least one RealSpeak™ voice, as they are so much clearer and more natural than the free or built-in voices that are available. I will be doing this by providing discounted bundles that include one or more RealSpeak™ voices.

To listen to some samples of the RealSpeak™ voices for yourself, visit the Text2Go Voices page. Leave a comment here to let me know what you think of the voices.

