The other day I needed to splice some voice samples together for my post on RealSpeak Voice Pronunciation. I was using the free audio editing tool Audacity and happened to notice something disturbing about the waveform that had been generated. I was using the RealSpeak Samantha voice and it was quite clear that a certain amount of audio clipping had occurred.
You can see this in the regions I’ve highlighted in red, where the natural shape of the waveform looks to be cut off or clipped.
If we zoom right in so the individual waves are visible, you can clearly see that each peak has been chopped off.
Does this matter?
Yes. I’m no audio expert, but clipping throws away part of the signal, and that produces audible distortion.
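To make the idea concrete, here is a minimal Python sketch of how clipping shows up in the raw samples. It assumes 16-bit PCM audio, where no sample can exceed 32767, and it treats a run of several consecutive samples pinned at that limit as a clipped peak. The function names and the `min_run` threshold are my own assumptions for illustration; they have nothing to do with how Audacity or the RealSpeak engine work internally.

```python
import math

FULL_SCALE = 32767  # largest magnitude a 16-bit sample can hold (assumed format)


def sine_wave(amplitude, n=1000, cycles=5):
    """Generate n samples of a sine wave with the given peak amplitude."""
    return [amplitude * math.sin(2 * math.pi * cycles * i / n) for i in range(n)]


def hard_clip(samples, limit=FULL_SCALE):
    """Quantize to integers and chop off anything beyond +/- limit."""
    return [max(-limit, min(limit, round(s))) for s in samples]


def clipped_fraction(samples, full_scale=FULL_SCALE, min_run=3):
    """Fraction of samples that sit in runs of at least min_run
    consecutive samples pinned at (or beyond) full scale."""
    clipped = 0
    run = 0
    for s in samples:
        if abs(s) >= full_scale:
            run += 1
        else:
            if run >= min_run:
                clipped += run
            run = 0
    if run >= min_run:  # account for a run that reaches the end of the data
        clipped += run
    return clipped / len(samples) if samples else 0.0


# A wave that fits within full scale versus one that is 50% too loud.
clean = hard_clip(sine_wave(0.8 * FULL_SCALE))
too_loud = hard_clip(sine_wave(1.5 * FULL_SCALE))
print("clean:", clipped_fraction(clean))
print("too loud:", clipped_fraction(too_loud))
```

The wave that fits within full scale reports no clipped runs, while the one that is too loud has a large share of its samples flattened at the limit. Those flattened runs are the chopped-off peaks visible in the zoomed-in waveform.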
Can it be fixed?
Yes. The fix is as simple as adjusting the volume of the voice (don’t confuse this with the volume on your PC). You can adjust the volume of an individual voice using the Text2Go Options page. By default, the volume of all voices is set to Normal. By lowering this a couple of notches, the output for Samantha will no longer be clipped.
Converting the same text to speech produced the following waveform.
You can see that the waveform is no longer clipped at the top or the bottom.
Similarly, when we zoom in, each peak is nicely rounded and no longer chopped off.
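The reason the voice volume matters, and not the volume on your PC, can be sketched in a few lines of Python. This is only a toy model under stated assumptions: `synthesize` is a hypothetical stand-in for a voice engine that scales a sine wave by a gain and then stores it as 16-bit samples. Turning the gain down before the samples are stored keeps the peaks intact, whereas turning the level down afterwards just produces a quieter wave with the same flat tops.

```python
import math

FULL_SCALE = 32767  # 16-bit sample limit (assumed output format)


def synthesize(gain, n=400, cycles=2):
    """Hypothetical voice engine: a sine wave scaled by gain, then
    quantized to 16-bit samples with hard clipping at full scale."""
    out = []
    for i in range(n):
        s = gain * FULL_SCALE * math.sin(2 * math.pi * cycles * i / n)
        out.append(max(-FULL_SCALE, min(FULL_SCALE, round(s))))
    return out


def longest_flat_run(samples):
    """Length of the longest run of identical consecutive samples."""
    best = run = 0
    previous = None
    for s in samples:
        run = run + 1 if s == previous else 1
        best = max(best, run)
        previous = s
    return best


loud = synthesize(gain=1.3)                    # voice volume too high: peaks are clipped
quiet = synthesize(gain=0.8)                   # voice volume lowered before output
turned_down = [round(s * 0.6) for s in loud]   # playback volume lowered afterwards

print("loud:", longest_flat_run(loud))
print("quiet:", longest_flat_run(quiet))
print("turned down:", longest_flat_run(turned_down))
```

The `quiet` wave has no long flat stretches, while `turned_down` keeps the flat tops of `loud` at a lower level. Once the peaks are gone, reducing the level later cannot bring them back.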
Do other RealSpeak Voices suffer the same problem?
Serena and Tom also suffer some clipping, so if you use these voices make sure you adjust the volume setting down one or two notches. The other RealSpeak voices are not clipped at the Normal volume setting and don’t need to be adjusted.
Joel Spolsky of Joel On Software and Jeff Atwood of Coding Horror fame were recently discussing ‘why podcast’ in the second episode of their stackoverflow.com podcast (yes the conversation’s already turned meta – it must be a software thing). 🙂
My ears perked up when I heard Joel talking about the benefits and differences of the podcasting medium as opposed to the printed word. His points are the exact reasons why I developed Text2Go. In fact I could almost use them verbatim as marketing material for Text2Go.
You can listen to the relevant excerpt here or visit stackoverflow.com to listen to the entire series of podcasts. Stackoverflow.com is going to be a programmer’s Q&A site done right. Free, no ads, high signal to noise ratio. It’s an ambitious project but if anyone has a chance of pulling it off, these two guys do. Good luck!
To test these assumptions, I dug up some of my recent sales data for computerized voices. When you purchase Text2Go you can also choose to purchase one or more high quality RealSpeak voices. I looked at all purchases that included a single voice. I disregarded any purchase that included both male and female voices and any subsequent voice purchases. I also disregarded all sales of the Indian English female voice Sangeeta as there is no corresponding male Indian English voice.
It’s clear that men have a strong preference for the female voice and it’s what I expected. What is surprising is that women also prefer the female voice, albeit to a much lesser extent. In fact the statistics show women don’t have a strong preference one way or the other. My wife certainly prefers the male computerized voices, so I’d expected this to be the case for the general population.
Not wanting to let hard data get in the way of assumptions and gut feelings, here are a couple of reasons that may explain the unexpected results for the women.
1. There are characteristics of the female voice that make it easier to produce a more natural sounding computerized voice.
2. We currently sell 7 female voices but only 3 male voices. This extra choice may give some bias to female voices. It may also be an indication that the developers of computerized voices have recognised the popularity of the female voice. Note that the US, UK and Australian accents have both male and female voices.
What’s your preference? Have a listen to the computerized voices on the Text2Go website and let me know.
Today I purchased a new eBook ‘As the Mirror Cracks’ by Steve Jordan and I thought I’d share a few tips on converting eBooks from text to speech.
1. Check the DRM permissions. In a perfect world people would trust each other and all eBooks would be DRM free. Thankfully Steve Jordan publishes all his books in multiple formats, none of which have any DRM protection. However the majority of eBooks available for sale are DRM-protected and they will cause you a world of pain. DRM-protected works place all sorts of restrictions on how and where you can view your eBook. When converting an eBook to speech, the DRM protection must allow the text to speech operation. Check very carefully before purchasing the eBook that you are granted this right. If it’s not explicitly stated, assume text to speech has been disabled. Even if the eBook allows text to speech, it will only allow it to be performed from within the authorized eBook reader. If this runs on your PC, then you will only be able to listen to the eBook while sitting at your computer. To use a product such as Text2Go to convert an eBook to an MP3 file that you can listen to on the go, the eBook will need to grant you ‘Copy and Paste’ rights. Most don’t, so it’s best just to say no to DRM-protected works.
2. Don’t convert an eBook in one single chunk or you’ll end up with one enormous track. If you lose your place during playback, it will be very hard to find it again as you will need to seek through an enormous file. Instead I create a playlist for the eBook and then split it up chapter by chapter and store each chapter as a track within the playlist. If I lose my place during playback, it’s easy to find the chapter I was up to and then do a quick seek within the corresponding track.
3. Don’t convert an entire eBook upfront. Instead I convert and listen to the first couple of chapters. This allows me to quickly identify any problem areas during the text to speech process. These may be mispronounced words (most common when the eBook contains a lot of jargon, slang or terminology specific to a particular field), or formatting specific to the eBook (e.g. special characters used to denote pauses, or dividers between sections, chapters, etc). I can then add corrections for the mispronounced words to the pronunciation dictionaries and create text cleanup rules to handle the eBook’s specific formatting. With these in place I will convert the remaining chapters of the eBook.
4. Don’t use the free Microsoft voices. Listening to an entire eBook with one of these voices will not be a particularly pleasant experience. Instead purchase a high quality, natural-sounding voice.
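Tips 2 and 3 can be roughed out in a short Python sketch. It is not how Text2Go implements playlists, pronunciation dictionaries or cleanup rules. It simply assumes a plain-text eBook whose chapters start with a line such as "Chapter 1", a "* * *" line as a section divider, and a small dictionary of respellings. The heading pattern, the divider rule and the dictionary entry are all hypothetical and would need adjusting for each book.

```python
import re

# Assumed heading style: a line beginning with "Chapter" followed by a number or word.
CHAPTER_HEADING = re.compile(r"^chapter[ \t]+\w+.*$", re.IGNORECASE | re.MULTILINE)

# Hypothetical cleanup rule: drop "* * *" style section dividers.
CLEANUP_RULES = [
    (re.compile(r"^[ \t]*(?:\*[ \t]*){3,}$", re.MULTILINE), ""),
]

# Hypothetical pronunciation corrections: written form -> spoken form.
PRONUNCIATIONS = {"microISV": "micro I S V"}


def split_chapters(text):
    """Return a list of (title, body) pairs, one per chapter heading found."""
    matches = list(CHAPTER_HEADING.finditer(text))
    chapters = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chapters.append((match.group().strip(), text[match.end():end].strip()))
    return chapters


def clean(text):
    """Apply the cleanup rules, then the pronunciation corrections."""
    for pattern, replacement in CLEANUP_RULES:
        text = pattern.sub(replacement, text)
    for word, spoken in PRONUNCIATIONS.items():
        text = re.sub(r"\b" + re.escape(word) + r"\b", spoken, text)
    return text


sample = """Chapter 1: Arrival
The microISV opened its doors.
* * *
Later that day.
Chapter 2: Departure
Everyone left.
"""

# Convert only the first chapter to begin with, as tip 3 suggests.
for title, body in split_chapters(sample)[:1]:
    print(title)
    print(clean(body))
```

Each (title, body) pair would become one track in the playlist. Once the first chapter or two sound right, the same rules can be applied to the remaining chapters.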
That’s it. Do you have any tips of your own? Stay tuned for a review of ‘As the Mirror Cracks’.
Today I received an enquiry on how things were progressing, and with very few signs of visible progress recently, I thought I’d write a quick piece on where things are at.
My current short-term goals are to
- Get a new version of the beta out.
- Release Text2Go onto the market.
The next version of the beta will work with a new set of high quality voices. I am currently working through the contractual arrangements with a voice provider and hope to have everything signed and sealed in about two weeks’ time. This has been a long and drawn out process and in hindsight I should have started this much earlier in the piece.
One of the benefits of the new voices is that they come with a custom dictionary editor. This allows users to provide guidance to the text to speech engine for any words that it may have trouble correctly pronouncing. This is great for any jargon-laden text that’s full of acronyms, abbreviations and concatenated words (e.g. microISV).
One of the design decisions I’ve been grappling with concerns the trial version of Text2Go. On the one hand I want to strongly encourage prospective users to try Text2Go with a high quality voice. Microsoft Sam, the built-in voice that ships with Windows XP, just sounds terrible. However a high quality voice file can be rather large (e.g. 30-100MB). Therefore if I include a high quality voice with the Text2Go trial, users could be faced with a very large download. This is likely to discourage a lot of people from even downloading it. The approach I’ve decided to take is to keep Text2Go as a standalone download which is just under 3MB. When Text2Go starts up, it will pop up a prompt recommending that the user download a high quality voice for use with the trial. The user will be given the option to download the voice, defer the download for another day or never download the voice. I hope this will turn out to be a reasonable compromise.
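The startup behaviour described above boils down to a small decision, sketched here in Python. This is only an illustration of the three options, not the actual Text2Go code. The names `VoiceChoice` and `should_prompt` are made up, and the sketch assumes the application remembers the user’s last answer between runs.

```python
from enum import Enum


class VoiceChoice(Enum):
    """The three answers the prompt offers (names are hypothetical)."""
    DOWNLOAD_NOW = "download"
    REMIND_LATER = "later"
    NEVER = "never"


def should_prompt(has_high_quality_voice, last_choice):
    """Decide at startup whether to recommend downloading a voice.

    last_choice is the answer remembered from a previous run,
    or None if the user has never been asked.
    """
    if has_high_quality_voice:
        return False  # nothing to recommend
    return last_choice is not VoiceChoice.NEVER


# First run, no voice installed: ask.
print(should_prompt(False, None))
# The user deferred last time: ask again on a later startup.
print(should_prompt(False, VoiceChoice.REMIND_LATER))
# The user said never: stay quiet.
print(should_prompt(False, VoiceChoice.NEVER))
```

Keeping the voice out of the installer means the prompt is the only nudge towards a better voice, so the "never" answer needs to be remembered and respected.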
The other feature I want to include in the next version of the beta is support for Windows Vista. One of the exciting things about Windows Vista on the text to speech front is the inclusion of a new voice, Microsoft Anna. Anna is much better quality than Sam and is even starting to approach the quality of some of the commercial voices. This has a couple of benefits that I can see.
- It will give a lot more people easy access to decent text to speech technology and hopefully raise the profile of text to speech.
- It will put some pressure on the commercial text to speech providers to improve the quality of their desktop voice offerings.
The release of Text2Go onto the market will hopefully follow the beta version quite quickly. I’ve got most of the payment processing and licensing functionality ready to go. There is a bit of work to provide an optional CD shipment but I won’t delay the release if this is not quite set up.
Just over a month ago I posted the first Text2Go beta onto the betanews website. This proved quite successful and I had a number of beta sign ups before Text2Go slipped off the front page. I’ve received some really good feedback, ideas and encouragement. I’m hoping that when I release an updated beta version, I’ll get some more exposure and feedback.
One of the mundane but important tasks that I’ve set up is an offsite backup system for my source code and other resources. For a while there I had a typical ad hoc process that consisted of burning CDs, copying files to USB key chains and periodically sending a zip archive to my gmail account. I’ve replaced all this with Carbonite, a web based backup system. You just select the folders on your PC that you want to back up and it will automatically back up any changed files to their hosted data centre. The initial backup takes a while (several days in my case for about 10GB), but after that, everything remains backed up pretty much all the time. I like its simple design and the fact that all changed files get backed up automatically. It’s one less thing I need to remember to do and it provides great peace of mind.
Looking a bit further into the future, I’m really looking forward to getting started on version 1.1. There are a heap of ideas that I want to implement for the next version.