Text2Go 3.0 Released – Pronunciation correction done right!
August 7, 2008 at 9:19 pm | Posted in text to speech, Text2Go
Finally! It’s taken a lot longer than I expected. Software estimation proves once again to be an elusive art. The major new feature can be summed up as ‘Pronunciation correction done right’. Ever since I discovered text to speech technology I’ve been bugged by mispronunciations. Although quite rare, they tend to stand out in a document that’s being narrated. They’re especially grating if they occur multiple times in the same document. For this reason, most text to speech applications provide a way to enter corrections. The previous release of Text2Go provided this ability, but it required the user to edit XML files and restart Text2Go each time. Not very user-friendly! It was a stop-gap until I could find the time to implement a proper solution.
That time has come. When I first designed Text2Go I had a lot of ideas on how to efficiently identify and correct mispronunciations. With this release I’ve been able to put these ideas into practice. This has been very satisfying.
One of the first challenges is finding a way of efficiently identifying mispronunciations. Pronunciation errors are actually quite rare. The naive approach is to listen to a document from start to finish, noting down any mispronunciations as you go. You can then come back and enter corrections for the next time the offending words are encountered. There are a couple of major problems with this approach.
The first is that you end up listening to the entire document, complete with mispronunciations. You’ll only get the benefit of the corrections you’ve entered the next time these words occur.
The second problem is that the approach is incredibly inefficient. All documents are filled with high-frequency words such as ‘a’, ‘is’, ‘the’, ‘and’ and ‘in’. These are never mispronounced, but you have to listen to them over and over.
I wanted an approach that could identify and correct mispronunciations before listening to a document and was quick and efficient. So I came up with the following.
First, extract a list of words from the document and remove all duplicates. This single step means you only have to listen to a word at most once, no matter how many times it appears in the document.
Taking this one step further, once you’ve listened to a word and verified it to be correctly pronounced, it would be nice to be able to remember this so that you never have to check it again. This is particularly useful for eliminating the high frequency words mentioned above. Therefore Text2Go maintains a ‘white-list’ of correctly pronounced words. These are filtered from the document being checked, again significantly reducing the number of words requiring checking.
Of the remaining words, it would be nice to be able to identify the most likely to be mispronounced. The approach I’ve chosen is to spell-check the remaining words. Misspelt (or unrecognized) words are then placed on the top of the list. The reason is that brand names, jargon and slang that haven’t made it into the dictionary are more likely to be mispronounced. Of course correctly spelt words can also be mispronounced and unrecognized words correctly pronounced. It’s just a way of increasing the likelihood of identifying mispronunciations.
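To make the three filtering steps concrete, here is a minimal Python sketch. It is an illustration only, not Text2Go’s actual code: the function name `candidate_words`, the simple word-matching pattern, and the use of plain sets for the white-list and the spelling dictionary are all assumptions made for this example.

```python
import re

def candidate_words(text, white_list, dictionary):
    """Return the unique words still worth checking, likeliest problems first.

    white_list and dictionary are plain sets of lower-case words
    (hypothetical stand-ins for the real white-list and spell checker).
    """
    seen = set()
    unique = []
    # Step 1: extract words and drop duplicates, keeping first-seen order.
    for word in re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text):
        key = word.lower()
        if key not in seen:
            seen.add(key)
            unique.append(key)
    # Step 2: filter out words already verified as correctly pronounced.
    remaining = [w for w in unique if w not in white_list]
    # Step 3: stable sort so unrecognized words (key False) come first.
    return sorted(remaining, key=lambda w: w in dictionary)


words = candidate_words(
    "The iPod is the best. The podcast is on the iPod.",
    white_list={"the", "is"},
    dictionary={"the", "is", "best", "on"},
)
print(words)
```

In this sample the eleven words of the text shrink to four that need checking, with the two words the dictionary doesn’t recognize at the top of the list.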
Another strategy is to identify compound words (i.e. two words run together) as I’ve discovered these are more likely to be mispronounced. The way I identify compound words is to find all words that are made up of exactly two correctly spelt words. Unfortunately this generates a number of false positives (e.g. ration = rat + ion). It’s still a useful strategy but I could make it more effective if I could find a better way of identifying compound words.
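The compound-word test can be sketched in a few lines of Python. Again this is only an illustration under assumptions: `compound_splits` is a hypothetical name, and the dictionary is a small hand-made set rather than a real spell checker.

```python
def compound_splits(word, dictionary):
    """Return every way to split word into exactly two dictionary words."""
    return [
        (word[:i], word[i:])
        for i in range(1, len(word))  # try every split point inside the word
        if word[:i] in dictionary and word[i:] in dictionary
    ]


sample_dictionary = {"pod", "cast", "rat", "ion", "ration"}
print(compound_splits("podcast", sample_dictionary))
print(compound_splits("ration", sample_dictionary))
```

The second call shows the false-positive problem: ‘ration’ is an ordinary word, but it still splits cleanly into ‘rat’ and ‘ion’, so it gets flagged as a compound.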
Once you have a list of words you wish to check, Text2Go will speak each word in turn. If you do nothing, the word will be marked as correct. These words can then be added to the ‘white-list’ so they need never be checked again.
If you hear a word that is mispronounced, you can mark it as such with a click of the mouse. Once all words have been spoken, each will be either marked as correct or incorrect. Now all you need to do is enter corrections for each of the mispronounced words. These will then be added to the pronunciation dictionary.
This approach makes it very easy to check just a few words or a large list. You can watch a video of this in action here.
Once you’ve gone to the effort of identifying and correcting the pronunciation of a set of words or even if you’ve just verified a list of words, it would be nice if you could share this information with other Text2Go users. Others will gain the benefits of your corrections and you will gain the benefit of theirs. A win-win situation. This will result in a much larger pronunciation dictionary and in turn lead to more accurate text to speech.
To achieve this I wanted the sharing to require no extra effort on the part of the user. Therefore I’ve created a service, much like an automatic update, that runs every couple of days. It runs completely in the background, requiring no interaction from the user. In fact you can continue to use Text2Go while it runs. First it downloads new pronunciation entries and white-listed words from the Text2Go web server. Then it uploads any corrections and white-listed words you entered locally. These are then merged and made ready for distribution in the next update.
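The download-then-upload exchange might look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the function name `sync`, the in-memory dicts and sets standing in for the server and local data, and in particular the rule that a local correction wins when both sides have an entry for the same word. How conflicts are really resolved is not described here.

```python
def sync(local_dict, local_white, server_dict, server_white):
    """Merge downloaded entries with local ones and work out what to upload.

    *_dict map a word to its corrected pronunciation; *_white are sets of
    words verified as correctly pronounced. Conflict policy (local wins)
    is an assumption for this sketch.
    """
    # Download step: start from the server's entries, then overlay local ones.
    merged_dict = {**server_dict, **local_dict}
    merged_white = local_white | server_white
    # Upload step: send only what the server does not already have.
    upload_dict = {w: p for w, p in local_dict.items() if server_dict.get(w) != p}
    upload_white = local_white - server_white
    return merged_dict, merged_white, upload_dict, upload_white


merged_dict, merged_white, upload_dict, upload_white = sync(
    local_dict={"ipod": "eye pod"},
    local_white={"the", "cat"},
    server_dict={"sql": "sequel"},
    server_white={"the", "is"},
)
print(merged_dict, merged_white, upload_dict, upload_white)
```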
The other major area of functionality I’ve enhanced for this release is Text Cleanup Rules. A Text Cleanup Rule is a powerful search-and-replace operation (using regular expressions) that gets applied to a document before it’s converted to speech.
One example where Text Cleanup Rules can be useful is in identifying breaks in a document and inserting a pause. For example, a row of ******** or -------- is often used to denote a break in a document. By default these breaks would be pronounced as asterisk, asterisk, asterisk… and minus, minus, minus… This very quickly becomes tiresome.
Text2Go includes a rule to identify these breaks and replace them with a pause. A single rule can handle both forms of break and will match two or more *’s or -’s, with or without spaces in between.
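A rule of this kind could be written as a regular expression along the following lines. This Python sketch is my own approximation rather than the rule that ships with Text2Go, and the `[pause]` marker is a hypothetical placeholder for whatever the speech engine actually uses to insert a pause.

```python
import re

# Two or more asterisks, or two or more hyphens, optionally separated
# by spaces or tabs. A single * or - is left alone.
BREAK_RULE = re.compile(r"\*(?:[ \t]*\*)+|-(?:[ \t]*-)+")

PAUSE = "[pause]"  # hypothetical marker, not a real Text2Go token

def insert_pauses(text):
    """Replace each row of asterisks or hyphens with a pause marker."""
    return BREAK_RULE.sub(PAUSE, text)


cleaned = insert_pauses("Chapter 1\n* * *\nChapter 2\n--------\nThe end")
print(cleaned)
```

Because the pattern requires at least two symbols, a hyphenated word such as ‘well-known’ or an expression such as ‘2 * 3’ passes through unchanged.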
In the previous version of Text2Go you could only create these by editing XML files. For this release I’ve added a built-in editor. The editor allows you to test your rule on a sample block of text as you edit it. Text Cleanup rules are also shared in the same way as pronunciation corrections. You can watch a video of the new editor in action here.
Finally I’ve added a few minor enhancements.
Clipboard Monitor. When you turn on the Clipboard Monitor, Text2Go will automatically add any text copied to the clipboard to the current document. Very convenient when converting text from PDFs, Word documents, email, etc.
Motor-Mouth. Works the same way as the Clipboard Monitor, except that instead of adding text to the current document, it speaks it aloud.
Status Display in the System Tray. In addition to displaying the current Text2Go status on the toolbar in Internet Explorer, it’s also displayed in the icon in the system tray (icon in the bottom right of the screen near the time).
Option to Control Whether Text2Go is Started at PC Startup Time. By default Text2Go is started when you boot your PC, but if you only use Text2Go occasionally, you may prefer not to have it start every time.
This release has been very satisfying to me personally. However I’m afraid that it may have been a little self-indulgent. To ensure this is not the case for the next release, I’m running a 10 Second Poll so you can vote on the next major feature you’d like to see added to Text2Go. Please take the time to vote.
You can download Text2Go 3.0 here.