Text-to-speech synthesis

Exams always seem to present far too much opportunity for distraction… Well, now that they’re past (and, i hope, passed), let me report on some text-to-speech synthesis software with which i’ve been playing.

My initial motivation to play with text-to-speech stuff was simply that, by the end of a long day in front of the PC, and several more hours with the books, i often can’t keep my eyes open any longer, but my brain still wants some food… Add to that the vast corpus of largely unexplored free, public domain “e-books” at Project Gutenberg, and i thought it a good idea to experiment again with some text-to-speech toys.

At the heart of my text-to-speech setup is Festival, a framework being developed by the Centre for Speech Technology Research at the University of Edinburgh for general-purpose, multilingual speech synthesis. After reading the excellent user documentation on it, i’m looking forward to spending some more time tweaking intonation and phrase breaks to make the synthesised speech more natural.

i tried a variety of voices, but i’m most satisfied with the “rab_diphone” voice from the FestVox project of Carnegie Mellon University’s speech group. This provides a British English male voice.
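For anyone who would rather drive Festival from a script than from a desktop tool, here is a minimal sketch of reading a plain-text file aloud with a chosen voice. It assumes the `festival` binary is on the PATH and that the voice is installed under the name `voice_rab_diphone`. The helper names `festival_command` and `speak` are made up for illustration, and the exact invocation may need adjusting for a given Festival version.

```python
import subprocess


def festival_command(text_file, voice="rab_diphone"):
    """Build the argument list for a batch-mode Festival run.

    Assumption: Festival evaluates parenthesised Scheme expressions given
    on the command line in batch mode (-b). The first expression selects
    the voice, the second reads the file aloud.
    Note: a file name containing a double quote would break the Scheme
    string; this sketch does not escape it.
    """
    return [
        "festival",
        "-b",
        "(voice_{})".format(voice),
        '(tts "{}" nil)'.format(text_file),
    ]


def speak(text_file, voice="rab_diphone"):
    """Run Festival and block until it has finished speaking."""
    subprocess.run(festival_command(text_file, voice), check=True)
```

Calling `speak("chapter1.txt")` would then read the (hypothetical) file `chapter1.txt` with the British English male voice.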

My desktop environment, KDE, includes kttsmgr, the KDE Text-to-Speech Manager, which provides a nice interface to Festival. kttsmgr offers a simple means of synthesising the contents of the clipboard, a file, etc. It also enables intuitive replacement of words (e.g., abbreviations) which Festival does not otherwise pronounce correctly.

After retrieving them from Project Gutenberg, i run e-books through GutenMark, a tool to create more readable HTML or LaTeX documents from the Project Gutenberg markup. GutenMark does a fair job of marking up headings, direct speech, etc., though i have to make one or two quick global replacements of HTML entities on which kttsmgr/Festival chokes.
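The global replacement of HTML entities could be scripted along these lines. This is only a sketch: which entities actually trip up kttsmgr/Festival is not listed here, so the code simply assumes that every named or numeric entity should become its literal character, except the few that must stay escaped for the HTML to remain valid. The function name `replace_entities` is made up for illustration.

```python
import html
import re

# Matches named (&mdash;), decimal (&#8212;) and hex (&#x2014;) entities.
ENTITY = re.compile(r"&(?:#\d+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")

# Characters that have to stay escaped inside HTML markup.
KEEP_ESCAPED = set('&<>"')


def replace_entities(markup):
    """Replace HTML entities with literal characters, keeping the markup valid."""

    def swap(match):
        entity = match.group(0)
        char = html.unescape(entity)
        # Leave markup-significant characters (and unknown entities,
        # which html.unescape returns unchanged) as they were.
        if char in KEEP_ESCAPED:
            return entity
        return char

    return ENTITY.sub(swap, markup)
```

For instance, `replace_entities("<p>Tom &amp; Jerry &mdash; done</p>")` leaves `&amp;` alone but turns `&mdash;` into a literal dash character, so the page still renders in the browser while the synthesiser sees ordinary text.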

i then fire up Konqueror, the lightweight browser included in KDE. Konqueror interfaces directly with kttsmgr, allowing one to view a page or select a block of text and hit “Speak Text”.

So far, i’ve listened to some excellent reads:

The synthesised voice is a bit mechanical, but i found that, after fifteen minutes or so of listening, i was able to follow without much effort. Now i can “read” while messing around the flat, or put on my headphones and enjoy a good bedtime story.
