Feature Article



Issue #207 October 2007
Embedded Speech
Speech Synthesis for Small Applications

by Nicusor Birsan & Ionut Tarsa


SYSTEM BUILDING BLOCKS

Because our job was “simple” (we were porting an existing project to a microcontroller), we kept the skeleton of the eSpeak project and designed the modules from the bottom up: the translator, prosody and intonation, the synthesizer, the wave generator, and the sound driver.

As you can see in Figure 1, the input text is analyzed and then split into phrases, words, and numbers. Each word is translated into a list of phonemes based on a dictionary with a limited vocabulary. While the prosody is analyzed, the phoneme list is filled in with pitch, pitch variations, amplitude, transitions between phonemes, and pauses. After each phrase or sentence is translated, the phoneme list is passed to the low-level synthesizer, the digital signal processor (DSP). Based on the phoneme list, the concatenative synthesizer puts simple commands into a queue, from which the wave generator processes them. These commands can be pointers to waveforms or formant data from the “Phoneme data” table, amplitude or pitch settings, and so on.

Figure 1—The block diagram contains mostly software parts, stored data in internal flash memory, and a PWM peripheral.
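The command queue between the synthesizer and the wave generator can be pictured as a small ring buffer. The following is only a minimal sketch under our own assumptions, not the code from the project: the command names (`CMD_WAVE`, `CMD_FORMANT`, and so on), the `synth_cmd_t` layout, and the queue size of 16 are all hypothetical and chosen for illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical command types; the names are illustrative, not eSpeak's. */
typedef enum {
    CMD_WAVE,       /* play a stored waveform; arg = offset into phoneme data */
    CMD_FORMANT,    /* synthesize from formant data; arg = offset into phoneme data */
    CMD_AMPLITUDE,  /* set output amplitude; arg = new level */
    CMD_PITCH,      /* set pitch; arg = new pitch value */
    CMD_PAUSE       /* emit silence; arg = duration */
} cmd_type_t;

typedef struct {
    uint8_t  type;  /* a cmd_type_t stored in one byte to save RAM */
    uint16_t arg;   /* meaning depends on type (see above) */
} synth_cmd_t;

#define CMD_QUEUE_SIZE 16u  /* assumed size; one slot always stays empty */
static_assert((CMD_QUEUE_SIZE & (CMD_QUEUE_SIZE - 1u)) == 0u,
              "queue size must be a power of two so indices wrap with a mask");

typedef struct {
    synth_cmd_t items[CMD_QUEUE_SIZE];
    volatile uint8_t head;  /* next write slot, advanced only by the synthesizer */
    volatile uint8_t tail;  /* next read slot, advanced only by the wave generator */
} cmd_queue_t;

static void cmd_queue_init(cmd_queue_t *q)
{
    q->head = 0;
    q->tail = 0;
}

/* Producer side: returns false when the queue is full. */
static bool cmd_queue_put(cmd_queue_t *q, cmd_type_t type, uint16_t arg)
{
    uint8_t next = (uint8_t)((q->head + 1u) & (CMD_QUEUE_SIZE - 1u));
    if (next == q->tail) {
        return false;
    }
    q->items[q->head].type = (uint8_t)type;
    q->items[q->head].arg = arg;
    q->head = next;  /* publish the slot only after it is filled */
    return true;
}

/* Consumer side: returns false when the queue is empty. */
static bool cmd_queue_get(cmd_queue_t *q, synth_cmd_t *out)
{
    if (q->tail == q->head) {
        return false;
    }
    *out = q->items[q->tail];
    q->tail = (uint8_t)((q->tail + 1u) & (CMD_QUEUE_SIZE - 1u));
    return true;
}
```

In a design like this, the synthesizer would call `cmd_queue_put()` from the main loop while the wave generator calls `cmd_queue_get()` whenever it needs the next command. Each index has a single writer, which is what makes such a queue attractive when one side runs in an interrupt handler.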

We had some trouble with the stored data. eSpeak sounds good on desktop speakers, but a 360-KB executable and hundreds of kilobytes of data files discouraged us. How do you port more than 0.5 MB of code and data into 64 KB of ROM and 8 KB of SRAM? The answer is simple: don’t port all of it. Port only the program kernel and the necessary data. Other features can be added as they are needed. For instance, why embed all of the languages? For a Spanish application, you can include only the files needed to speak that language.
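One way to include only the needed languages is to select them at build time. The sketch below is an illustration under our own assumptions, not the project's actual build setup: the `SPEECH_LANG_EN` and `SPEECH_LANG_ES` switches, the `language_t` type, the one-byte placeholder dictionaries, and the 32-KB data budget are all hypothetical.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <string.h>

/* Hypothetical build switches, e.g. -DSPEECH_LANG_ES.
   English is an assumed default when nothing is selected. */
#if !defined(SPEECH_LANG_EN) && !defined(SPEECH_LANG_ES)
#define SPEECH_LANG_EN
#endif

#ifdef SPEECH_LANG_EN
static const unsigned char dict_en[] = { 0x00 };  /* placeholder for real dictionary data */
#define DICT_EN_BYTES sizeof(dict_en)
#else
#define DICT_EN_BYTES 0u
#endif

#ifdef SPEECH_LANG_ES
static const unsigned char dict_es[] = { 0x00 };  /* placeholder for real dictionary data */
#define DICT_ES_BYTES sizeof(dict_es)
#else
#define DICT_ES_BYTES 0u
#endif

/* Hypothetical share of the 64-KB flash reserved for language data. */
#define DATA_BUDGET_BYTES (32u * 1024u)
static_assert(DICT_EN_BYTES + DICT_ES_BYTES <= DATA_BUDGET_BYTES,
              "selected dictionaries exceed the flash data budget");

typedef struct {
    const char          *code;        /* language code, e.g. "es" */
    const unsigned char *dict;        /* dictionary data kept in flash */
    size_t               dict_bytes;  /* size of that data */
} language_t;

/* Only the languages selected at build time end up in this table. */
static const language_t languages[] = {
#ifdef SPEECH_LANG_EN
    { "en", dict_en, sizeof(dict_en) },
#endif
#ifdef SPEECH_LANG_ES
    { "es", dict_es, sizeof(dict_es) },
#endif
};
#define LANGUAGE_COUNT (sizeof(languages) / sizeof(languages[0]))

/* Returns the table entry for a language code, or NULL if it was not compiled in. */
static const language_t *find_language(const char *code)
{
    for (size_t i = 0; i < LANGUAGE_COUNT; i++) {
        if (strcmp(languages[i].code, code) == 0) {
            return &languages[i];
        }
    }
    return NULL;
}
```

With a scheme like this, a language that is not selected costs no flash at all, and the compile-time check fails the build as soon as the chosen data no longer fits the budget.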

Before going further, let’s cover the topic of porting eSpeak to a microcontroller. We moved the application from C++ to C and kept as much as possible from the original project. Next, we reduced the structure sizes for a better fit with embedded applications and optimized the implementation for greater efficiency.
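To illustrate what reducing the structure sizes can mean in practice, here is a hedged sketch, not the actual eSpeak structures: `phoneme_wide_t`, `phoneme_compact_t`, their fields, and the value ranges are assumptions made for this example. The idea is to replace `int` fields with the narrowest fixed-width type that still holds the expected range.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Desktop-style entry: every field is a full int (hypothetical layout). */
typedef struct {
    int phoneme;      /* index into the phoneme table */
    int pitch_start;  /* pitch at the start of the phoneme */
    int pitch_end;    /* pitch at the end of the phoneme */
    int amplitude;    /* output level */
    int length_ms;    /* duration in milliseconds */
} phoneme_wide_t;

/* Embedded-style entry: same information in narrower fields,
   assuming the first four values fit in 0..255 and the length in 0..65535. */
typedef struct {
    uint8_t  phoneme;
    uint8_t  pitch_start;
    uint8_t  pitch_end;
    uint8_t  amplitude;
    uint16_t length_ms;
} phoneme_compact_t;

static_assert(sizeof(phoneme_compact_t) < sizeof(phoneme_wide_t),
              "compact entry should be smaller than the wide one");

/* Saturate instead of wrapping when a value does not fit. */
static uint8_t clamp_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

static uint16_t clamp_u16(int v)
{
    return (uint16_t)(v < 0 ? 0 : (v > 65535 ? 65535 : v));
}

/* Convert a wide entry into its compact form. */
static phoneme_compact_t phoneme_compact_from_wide(const phoneme_wide_t *w)
{
    phoneme_compact_t c;
    c.phoneme     = clamp_u8(w->phoneme);
    c.pitch_start = clamp_u8(w->pitch_start);
    c.pitch_end   = clamp_u8(w->pitch_end);
    c.amplitude   = clamp_u8(w->amplitude);
    c.length_ms   = clamp_u16(w->length_ms);
    return c;
}
```

On a target with 32-bit `int`, the wide entry takes 20 bytes and the compact one 6, so a phoneme list of the same length needs less than a third of the SRAM. Ordering the fields so the 16-bit member follows the four bytes avoids padding.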


