
Feature Article



Issue #207 October 2007
Embedded Speech
Speech Synthesis for Small Applications

by Nicusor Birsan & Ionut Tarsa


FIRST BIG PORTING PROBLEM

In eSpeak, sound is generated in two ways: sinusoidal (formant) synthesis for vowels and short waveform files for consonant phonemes. With the formant method, WAV files and transition information are stored in three data files: phondata, phontab, and phonindex. The main problem is the size of those files. We planned to use English for our first attempt, but the phoneme data was too large to fit into the LM3S811’s memory. Reducing the sample rate from 22,050 to 8,000 samples per second helps somewhat, but it shrinks only the stored WAV files. A closer look at the tables reveals a lot of redundancy. That’s because the eSpeak developer added a lot of diphthongs to make speech generation more natural. A phonetic variation is added if a specific letter is followed or preceded by another one in the text (before and after modifications). This issue presented a new challenge: we had to develop a special tool. As you’ll see, the tool helped a lot.
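To make the sample-rate reduction concrete: going from 22,050 to 8,000 samples per second leaves roughly 36% of the original samples. The sketch below shows one common way to resample 16-bit mono PCM, using linear interpolation. It is an illustration only, not the method used in eSpeak or espeakdev; the function name resample_linear and its parameters are hypothetical, and a production downsampler would low-pass filter the signal first to limit aliasing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of eSpeak): resample 16-bit mono PCM
 * from in_rate to out_rate by linear interpolation.
 * Returns the number of samples written to out (at most out_cap).
 * No anti-aliasing filter is applied. */
size_t resample_linear(const int16_t *in, size_t in_len,
                       unsigned in_rate, unsigned out_rate,
                       int16_t *out, size_t out_cap)
{
    assert(in_rate > 0 && out_rate > 0);
    if (in_len == 0)
        return 0;

    /* Output length scales with the ratio of the two rates. */
    size_t out_len = (size_t)(((uint64_t)in_len * out_rate) / in_rate);
    if (out_len > out_cap)
        out_len = out_cap;

    for (size_t i = 0; i < out_len; i++) {
        /* Position of output sample i on the input timeline,
         * kept as an integer part plus a fraction over out_rate. */
        uint64_t pos = (uint64_t)i * in_rate;
        size_t idx = (size_t)(pos / out_rate);
        uint32_t frac = (uint32_t)(pos % out_rate);

        int32_t a = in[idx];
        int32_t b = (idx + 1 < in_len) ? in[idx + 1] : in[idx];

        /* Blend the two neighboring input samples. */
        out[i] = (int16_t)(a + ((int64_t)(b - a) * frac) / out_rate);
    }
    return out_len;
}
```

Calling resample_linear(in, n, 22050, 8000, out, cap) on a stored waveform would produce the shorter 8-kHz version; as the text notes, this saves space only in the waveform data, not in the phoneme tables.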

You might think that porting software to an embedded platform is as simple as copying source files to a microcontroller project. But often the files can’t be linked or even compiled, and you have to modify a lot of source files and settings. A faster approach is to write the code on a PC first, which yields both a well-tested embeddable version and a development tool. Our tool is based on the espeakedit program, whose source files are available on the SourceForge web site. The new tool, espeakdev, offers the same functionality as espeakedit plus some menus and buttons dedicated to generating tables and testing the embedded implementation.

espeakdev includes the espeakedit functionality plus some additional source files: mcu_synthdata.c, mcu_wavegen.c, mcu_synthesize.c, and mcu_translate.c. We modified the user interface, adding elements that help us verify the implementation by comparing its speech quality with that of the full eSpeak (see Photo 2).

Photo 2—After trying to port eSpeak to embedded applications, we developed a new tool. espeakdev lets us test the embedded implementation and adjust it in many ways.

With espeakdev, you can generate C source files with constant arrays. You can also test and modify the embedded implementation with immediate feedback. At the same time, you can adjust formants or test prosody, just as you can in the espeakedit program.
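As an illustration of what generating a C source file with a constant array involves, here is a minimal sketch that writes a byte buffer as a const array definition. It is not taken from espeakdev; the function name emit_c_array, the array layout, and the 12-bytes-per-line formatting are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not from espeakdev): write len bytes of data
 * to out as a constant C array called name.
 * Returns 0 on success, -1 on an I/O error. */
int emit_c_array(FILE *out, const char *name,
                 const unsigned char *data, size_t len)
{
    assert(out != NULL && name != NULL && data != NULL && len > 0);

    if (fprintf(out, "const unsigned char %s[%lu] = {\n",
                name, (unsigned long)len) < 0)
        return -1;

    for (size_t i = 0; i < len; i++) {
        /* Indent at the start of each row of 12 bytes. */
        if (i % 12 == 0 && fputs("    ", out) == EOF)
            return -1;

        /* Last byte ends the row; otherwise break every 12 bytes. */
        const char *sep = (i + 1 == len) ? "\n"
                        : (i % 12 == 11) ? ",\n"
                        : ", ";
        if (fprintf(out, "0x%02x%s", data[i], sep) < 0)
            return -1;
    }

    return (fputs("};\n", out) == EOF) ? -1 : 0;
}
```

Declaring the array const matters on a microcontroller: it lets the linker place the table in flash rather than in the much smaller RAM.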

To keep the tool as portable as espeakedit, we built it with wxWidgets in Visual C++. This way, both development environments, Visual C++ for the PC utility program and µVision for the LM3S811, run on the same Windows platform.

The wxWidgets toolkit enables developers to create applications for various platforms, such as Win32, Mac OS X, GTK+, X11, Motif, Windows CE, and more using one codebase. It can be used with languages such as C++, Python, Perl, and C#/.NET. Unlike other cross-platform toolkits, wxWidgets applications look and feel native. This is because wxWidgets uses the platform’s own native controls rather than emulating them. It’s also extensive, free, and open-source.

Instead of building C files directly from binary data files, we added a new class, CompileEmbedded, that generates C files from the phoneme source files. This was a good choice because any embedded developer can minimize or optimize the phoneme tables while getting continuous feedback from the desktop speakers.

In the CompileEmbedded class, all of the before and after variations for virtual phonemes were removed. This way, the phoneme tables, indices, and data for English fit into about 30 KB.

From a user’s perspective, the process of C file generation is as simple as selecting a menu item. Two files are then created in the “data” folder: mcu_phondata.c and mcu_phontab.c. The files are included in mcu_synthdata.c, which is part of the synthesizer kernel source files (see Photo 3).

Photo 3—The espeakdev has the same structure as espeakedit and an additional “mcu” folder.
