Feature Article



Issue #207 October 2007
Embedded Speech
Speech Synthesis for Small Applications

by Nicusor Birsan & Ionut Tarsa


MORE SPEECH APPLICATIONS

With espeakdev and the source files from the SRC directory, speech can be integrated into other applications. In most cases, only the dictionary needs to be updated to reflect the new product’s vocabulary. To do this, modify the m3_dict file in the dictsource folder. The operation must be done manually, translating word by word. espeakdev could be improved here to generate or automatically update dictionary source files from a given text file.
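As a starting point for the kind of automation suggested above, the following is a minimal sketch (not part of espeakdev or the project sources) that collects the unique words of a product's message text, so you know which entries the dictionary needs. The names vocab_t and vocab_collect and the table sizes are hypothetical. The sketch assumes plain ASCII text, treats only letters as word characters, and does not attempt to write the m3_dict format; the phoneme translation still has to be done by hand.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limits, chosen only for illustration. */
#define MAX_WORDS    64
#define MAX_WORD_LEN 32

typedef struct {
    char   words[MAX_WORDS][MAX_WORD_LEN];
    size_t count;
} vocab_t;

/* Returns 1 if the word is already in the table, 0 otherwise. */
static int vocab_contains(const vocab_t *v, const char *word)
{
    for (size_t i = 0; i < v->count; i++) {
        if (strcmp(v->words[i], word) == 0) return 1;
    }
    return 0;
}

/* Collects the unique, lowercased words of `text` into `v`, in order of
   first appearance. Words longer than MAX_WORD_LEN - 1 characters and
   words that do not fit in the table are skipped.
   Returns the number of words stored. */
size_t vocab_collect(vocab_t *v, const char *text)
{
    char   word[MAX_WORD_LEN];
    size_t len = 0;
    int    too_long = 0;

    v->count = 0;
    for (const char *p = text; ; p++) {
        unsigned char c = (unsigned char)*p;
        if (isalpha(c)) {
            if (len < MAX_WORD_LEN - 1) word[len++] = (char)tolower(c);
            else too_long = 1;
        } else {
            /* Any non-letter (including the terminator) ends a word. */
            if (len > 0 && !too_long) {
                word[len] = '\0';
                if (!vocab_contains(v, word) && v->count < MAX_WORDS) {
                    strcpy(v->words[v->count++], word);
                }
            }
            len = 0;
            too_long = 0;
            if (c == '\0') break;
        }
    }
    return v->count;
}
```

Running vocab_collect over every message string the product can speak yields the list of words to check against the dictionary source before rebuilding the tables.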

If you want to make additional modifications, you can write new phoneme, voice, or dictionary files, as explained in the eSpeak documentation. Generating new tables as C source files is as simple as clicking through the menus in the espeakdev program.

Furthermore, if the formant and WAV files from the phsource directory don’t satisfy your requirements, the development team can use Praat and espeakdev to record, analyze, and modify formant data for a given phoneme. Be warned that this can be a very time-consuming job, which is one more reason to thank Mr. Duddington.

Because it is written in C, a portable language, the firmware can be moved to other architectures. Porting the synthesis to another platform is as easy as compiling the source files from the SRC directory. The only architecture-specific part is the sound driver, which consists of an interrupt service routine that passes the waveform data to a PWM or DAC peripheral. Of course, the peripheral must be initialized by another routine at startup. The easiest approach is to modify the file sounddrv.c so that only the output buffer handling (its refresh mechanism and the names of the registers it updates) is specific to the architecture.
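The driver structure described above can be sketched as follows. This is not the code from sounddrv.c; every name here (snd_init, snd_push, snd_timer_isr, snd_output_reg, the buffer size, and the 8-bit sample format) is an assumption made for illustration. A plain variable stands in for the PWM duty or DAC data register so that the sketch compiles on a PC; on a real target, only snd_write_output and the peripheral setup in snd_init would change.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout: none of these come from sounddrv.c. */
#define SND_BUF_SIZE 256u   /* must be a power of two */
#define SND_SILENCE  128u   /* mid-scale value for an 8-bit output */

/* Stand-in for the target's PWM duty or DAC data register. On real
   hardware this would be a memory-mapped register from the vendor header. */
static volatile uint8_t snd_output_reg;

static volatile uint8_t  snd_buf[SND_BUF_SIZE];
static volatile uint16_t snd_head;   /* advanced by the synthesizer */
static volatile uint16_t snd_tail;   /* advanced by the ISR */

/* Architecture-specific part: the only function that touches hardware. */
static void snd_write_output(uint8_t sample)
{
    snd_output_reg = sample;
}

/* Called once at startup. Real code would also configure the sample-rate
   timer and the PWM or DAC peripheral here. */
void snd_init(void)
{
    snd_head = 0;
    snd_tail = 0;
    snd_write_output(SND_SILENCE);
}

/* Called by the synthesizer to queue samples.
   Returns the number of samples accepted (fewer than count if full). */
size_t snd_push(const uint8_t *samples, size_t count)
{
    size_t written = 0;
    while (written < count) {
        uint16_t next = (uint16_t)((snd_head + 1u) & (SND_BUF_SIZE - 1u));
        if (next == snd_tail) break;          /* buffer full */
        snd_buf[snd_head] = samples[written++];
        snd_head = next;
    }
    return written;
}

/* Sample-rate timer interrupt: sends one sample to the output per tick. */
void snd_timer_isr(void)
{
    if (snd_tail == snd_head) {
        snd_write_output(SND_SILENCE);        /* underrun: output silence */
        return;
    }
    snd_write_output(snd_buf[snd_tail]);
    snd_tail = (uint16_t)((snd_tail + 1u) & (SND_BUF_SIZE - 1u));
}
```

With this split, the synthesizer calls snd_push whenever it has produced new samples, and the target's timer vector calls snd_timer_isr at the sample rate. Because the producer only writes snd_head and the interrupt only writes snd_tail, the sketch needs no locking on a single-core microcontroller, provided 16-bit accesses are atomic on the target.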

A complex project such as speech synthesis cannot be built from scratch in a short amount of time because it requires a lot of data (phonemes, rules, etc.). A solution for building low-cost embedded speech applications has been around since 1995, and thanks to Jonathan Duddington, you can use it now. Because you have an existing project as a starting point, you only have to optimize the code to fit it into a smaller embedded processor.

Even if you don’t have a project to start from, it is better to build PC software first (developed in a top-down manner) and then port its modules to the firmware from the bottom up. It is easier to edit, build, and run software on the same machine than to develop directly for embedded targets. For example, the development of this project was facilitated by espeakdev, which will also be a useful tool for future upgrades.

Because the quality of a speech synthesizer is judged by its similarity to the human voice and by how well it can be understood, you must test the application by listening to it. Automated tests aren’t necessary here. Testing the synthesized speech with the Windows software is more appropriate because the code-build-test cycle is faster. Listening tests are a requirement because sine wave synthesis by itself does not sound very natural, although it is easy to make the output understandable by mixing it with the short waveform files.
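To illustrate what mixing synthesized output with a short recorded waveform can look like, here is a minimal sketch. It is not taken from the eSpeak sources; the function names, the signed 16-bit sample format, and the Q8 fixed-point gain are assumptions. Integer arithmetic is used because small embedded processors often lack floating-point hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp a 32-bit intermediate result to the signed 16-bit sample range. */
static int16_t clamp16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Adds `wav` (a short recorded snippet) into `synth` (synthesized samples)
   in place, over the shorter of the two lengths.
   wav_gain_q8 is the snippet gain in Q8 fixed point: 256 means 1.0.
   Assumption: the gain is small enough (here, at most 65535) that the
   product with a 16-bit sample fits in 32 bits. */
void mix_waveform(int16_t *synth, size_t synth_len,
                  const int16_t *wav, size_t wav_len,
                  int32_t wav_gain_q8)
{
    size_t n = synth_len < wav_len ? synth_len : wav_len;
    for (size_t i = 0; i < n; i++) {
        int32_t scaled = ((int32_t)wav[i] * wav_gain_q8) / 256;
        /* Saturate instead of wrapping so loud passages clip cleanly. */
        synth[i] = clamp16((int32_t)synth[i] + scaled);
    }
}
```

Saturating the sum matters for listening tests: integer wraparound produces loud clicks that listeners would hear as a defect of the synthesis itself rather than of the mixing step.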

One thing must be pointed out: synthesis quality tests have to be done with multiple listeners. Even if you think the output sounds good, another person might not. Our work could be improved in many ways (e.g., code optimization, the quality of speech, etc.). Because we ported an active open project, we can also include additional features such as multiple voices and languages.
