Feature Article
Issue #207 October 2007
Embedded Speech
Speech Synthesis for Small Applications
by Nicusor Birsan & Ionut Tarsa
TRANSLATOR
The translator’s basic function is to convert the text to be spoken into phonemes. Generally, this is done with language rules and a dictionary of phonetic translations. Both approaches have advantages that have been discussed in many papers and books. Here is the problem: English has so many rules that they would not fit into the LM3S811’s flash memory, and even those rules can’t cover all the possibilities, so a long list of exception words must also be kept in a dictionary.
There is only one solution for this. Because our small microcontroller needs to speak only a limited number of words, we used a dictionary containing just the vocabulary needed for our application.
Now is the time for espeakdev to reenter the scene. First, we have to make a new voice just for testing the new dictionary on the desktop (voices/m3). In this file, we can set the language name to whatever we want, so let’s choose “m3” as the name of our application-specific language because we are testing on a Cortex-M3 device. Second, we must have a compiled dictionary in the espeak-data folder whose filename starts with the same prefix (m3_dict). A compiled dictionary is a binary, indexed version of two text files that must be edited and saved in the dictsource folder: m3_list and m3_rules. The m3_list file must contain the phonetic translations of the words in the application vocabulary. The m3_rules file must exist, but since we don’t have any rules for our language, it is empty. To use our new language, we must compile the dictionary from the espeakdev application’s menu (Data-Compile dictionary). Now we can make espeakdev speak the new language by selecting the m3 voice (on the speak-voice menu) to hear how it sounds. Whenever the source files are modified, a new m3_dict can be compiled by selecting Data-Compile dictionary again.
We wrote a new class for generating C files with the phoneme table. We also added a new function, Translator::GenerateDict(), which saves the current dictionary list in a header file (m3_dict.h). Now we have a dictionary. To make word lookup fast, a hash table with a 10-bit index is generated along with it.
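The idea of a dictionary indexed by a 10-bit hash can be sketched in C as follows. This is only an illustration under stated assumptions, not the contents of m3_dict.h: the entry layout, the hash function, the names (DictEntry, hash_word, dict_build, dict_lookup), and the phoneme strings are all hypothetical placeholders. Here the bucket links are filled in at start-up; a desktop generator could just as well emit them precomputed so the whole table stays constant in flash.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_BITS 10
#define HASH_SIZE (1u << HASH_BITS)   /* 1024 buckets */
#define NO_ENTRY  0xFFFFu             /* marks the end of a bucket chain */

/* Hypothetical entry layout; the real generated header is not shown
   in the article. */
typedef struct {
    const char *word;      /* lower-case spelling */
    const char *phonemes;  /* placeholder phoneme string */
    uint16_t    next;      /* next entry in the same bucket */
} DictEntry;

/* Placeholder vocabulary; the phoneme strings are illustrative only. */
static DictEntry dict[] = {
    { "hello", "h@l'oU", NO_ENTRY },
    { "motor", "m'oUt3", NO_ENTRY },
    { "stop",  "st'0p",  NO_ENTRY },
};
#define DICT_COUNT (sizeof dict / sizeof dict[0])

static uint16_t bucket_head[HASH_SIZE];

/* Assumed hash: multiply-and-add over the bytes, folded to 10 bits. */
static uint16_t hash_word(const char *s, size_t len)
{
    uint32_t h = 0;
    for (size_t i = 0; i < len; i++) {
        h = h * 31u + (unsigned char)s[i];
    }
    return (uint16_t)((h ^ (h >> HASH_BITS)) & (HASH_SIZE - 1u));
}

/* Link every entry into the chain of its bucket. */
static void dict_build(void)
{
    for (size_t i = 0; i < HASH_SIZE; i++) {
        bucket_head[i] = NO_ENTRY;
    }
    for (uint16_t i = 0; i < DICT_COUNT; i++) {
        uint16_t h = hash_word(dict[i].word, strlen(dict[i].word));
        dict[i].next = bucket_head[h];
        bucket_head[h] = i;
    }
}

/* Return the phoneme string for a word of the given length, or NULL. */
static const char *dict_lookup(const char *word, size_t len)
{
    uint16_t i = bucket_head[hash_word(word, len)];
    while (i != NO_ENTRY) {
        if (strlen(dict[i].word) == len &&
            memcmp(dict[i].word, word, len) == 0) {
            return dict[i].phonemes;
        }
        i = dict[i].next;
    }
    return NULL;
}
```

After one call to dict_build(), a lookup hashes the word once and compares it only against the few entries that share its bucket, instead of scanning the whole vocabulary.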
On the microcontroller, all the translation functions are written in mcu_translate.c. Because memory is limited, the MCUTranslate() function reads its data directly from the text source, so the string passed to it must not be modified until the current translation is finished. Each word in the string is looked up in the dictionary by calling MCULookupDictList(). If the word is found, its phonetic translation is moved into the phoneme list at the input of the synthesizer.
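The in-place scan described above might look like the following sketch. It is not the code from mcu_translate.c: translate_text(), lookup_word(), the vocabulary, and the phoneme strings are hypothetical stand-ins, and the lookup is a plain linear search to keep the example short. The point it shows is that words are located by pointer and length inside the caller's string, so no copy of the text is needed, which is also why the caller must leave the string untouched during translation.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical vocabulary; the phoneme strings are placeholders. */
typedef struct {
    const char *word;
    const char *phonemes;
} Entry;

static const Entry vocab[] = {
    { "motor", "m'oUt3" },
    { "speed", "sp'i:d" },
    { "low",   "l'oU"   },
};

/* Stand-in for the dictionary lookup: case-insensitive match of
   exactly len characters of w against a lower-case entry. */
static const char *lookup_word(const char *w, size_t len)
{
    for (size_t i = 0; i < sizeof vocab / sizeof vocab[0]; i++) {
        const char *d = vocab[i].word;
        size_t k = 0;
        while (k < len && d[k] != '\0' &&
               tolower((unsigned char)w[k]) == d[k]) {
            k++;
        }
        if (k == len && d[k] == '\0') {
            return vocab[i].phonemes;
        }
    }
    return NULL;
}

/* Walk the source text in place and append the phoneme string of each
   known word to out[]. Unknown words are skipped. Returns the number
   of entries written. */
static size_t translate_text(const char *text, const char *out[],
                             size_t max_out)
{
    size_t n = 0;
    const char *p = text;

    while (*p != '\0' && n < max_out) {
        while (*p != '\0' && !isalpha((unsigned char)*p)) {
            p++;                          /* skip separators */
        }
        const char *start = p;
        while (isalpha((unsigned char)*p)) {
            p++;                          /* advance to end of word */
        }
        size_t len = (size_t)(p - start);
        if (len == 0) {
            break;                        /* only separators remained */
        }
        const char *ph = lookup_word(start, len);
        if (ph != NULL) {
            out[n++] = ph;
        }
    }
    return n;
}
```

In this sketch a word missing from the vocabulary is silently dropped; a real application would have to decide whether to skip it, spell it, or signal an error.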
Simple playback of the phonemes couldn’t be called speech because each phoneme changes its length, pitch, and amplitude as a function of the adjacent phonemes. So, after translation, pitch and length are adjusted by two simplified functions: MCUCalcPitches() and MCUCalcLengths(). It’s hard to call this prosody or intonation, but words can be heard clearly on a speaker.
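To give a feel for what context-dependent adjustment means, here is a minimal sketch with two deliberately crude rules. These are not the rules used by MCUCalcLengths() and MCUCalcPitches(), which the article does not list. The Phoneme structure, the function names, and the scaling factors are assumptions chosen for illustration: a vowel is shortened before an unvoiced consonant and lengthened before a pause, and pitch falls linearly across the phrase.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { PH_PAUSE, PH_VOWEL, PH_VOICED, PH_UNVOICED } PhType;

/* Hypothetical phoneme record at the input of the synthesizer. */
typedef struct {
    PhType   type;
    uint16_t length_ms;  /* nominal duration, adjusted in place */
    uint16_t pitch_hz;   /* filled in by calc_pitches() */
} Phoneme;

/* Assumed length rule: a vowel is cut to 3/4 of its length before an
   unvoiced consonant and stretched to 5/4 before a pause. */
static void calc_lengths(Phoneme *ph, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++) {
        if (ph[i].type != PH_VOWEL) {
            continue;
        }
        if (ph[i + 1].type == PH_UNVOICED) {
            ph[i].length_ms = (uint16_t)(ph[i].length_ms * 3u / 4u);
        } else if (ph[i + 1].type == PH_PAUSE) {
            ph[i].length_ms = (uint16_t)(ph[i].length_ms * 5u / 4u);
        }
    }
}

/* Assumed pitch rule: a straight line from start_hz on the first
   phoneme down to end_hz on the last (start_hz >= end_hz). */
static void calc_pitches(Phoneme *ph, size_t n,
                         uint16_t start_hz, uint16_t end_hz)
{
    uint32_t span = (uint32_t)(start_hz - end_hz);

    for (size_t i = 0; i < n; i++) {
        if (n < 2) {
            ph[i].pitch_hz = start_hz;
        } else {
            uint32_t drop = span * (uint32_t)i / (uint32_t)(n - 1);
            ph[i].pitch_hz = (uint16_t)(start_hz - drop);
        }
    }
}
```

Integer arithmetic is used throughout because the Cortex-M3 has no floating-point unit, so rules of this kind cost only a few multiplies and divides per phoneme.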