circuitcellar.com

September 1999, Issue 110

Taking Orders:
A Speech Recognition Module


by John Iovine

RECOGNITION ERRORS

There are two common errors associated with speech recognition—rejection (failure to recognize a target word) and substitution (recognition of a nontarget word or confusion between two target words).

When the module detects an error, it pulses the ERROR pin high for 1 s. The LED connected to pin 10 on JP3 signals this condition. Each error also triggers a verbal response such as "Spoke too soon," "Please talk louder," "Please talk softer," and so on.

The message "Word not recognized" is not handled as an error. If the module isn’t trained on the word that initiated this message, it’s not really an error.
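A host controller watching the ERROR line could count these 1-s pulses in software. The following is only a minimal sketch of that idea: it assumes the line is sampled at a fixed interval, and the function name, the sample format (0 or 1 per sample), and the 0.2-s tolerance are illustrative assumptions, not part of the module's documentation.

```python
def count_error_pulses(samples, sample_period_s, nominal_s=1.0, tolerance_s=0.2):
    """Count high pulses whose width is close to the nominal 1-s error pulse.

    samples         -- sequence of 0/1 readings of the ERROR line (assumed format)
    sample_period_s -- time between readings, in seconds
    tolerance_s     -- allowed deviation from nominal_s (assumed value)
    """
    pulses = 0
    run = 0  # number of consecutive high samples seen so far
    # A trailing 0 closes any pulse that is still high at the end of the capture.
    for level in list(samples) + [0]:
        if level:
            run += 1
        else:
            if run and abs(run * sample_period_s - nominal_s) <= tolerance_s:
                pulses += 1
            run = 0
    return pulses


# Sampling every 0.1 s: one 1.0-s pulse, then a 0.2-s glitch that is ignored.
samples = [0] * 5 + [1] * 10 + [0] * 5 + [1] * 2 + [0] * 3
print(count_error_pulses(samples, 0.1))  # 1
```

Filtering on pulse width keeps short glitches from being counted as errors. Because "Word not recognized" is not signaled as an error, it would not show up in this count.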

The module has the ability to increase its selectivity. Figure 2 is configured for Relaxed Training and Relaxed Recognition. On powerup (or reset) the Train and Recognize pins control the selectivity.

If a 100-kΩ resistor is bridged across the Train switch, which pulls the Train pin to ground through 100 kΩ, the module enters Strict Training mode. In this mode, the module rejects more similar-sounding words, resulting in better recognition of the words accepted.

Pulling the Recognize pin to ground with a 100-kΩ resistor places the module in Strict Recognition mode. The module then accepts fewer utterances, which reduces substitutions, but it may also reject trained words.

In the schematic, both the Train and Recognize pins are left floating (open circuit), which places the module in the Relaxed mode.
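The four possible strap combinations can be summarized in a few lines of code. This is just a model of the behavior described above, not firmware for the module. The function name and its boolean arguments are hypothetical, where `True` means a 100-kΩ pulldown to ground is fitted on that pin and `False` means the pin is left floating.

```python
def selectivity_modes(train_pulled_low, recognize_pulled_low):
    """Return the modes selected at powerup or reset.

    Each argument is True if that pin has a 100-kOhm pulldown to ground,
    False if the pin is left floating (hypothetical representation).
    """
    return {
        "training": "Strict" if train_pulled_low else "Relaxed",
        "recognition": "Strict" if recognize_pulled_low else "Relaxed",
    }


# Both pins floating, as in the schematic: Relaxed Training, Relaxed Recognition.
print(selectivity_modes(False, False))
```

The two pins are independent, so a design can mix Strict Training with Relaxed Recognition, or the reverse.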

IMPROVING RECOGNITION

There are a number of ways to optimize recognition. Word selection is one primary technique: avoid rhyming or similar-sounding words such as red, bed, said, and so on. In most cases, a synonym or approximate synonym can be used. For example, use "crimson" or "scarlet" in place of "red."

Another way to improve recognition is to match the equipment to the environment. The type of microphone you use to train should be the same type used for recognition. The distance from the microphone to the speaker’s mouth should be approximately the same for training and recognition.

Keep in mind that your voice changes under stress or excitement. Imagine you’re creating a voice-controlled joystick to fly your favorite military flight simulator. Your voice will sound quite different when you’re sitting at your desk calmly programming your voice into the chip versus when you’re engaged in a dogfight yelling, "Fire! Fire! Bank left!" You have to emulate the stress and excitement you feel while playing the game when you’re programming the commands.