February 1998, Issue 91
Low-Cost Voice Recognition
INPUT ROUTINE
When a Recognize or Train event occurs, the input routine is invoked (see Figure 4). A timer is set up and polled until 110 µs has elapsed.
Figure 3: The main routine serves as the event handler. Events are generated by an interrupt caused by pressing a push button or by system reset. The events dispatched are Select, Train, Untrain, and Recognize.
An interrupt routine could have been used to time the samples every 110 µs, but I was concerned that the overhead of servicing the interrupt might make it difficult to complete all the paths in the input routine within 110 µs.
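The polled timing described above can be sketched in C along the following lines. This is only an illustration, not the author's firmware: `TimerReadFn`, `wait_for_sample_tick`, and the simulated `fake_timer_read` are hypothetical names, and a real microcontroller would read a hardware timer register instead of a variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interface: returns the current value of a free-running
 * tick counter. On real hardware this would read a timer register. */
typedef uint16_t (*TimerReadFn)(void);

/* Stand-in timer so the sketch can run on a PC: every read advances
 * the simulated counter by one tick. */
static uint16_t fake_ticks;
static uint16_t fake_timer_read(void)
{
    fake_ticks = (uint16_t)(fake_ticks + 1u);
    return fake_ticks;
}

/* Busy-wait until at least period_ticks have elapsed since start.
 * Returns the timer value observed when the wait ended, which the
 * caller can use as the start of the next sample period. */
static uint16_t wait_for_sample_tick(TimerReadFn read_timer,
                                     uint16_t start,
                                     uint16_t period_ticks)
{
    uint16_t now;

    assert(read_timer != NULL);
    do {
        now = read_timer();
        /* The cast keeps the difference modulo 65536, so the test
         * still works when the counter wraps around. */
    } while ((uint16_t)(now - start) < period_ticks);
    return now;
}
```

Because nothing else runs while the loop spins, the time between samples depends only on the timer and the length of the code path that follows, with no interrupt entry or exit overhead to account for.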
Once the time elapses, the input square wave is sampled. If the sign changes from the previous measurement, one of the two frequency bytes is updated.
The threshold limit is set to six. In other words, if the pulse (positive or negative) is greater than six samples (roughly corresponding to 1.5 kHz), the "high" frequency byte is incremented by one. If it’s less than six, the "low" frequency byte is incremented.
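A minimal C sketch of this pulse classification is shown below. The structure and function names (`PulseTracker`, `pulse_tracker_sample`) are hypothetical, and the text does not say what happens when a pulse is exactly six samples long; the sketch assumes such a pulse is counted in the "low" byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PULSE_THRESHOLD 6u  /* threshold limit from the text, in samples */

typedef struct {
    bool    last_level;  /* input level seen at the previous sample */
    uint8_t pulse_len;   /* samples counted since the last sign change */
    uint8_t high_count;  /* the "high" frequency byte */
    uint8_t low_count;   /* the "low" frequency byte */
} PulseTracker;

/* Feed one sample of the square-wave input.
 * Returns true if this sample was a sign change (an edge). */
static bool pulse_tracker_sample(PulseTracker *t, bool level)
{
    assert(t != NULL);

    if (level == t->last_level) {
        /* Same sign as before: the current pulse just gets longer.
         * Saturate so a very long pulse cannot wrap the counter. */
        if (t->pulse_len < UINT8_MAX) {
            t->pulse_len++;
        }
        return false;
    }

    /* Sign change: classify the pulse that just ended, following the
     * rule as stated in the text. */
    if (t->pulse_len > PULSE_THRESHOLD) {
        t->high_count++;
    } else {
        t->low_count++;  /* assumption: exactly six lands here too */
    }

    t->last_level = level;
    t->pulse_len = 1;    /* this sample is the first of the new pulse */
    return true;
}
```

The boolean return value lets the caller count edges separately without the tracker needing to know how those edges are used.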
The rest of the routine is basically a state machine that uses speech activity as an input to determine an utterance bounded by silence. At each rising or falling edge, another byte counts the zero crossings.
After 256 samples, a frame counter advances and several tests are made. If the frame counter is greater than 64, either the input buffer is full (i.e., you spoke too long) or there is too much background noise, and an error is generated.
Otherwise, a timeout value is decremented and tested. This setup enables the routine to exit if too much time elapses before any sound is input.
If the buffer isn’t full and a timeout has not occurred, the routine tests the zero-crossing counter. Too low a value signifies silence, and a silence counter is incremented.
Otherwise, a sound-activity counter is incremented. If the sound-activity value is above a certain threshold and the silence value is high enough, the routine exits with a valid data sample.
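The end-of-frame tests can be summarized in a short C sketch. It is an approximation under stated assumptions rather than the actual firmware: only the 64-frame limit comes from the text, while the names (`InputState`, `end_of_frame`, `FrameResult`) and the three threshold values are placeholders chosen for illustration. The sketch also assumes the timeout only runs until the first active frame, which is one reading of "before any sound is input."

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FRAMES 64u        /* frame limit given in the text */

/* Placeholder thresholds: the article does not state these values. */
#define SILENCE_ZC_MIN   4u   /* fewer crossings than this means silence */
#define ACTIVITY_MIN     8u   /* activity count must exceed this */
#define SILENCE_NEEDED   6u   /* silence count must reach this */

typedef enum {
    FRAME_CONTINUE,    /* keep sampling */
    FRAME_ERROR_FULL,  /* buffer full or too much background noise */
    FRAME_TIMEOUT,     /* no sound arrived in time */
    FRAME_DONE         /* valid data sample captured */
} FrameResult;

typedef struct {
    uint8_t frame_count;  /* frames processed so far */
    uint8_t timeout;      /* frames left before giving up */
    uint8_t silence;      /* frames classified as silence */
    uint8_t activity;     /* frames classified as sound activity */
} InputState;

/* Run the tests made after each 256-sample frame. zero_crossings is
 * the edge count accumulated during that frame. */
static FrameResult end_of_frame(InputState *s, uint8_t zero_crossings)
{
    assert(s != NULL);

    s->frame_count++;
    if (s->frame_count > MAX_FRAMES) {
        return FRAME_ERROR_FULL;
    }

    /* Assumption: the timeout stops counting once sound has been seen. */
    if (s->activity == 0) {
        if (s->timeout > 0) {
            s->timeout--;
        }
        if (s->timeout == 0) {
            return FRAME_TIMEOUT;
        }
    }

    if (zero_crossings < SILENCE_ZC_MIN) {
        s->silence++;
    } else {
        s->activity++;
    }

    if (s->activity > ACTIVITY_MIN && s->silence >= SILENCE_NEEDED) {
        return FRAME_DONE;
    }
    return FRAME_CONTINUE;
}
```

A caller would invoke `end_of_frame` once per frame and keep sampling while it returns `FRAME_CONTINUE`. Note that, as in the text, the silence counter here also accumulates silent frames that occur before speech begins.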