Issue 133, August 2001
Listening Chips
YAK ATTACK
Alright already,
enough of this book learning. Now, it's time to give
Voice Extreme a workout.
I had little
trouble installing and firing up the software. Sensory
thoughtfully supplies a number of short demo programs
to both serve as coding examples and assist hardware checkout
and evaluation.
There were a
couple of problems I filed under beta-site gotchas. When
I tried to experiment with one of the demo projects, the
compiler would hang up partway through the build process.
The release document did allude to getting an error message
during builds with older versions of Windows and offered
a fix (increase the number of environment variables in
SYSTEM.INI). Even though I wasn't getting the error
message and my PC is relatively new, I made the fix and
the build proceeded smoothly.
Next, I tried
to download a demo, but the software insisted it couldn't
find the VE board. Yes, the serial cable to the PC was
plugged in. After some head scratching, I discovered the
solution in the documentation, although it mistakenly
referred to operation with the older version of the kit.
Getting the board to download a new flash image simply
requires holding down one of the buttons (aptly named
VELOAD) during reset.
Check out the
speaker-independent demo I played with in Listing 1. The
program prompts you to say one of six words (call, erase,
modify, play, record, or skip) and attempts to recognize
it. If successful, the program announces, "You said
(appropriate word)"; otherwise, it responds with "What
did you say?"
Listing 1: This code demonstrates the VE C in action, running
a demonstration of speaker-independent recognition.
The process boils down to pattern generation
(PatGenW) and then recognition (Recog) with a
level of confidence (GetRecogLevel1).
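For readers who want the shape of that flow in plain C, here is a minimal sketch. It is not Listing 1 and it does not use the VE C calls: `recognize_stub`, `respond`, `RecogResult`, and `MIN_CONFIDENCE` are hypothetical names I made up for illustration, typed text stands in for audio, and the confidence values are arbitrary. The only things taken from the demo are the six-word vocabulary and the two spoken responses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define VOCAB_SIZE 6
#define MIN_CONFIDENCE 2 /* hypothetical threshold, not a VE C constant */

/* The six words the demo listens for. */
static const char *const VOCAB[VOCAB_SIZE] = {
    "call", "erase", "modify", "play", "record", "skip"
};

/* Hypothetical result of one recognition pass. */
typedef struct {
    int word_index; /* index into VOCAB, or -1 if nothing matched */
    int confidence; /* higher means a more certain match */
} RecogResult;

/* Stand-in for pattern generation plus recognition: an exact text
   match replaces the real acoustic comparison. */
RecogResult recognize_stub(const char *heard)
{
    RecogResult r = { -1, 0 };
    for (int i = 0; i < VOCAB_SIZE; i++) {
        if (strcmp(heard, VOCAB[i]) == 0) {
            r.word_index = i;
            r.confidence = 3; /* arbitrary "good match" score */
            break;
        }
    }
    return r;
}

/* Build the reply the demo would speak and return a pointer to it. */
const char *respond(const char *heard, char *out, size_t out_len)
{
    assert(out != NULL && out_len > 0);
    RecogResult r = recognize_stub(heard);
    if (r.word_index >= 0 && r.confidence >= MIN_CONFIDENCE)
        snprintf(out, out_len, "You said %s", VOCAB[r.word_index]);
    else
        snprintf(out, out_len, "What did you say?");
    return out;
}
```

Calling `respond("play", buf, sizeof buf)` fills `buf` with the confirmation, while anything outside the vocabulary falls through to the "What did you say?" branch. On the real hardware, the confidence check is where a marginal match would presumably be rejected.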
Although it's a toy program, it clearly demonstrates
the power of VE when it comes to writing speech recognition
applications. I'm certainly not familiar with any
other technology that could come close to what VE C does
in a mere page and a half of code (including comments).
It's easy
to write the software, but how well does it work? There
are times when having kids is actually useful, I reminded
myself as I conscripted the litter and marched them into
the office. I figured that between the younger kids'
chirps, the 14-year-old's adolescent cracks, and
my rumbling growls, we'd put the speaker-independent
claims to the test.
Seeing everyone
huddled around shouting at a circuit board must have been
a sight, but by golly if it didn't work like a champ.
In fact, the relatively few errors that did occur seemed
more like a matter of gain issues (i.e., not speaking
directly at or close enough to the microphone) than weakness
in the recognition itself. This also explains
why the documentation includes a fair amount of discussion
related to proper microphone placement, mounting, housing
materials, and so forth.
It was possible
to probe the error zone by intentional mispronunciation,
speaking unnaturally fast or slow, or, on occasion, imitating
Homer Simpson. But in reality, the recognition was uncannily
accurate under nominal conditions of natural speaking,
proper microphone placement, and ambient background noise.
To be fair,
you need to stack the deck when defining your vocabulary,
just as the demo program does. Notice how the words chosen
for the demo don't sound alike. I didn't have
time to try it, but I'm sure the 364 would
have problems dealing with a vocabulary of words that
sound alike but have different meanings (e.g., one, won,
win, wan, when, warn). Fortunately, just as the English
language is goofy enough to have words that sound alike
but have completely different meanings, you can usually
come up with a different-sounding word that gets the same
meaning across.
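One crude way to sanity-check a candidate vocabulary before trying it on hardware is to flag word pairs whose spellings are nearly identical. The sketch below is my own assumption, not anything from Sensory's tools: it uses plain Levenshtein edit distance on the spelling as a rough stand-in for acoustic similarity, and the names `edit_distance`, `count_close_pairs`, and `MAX_WORD` are hypothetical.

```c
#include <string.h>

#define MAX_WORD 32 /* hypothetical cap on word length */

static int min3(int a, int b, int c)
{
    int m = a < b ? a : b;
    return m < c ? m : c;
}

/* Levenshtein distance between two spellings, using two DP rows.
   Returns -1 if either word is longer than MAX_WORD. */
int edit_distance(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    int prev[MAX_WORD + 1], cur[MAX_WORD + 1];

    if (la > MAX_WORD || lb > MAX_WORD)
        return -1;
    for (size_t j = 0; j <= lb; j++)
        prev[j] = (int)j;
    for (size_t i = 1; i <= la; i++) {
        cur[0] = (int)i;
        for (size_t j = 1; j <= lb; j++) {
            int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            cur[j] = min3(prev[j] + 1,         /* deletion */
                          cur[j - 1] + 1,      /* insertion */
                          prev[j - 1] + cost); /* substitution */
        }
        memcpy(prev, cur, (lb + 1) * sizeof(int));
    }
    return prev[lb];
}

/* Count word pairs whose spellings differ by at most max_dist edits. */
int count_close_pairs(const char *const words[], int n, int max_dist)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            int d = edit_distance(words[i], words[j]);
            if (d >= 0 && d <= max_dist)
                count++;
        }
    }
    return count;
}
```

With a threshold of one edit, the six demo words produce no close pairs, while the sound-alike list (one, won, win, wan, when, warn) produces several. Note that "one" and "won" are true homophones yet sit two edits apart in spelling, which shows how rough this proxy is; the real test is still to say the words at the board.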