circuitcellar.com

Issue 133 August 2001
Listening Chips


by Tom Cantrell


YAK ATTACK

Alright already, enough of this book learning. Now, it’s time to give Voice Extreme a workout.

I had little trouble installing and firing up the software. Sensory thoughtfully supplies a number of short demo programs to both serve as coding examples and assist hardware checkout and evaluation.

There were a couple of problems I filed under beta-site gotchas. When I tried to experiment with one of the demo projects, the compiler would hang up partway through the build process. The release document did allude to getting an error message during builds with older versions of Windows and offered a fix (increase the number of environment variables in SYSTEM.INI). Even though I wasn’t getting the error message and my PC is relatively new, I made the fix and the build proceeded smoothly.

Next, I tried to download a demo, but the software insisted it couldn’t find the VE board. Yes, the serial cable to the PC was plugged in. After some head scratching, I discovered the solution in the documentation, although it mistakenly referred to operation with the older version of the kit. Getting the board to accept a new flash image simply requires holding down one of the buttons (aptly named VELOAD) during reset.

Check out the speaker-independent demo I played with in Listing 1. The program prompts you to say one of six words (call, erase, modify, play, record, or skip) and attempts to recognize it. If successful, the program announces, "You said (appropriate word)"; otherwise, it responds with "What did you say?"

Listing 1: This code shows VE C in action, running a demonstration of speaker-independent recognition. The process boils down to pattern generation (PatGenW) and then recognition (Recog) with a level of confidence (GetRecogLevel1).
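The flow described for Listing 1 (prompt, recognize, respond) can be sketched in plain C. This is only an illustration of the control flow, not the Sensory demo itself: the real signatures of PatGenW, Recog, and GetRecogLevel1 aren't reproduced here, so simulate_recog, respond, RecogResult, and the confidence numbers are hypothetical stand-ins, and a simple string match takes the place of actual pattern generation and recognition.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define VOCAB_SIZE 6
#define NO_MATCH   (-1)

/* The six words the demo listens for. */
static const char *const vocab[VOCAB_SIZE] = {
    "call", "erase", "modify", "play", "record", "skip"
};

/* Hypothetical result record: which word matched and how confidently. */
typedef struct {
    int word_index;   /* index into vocab, or NO_MATCH */
    int confidence;   /* made-up 0..100 scale for this sketch */
} RecogResult;

/* Hypothetical stand-in for pattern generation plus recognition.
   An exact string match plays the role of the recognizer. */
static RecogResult simulate_recog(const char *heard)
{
    RecogResult r = { NO_MATCH, 0 };
    for (int i = 0; i < VOCAB_SIZE; i++) {
        if (strcmp(heard, vocab[i]) == 0) {
            r.word_index = i;
            r.confidence = 100;
            break;
        }
    }
    return r;
}

/* Builds the reply the demo would speak. Returns 1 when a word was
   recognized with at least min_confidence, 0 otherwise. */
static int respond(const char *heard, int min_confidence,
                   char *out, size_t out_len)
{
    assert(heard != NULL && out != NULL && out_len > 0);

    RecogResult r = simulate_recog(heard);
    if (r.word_index != NO_MATCH && r.confidence >= min_confidence) {
        snprintf(out, out_len, "You said %s", vocab[r.word_index]);
        return 1;
    }
    snprintf(out, out_len, "What did you say?");
    return 0;
}
```

Calling respond("play", 50, buf, sizeof buf) fills buf with the "You said" reply, while an out-of-vocabulary string falls through to the "What did you say?" branch. On the real hardware, the confidence check is where a value like the one from GetRecogLevel1 would presumably come into play.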


Although it’s a toy program, it clearly demonstrates the power of VE when it comes to writing speech recognition applications. I’m certainly not familiar with any other technology that could come close to what VE C does in a mere page and a half of code (including comments).

It’s easy to write the software, but how well does it work? There are times when having kids is actually useful, I reminded myself as I conscripted the litter and marched them into the office. I figured that between the younger kids’ chirps, the 14-year-old’s adolescent cracks, and my rumbling growls, we’d put the speaker-independent claims to the test.

Seeing everyone huddled around shouting at a circuit board must have been a sight, but by golly if it didn’t work like a champ. In fact, the relatively few errors that did occur seemed more a matter of gain (i.e., not speaking directly at or close enough to the microphone) than of weakness in the recognition itself. This also explains why the documentation includes a fair amount of discussion related to proper microphone placement, mounting, housing materials, and so forth.

It was possible to probe the error zone by intentional mispronunciation, speaking unnaturally fast or slow, or, on occasion, imitating Homer Simpson. But in reality, the recognition was uncannily accurate under nominal conditions of natural speaking, proper microphone placement, and ambient background noise.

To be fair, you need to stack the deck when defining your vocabulary, just as the demo program does. Notice how the words chosen for the demo don’t sound alike. I didn’t have time to try it, but I’m sure the ’364 would have problems dealing with a vocabulary of words that sound alike but have different meanings (e.g., one, won, win, wan, when, warn). Fortunately, just as the English language is goofy enough to have words that sound alike but have completely different meanings, you can usually come up with a different-sounding word that gets the same meaning across.
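One rough way to screen a candidate vocabulary before committing to it is to flag word pairs that are nearly identical. The sketch below uses Levenshtein edit distance on spelling, which is only a crude proxy for how words sound (one and won are homophones yet differ by two edits), and it has nothing to do with how the ’364 actually compares patterns. The function names and the thresholds are my own assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORD 32

static size_t min3(size_t a, size_t b, size_t c)
{
    size_t m = a < b ? a : b;
    return m < c ? m : c;
}

/* Levenshtein distance between two words of at most MAX_WORD letters.
   Returns (size_t)-1 if either word is too long for the row buffers. */
static size_t edit_distance(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    size_t la = strlen(a), lb = strlen(b);
    size_t prev[MAX_WORD + 1], cur[MAX_WORD + 1];

    if (la > MAX_WORD || lb > MAX_WORD)
        return (size_t)-1;

    /* Row 0: turning an empty prefix of a into b[0..j) takes j inserts. */
    for (size_t j = 0; j <= lb; j++)
        prev[j] = j;

    for (size_t i = 1; i <= la; i++) {
        cur[0] = i;
        for (size_t j = 1; j <= lb; j++) {
            size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            cur[j] = min3(prev[j] + 1,          /* delete from a */
                          cur[j - 1] + 1,       /* insert into a */
                          prev[j - 1] + cost);  /* substitute or match */
        }
        memcpy(prev, cur, (lb + 1) * sizeof prev[0]);
    }
    return prev[lb];
}

/* Counts the word pairs whose spelling differs by at most threshold edits. */
static size_t count_close_pairs(const char *const words[], size_t n,
                                size_t threshold)
{
    size_t close = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (edit_distance(words[i], words[j]) <= threshold)
                close++;
    return close;
}
```

Run against the demo's six words, a check like this finds no pair within two edits of each other, whereas the one/won/win/wan/when/warn list has several pairs only a single edit apart. A real screening pass would want a phonetic comparison rather than spelling, so treat this as a first sanity check at best.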