Issue 133, August 2001
Listening Chips
IN THE REALM OF THE SENSORY
Although it
hasn't reached household-name status, in the relatively
new field of voice recognition, Sensory can be considered
one of the pioneers. They've been around for years,
slowly but surely percolating their technology into emerging
applications one by one.
I've kept
in touch with Sensory and monitored their progress, but
held back on writing an article. The fact is, with ASIC-
and ROM-based custom silicon underpinning a focus-accounts
marketing strategy, what they had to offer was only suitable
for a few big outfits like Sony, VTech, and Uniden. But
now, after successfully establishing their place, Sensory
is moving to expand the market with low-cost standard
chips suitable for a broad range of applications from
customers big and small.
Enter the Voice
Extreme Toolkit (see Photo 1), which, at only $129, is
not only ideal for prototyping and demos, but is also
suitable for moderate volume applications.
| Photo 1: When
it comes to voice recognition, the Voice Extreme
Toolkit represents a new high in ease of use and,
at only $129, a new low in price. |
The kit is wrapped
around a special version of Sensory's RSC-364 speech-recognition
chip. The ROM on the chip is factory-programmed with a
C-like language interpreter and memory manager designed
to work with a commodity external flash memory chip. Note
that a ROM-less version, RSC-360, is available (see Table
1).
| Table 1: The
RSC-364, with 64-KB on-chip ROM, is a single-chip
voice recognition solution. Taking advantage of
the features that require lots of storage, such
as voice recording, requires adding external memory.
Table notes: Accuracy above 95% must be maintained. The RSC-364
assumes the use of on-chip ROM/RAM only and external
serial EEPROM memory. It depends on the choice of
musical instrument and requires external storage
for recordings. |
The external flash memory is used to store an application's
particular vocabulary, specifically the templates and
weights that lie at the heart of Sensory's recognition
technology. There are two sources for the vocabulary,
and the choice is determined by the specifics of the application.
For speaker-independent
applications, Sensory can draw from a library of common
words in the major languages or provide a service to generate
a custom (i.e., atypical language) vocabulary. By contrast,
speaker-dependent apps rely on training (i.e., writing
flash memory) by the end user in the field.
An interesting
tweak of speaker-dependent recognition is known as speaker
verification. The latter is kind of the inverse of the
former. Instead of recognizing a word from a predefined
vocabulary spoken by a known person, verification recognizes
which speaker from a predefined group is saying a known
word.
A specific application
might use a combination of recognition modes. For instance,
a security system could recognize a particular user's
voice (speaker verification) and then, knowing his identity,
determine his specific password (speaker-dependent) before
accepting generic commands (speaker-independent).
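To make the layering concrete, here is a minimal sketch of that security-system flow. It is not Sensory's API or the Voice Extreme language. Every name in it is hypothetical, and an utterance is modeled as a plain (speaker, word) pair standing in for real audio, so each "recognizer" is just a lookup.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: an utterance is (speaker, word), a stand-in for real audio.
Utterance = Tuple[str, str]

# Hypothetical data: passwords trained by each user in the field.
ENROLLED_PASSWORDS: Dict[str, str] = {"alice": "tulip", "bob": "granite"}
GENERIC_COMMANDS = {"open", "close", "lock"}  # fixed, factory-style vocabulary
KNOWN_WORD = "hello"  # the known word spoken for verification


def verify_speaker(utt: Utterance) -> Optional[str]:
    """Speaker verification: known word, which enrolled speaker said it?"""
    speaker, word = utt
    if word == KNOWN_WORD and speaker in ENROLLED_PASSWORDS:
        return speaker
    return None


def check_password(user: str, utt: Utterance) -> bool:
    """Speaker-dependent: match against the word this user trained."""
    speaker, word = utt
    return speaker == user and word == ENROLLED_PASSWORDS[user]


def recognize_command(utt: Utterance) -> Optional[str]:
    """Speaker-independent: any speaker, fixed vocabulary."""
    _, word = utt
    return word if word in GENERIC_COMMANDS else None


def run_session(utterances: List[Utterance]) -> List[str]:
    """Return the accepted commands; nothing is accepted if a gate fails."""
    if len(utterances) < 2:
        return []
    user = verify_speaker(utterances[0])
    if user is None or not check_password(user, utterances[1]):
        return []
    accepted = []
    for utt in utterances[2:]:
        command = recognize_command(utt)
        if command is not None:
            accepted.append(command)
    return accepted
```

Once both gates pass, commands are accepted from any speaker, which is the point of finishing in speaker-independent mode. Whether a real product should allow that is a design decision for the application, not something this sketch settles.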
Other Sensory
variations on the recognition theme include word spotting
and continuous listening. Word spotting finds trigger
words in continuous speech, so "Please open the door"
could be recognized as "open door." To reduce
false triggering complications, use words with more syllables
or include more than one word, like a brief phrase.
Because there
is a slight delay between recognition of the first word
in a multi-word trigger and listening for the following
word, I recommend that you try establishing a scheme that
uses trigger words that are naturally separated by other
speech or otherwise won't easily run together. Note
that word spotting only works with speaker-dependent recognition.
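The effect of that delay is easy to see in a toy model. The sketch below is an illustration only: speech is a list of already-separated words rather than audio, the function name is made up, and the delay is assumed to cost a whole number of words (the `dead_words` parameter), which is my simplification rather than a published Sensory figure.

```python
from typing import Sequence


def spot_triggers(speech: Sequence[str], triggers: Sequence[str],
                  dead_words: int = 1) -> bool:
    """Return True if the trigger words are spotted, in order, in the speech.

    After each hit the spotter is "deaf" for dead_words words, a crude
    stand-in for the delay before it starts listening for the next trigger.
    """
    if not triggers:
        return False
    pos = 0   # index of the trigger word we are waiting for
    skip = 0  # words still to be missed because of the delay
    for word in speech:
        if skip > 0:
            skip -= 1
            continue
        if word == triggers[pos]:
            pos += 1
            if pos == len(triggers):
                return True
            skip = dead_words
    return False
```

With the default one-word delay, `spot_triggers("please open the door".split(), ["open", "door"])` succeeds because "the" absorbs the delay, while "please open door" fails because "door" arrives while the spotter is still busy.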
Continuous listening
is similar, except that it waits for a specific isolated
phrase (i.e., only "open door" would be recognized),
with pauses delineating each word. Although not as powerful
as word spotting, continuous listening does have the advantage
of working with both speaker-dependent and speaker-independent
recognition modes.
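For contrast with word spotting, here is an equally simplified sketch of continuous listening. Again, this is an assumption-laden toy and not Sensory's implementation: the input is a list of word tokens with a made-up `<pause>` marker wherever the speaker pauses, and the phrase counts as heard only if every word stands alone between pauses and nothing else is said.

```python
from typing import List, Sequence

PAUSE = "<pause>"  # hypothetical marker for a detected pause


def split_on_pauses(stream: Sequence[str]) -> List[List[str]]:
    """Group the words of a stream into runs separated by pauses."""
    groups: List[List[str]] = []
    current: List[str] = []
    for token in stream:
        if token == PAUSE:
            if current:
                groups.append(current)
                current = []
        else:
            current.append(token)
    if current:
        groups.append(current)
    return groups


def hears_phrase(stream: Sequence[str], phrase: Sequence[str]) -> bool:
    """True only for the isolated phrase, one word per pause-delimited run."""
    groups = split_on_pauses(stream)
    return bool(phrase) and groups == [[word] for word in phrase]
```

Here `["open", PAUSE, "door"]` is accepted, but "please open the door" is not, and neither is "open door" run together without a pause.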
© Circuit Cellar,
The Magazine for Computer Applications. Reprinted with
permission. For subscription information, call (860) 875-2199,
email subscribe@circuitcellar.com, or visit our web site at
www.circuitcellar.com.