sgsguru wrote, 2012-02-25 02:31 pm

Speech Recognition

This is a topic that I don't really know all that much about.  Corrections welcome.  This is very much a work in progress.

Speech recognition sucks.  There's been a lot of research; why isn't it any better?

Speech consists of an assortment of hisses and buzzes that the brain interprets.  The current software approach tries to go directly from the raw noises to words by brute force:
sound → words
It seems to me that if we break this process down into smaller steps, we can get a lot more accuracy.  The tool that linguists use is the International Phonetic Alphabet (IPA), which represents individual phonemes, so it can serve as an intermediate stage:
sound → IPA → words

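This should give us speaker independence right at the start, the idea being that differences between voices are absorbed in the sound → IPA step, so the IPA → words step never has to deal with them.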

For that first step, we can use something I call "predictive filtering" (I'm sure there's a "real" name for it).  We classify each incoming sound as one of a finite set of "base sounds": small combinations of buzzes and hisses that each make up a specific sound.  To combine these base sounds into a phoneme, we look at the set of all possible sequences of base sounds that start with the first base sound we hear.
  1. Get initial base sound
  2. Generate set of all possible sequences starting with that base
  3. Cache the "most probable" sequence(s)
  4. Get the next base sound
  5. Prune off the sequences that don't have that sound as their next element (the second element on the first pass, the third on the next, and so on)
  6. Cache "most probable"
  7. Continue until we have a single sequence matching a single phoneme.
  8. Output phoneme
  9. Go to step 1
Generating the set of sequences and the pruning process will parallelize like crazy.  It's a classic map-reduce job: each candidate sequence can be checked against the new sound independently (map), and the survivors are then collected (reduce).  We're doing our pattern matching one small step at a time.  Also, I expect the set of sequences to converge very rapidly; most steps will simply verify the input against the "expected value" in the cache.
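
The steps above can be sketched in a few lines of Python.  This is only a rough illustration, not a real recognizer: the base-sound labels, the phoneme table, and the probabilities in PHONEMES are all made up, and the names decode and most_probable are my own.  It also assumes that no phoneme's sequence is a prefix of another's, and that a sound matching nothing is simply dropped before starting over.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical table: phoneme -> (sequence of base sounds, prior probability).
# Labels and numbers are invented purely for illustration.
Table = Dict[str, Tuple[Tuple[str, ...], float]]

PHONEMES: Table = {
    "s": (("hiss_hi", "hiss_hi"), 0.4),
    "z": (("hiss_hi", "buzz"), 0.3),
    "m": (("buzz", "hum", "hum"), 0.2),
    "n": (("buzz", "hum_hi"), 0.1),
}


def most_probable(candidates: Table) -> str:
    """Steps 3 and 6: the best guess among the surviving candidates."""
    return max(candidates, key=lambda p: candidates[p][1])


def decode(sounds: Iterable[str], table: Table = PHONEMES) -> Iterator[str]:
    """Turn a stream of base sounds into phonemes by repeated pruning."""
    candidates = dict(table)  # steps 1-2: every sequence is still possible
    pos = 0                   # how many base sounds of this phoneme we have seen
    for sound in sounds:      # step 4: get the next base sound
        # Step 5: keep only sequences whose next element is this sound.
        candidates = {
            p: (seq, prob)
            for p, (seq, prob) in candidates.items()
            if pos < len(seq) and seq[pos] == sound
        }
        pos += 1
        if not candidates:
            # Assumption: an unmatched sound is dropped and we start over.
            candidates, pos = dict(table), 0
            continue
        if len(candidates) == 1:
            phoneme = most_probable(candidates)
            # Step 7: one sequence left; emit once it is fully matched,
            # otherwise later sounds just verify the expected value.
            if pos == len(candidates[phoneme][0]):
                yield phoneme                     # step 8: output phoneme
                candidates, pos = dict(table), 0  # step 9: go to step 1
```

For example, list(decode(["hiss_hi", "buzz", "buzz", "hum", "hum"])) gives ['z', 'm'].  After the second "buzz" and the first "hum", only "m" is left, so the last "hum" is just a check against the expected value.  The dictionary comprehension is the part that would be farmed out in parallel, since each candidate is tested on its own.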

This filtering technique can also be used for going from IPA to words.


