ASR operates in a cyclical fashion, performing the same sequence of operations as often as
required to complete the user's request. Each cycle of these operations is referred to as
a recognition turn and is illustrated in the following diagram:

ASR's control flow works as follows:
- ASR is triggered by the prompt module, which monitors the system for events
(a UI button press or a PPS update, for example) and then starts a recognition turn
by prompting the user for a command.
- After the prompt is rendered, ASR passes control to the audio capture module
to capture the user's spoken command. On a successful capture, control
passes to the recognition module. If the command capture
isn't successful, control passes back to the prompt module to retry.
- The recognition module converts the audio command to a text string and
assigns the result a level of confidence to indicate how well the command
was "understood" by the recognizer. Depending on the configuration, if the
confidence level isn't high enough, ASR will prompt the user again.
- When a successful result is available, ASR passes control to the
conversation modules. The conversation modules must first determine the
context of the command (e.g., search, multimedia, or phone). The context
determines which conversation module takes over to complete the command.
Once a context is determined, the associated conversation module becomes
"exclusive." That is, it's the only conversation module that handles results until
the command is fulfilled. At this point, the exclusive module either
completes the action or triggers another recognition turn to have the user
prompted again for more information. This process continues until the
action is completed. The conversation module then removes its exclusive
status so that a fresh recognition turn can proceed.
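
The control flow described above can be sketched as a small simulation. This is an illustrative model only, not ASR's actual API: the names `Result`, `ConversationModule`, `run_request`, and `min_confidence` are hypothetical, audio is stood in for by plain strings, and a failed capture is represented by `None`. The sketch assumes that a low-confidence result always causes a re-prompt, whereas in ASR this depends on the configuration.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class Result:
    """Recognized text plus a confidence score (0.0 to 1.0 in this sketch)."""
    text: str
    confidence: float


@dataclass
class ConversationModule:
    """Hypothetical conversation module that handles one context."""
    context: str
    keywords: List[str]
    turns_needed: int = 1          # how many results it needs to finish
    heard: List[str] = field(default_factory=list)

    def claims(self, result: Result) -> bool:
        # The module claims the command if it contains one of its keywords.
        return any(k in result.text for k in self.keywords)

    def handle(self, result: Result) -> bool:
        # Return True when the action is complete, False to ask for more.
        self.heard.append(result.text)
        return len(self.heard) >= self.turns_needed


def run_request(
    captures: Iterable[Optional[str]],
    recognize: Callable[[str], Result],
    modules: List[ConversationModule],
    min_confidence: float = 0.5,
) -> List[str]:
    """Run recognition turns until one user request is completed."""
    log: List[str] = []
    exclusive: Optional[ConversationModule] = None
    for audio in captures:                      # one iteration = one turn
        log.append("prompt")
        if audio is None:                       # capture failed: prompt again
            log.append("capture-failed")
            continue
        result = recognize(audio)
        if result.confidence < min_confidence:  # not understood well enough
            log.append("low-confidence")
            continue
        if exclusive is None:                   # determine the context first
            exclusive = next((m for m in modules if m.claims(result)), None)
            if exclusive is None:
                log.append("no-context")
                continue
            log.append(f"exclusive:{exclusive.context}")
        if exclusive.handle(result):            # action completed
            log.append(f"done:{exclusive.context}")
            exclusive = None                    # release exclusive status
            break
        log.append("more-info")                 # trigger another turn
    return log


def demo() -> List[str]:
    modules = [
        ConversationModule("phone", ["call"], turns_needed=2),
        ConversationModule("multimedia", ["play"]),
    ]
    # A failed capture, an unclear utterance, then a two-turn phone command.
    captures = [None, "mumble", "call", "bob"]
    recognize = lambda a: Result(a, 0.2 if a == "mumble" else 0.9)
    return run_request(captures, recognize, modules)


if __name__ == "__main__":
    print(demo())
```

In this sketch, the phone module needs two results, so it holds its exclusive status across two recognition turns before releasing it; the multimedia module never sees the second result even though it is in the module list.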