Automatic Speech Recognition

The ASR subsystem (io-asr) offers complete speech-recognition services and supports Nuance VoCon Hybrid 4.4 for speech-to-text (STT) and text-to-speech (TTS) conversion. In addition, a reference implementation using AT&T Watson can help guide you in integrating the ASR subsystem with other recognition providers.
Note: To use VoCon, you must have a separate licensing agreement and NDA with Nuance.
ASR provides speech-recognition services in areas such as search, multimedia control, and voice dialing.

The io-asr service

The ASR service is referred to as io-asr throughout this guide. However, io-asr is shorthand: the name of the actual service depends on which recognizer is used and takes the form io-asr-recognizer_name (for example, io-asr-vocon).

Modules

ASR uses modules to perform the various functions that provide end-to-end handling of spoken commands. It doesn't handle speech by itself; it passes information between modules so that they can perform the various stages of recognizing and taking action on spoken commands. The modules interact with io-asr, not directly with each other.

The ASR subsystem has four types of modules:
  • prompt—requests information from the user. Prompts can be audio (spoken directions or another sound such as a beep or a bell) or visual (for example, text written to a portion of the display, a prompt screen with various options, or some other visual cue such as a change of icon or color).

  • audio—listens for commands and captures audio from the microphone (capture module), or plays audio back to the user (file module).

  • recognition—converts speech to text. After the audio module has captured a command (called an "utterance"), ASR instructs the recognition module to convert the captured audio to a text string that can be passed to a conversation module.

  • conversation—interprets the text command and takes the appropriate action. There are conversation modules for three types of commands: search, multimedia, and voice dialing. Each of these conversation modules has its own configuration that determines the grammar used to interpret the command. Once a command is successfully interpreted, the conversation module invokes the required subsystem and passes it the information it needs to complete the request. Often the required action is invoking another ASR pass to get more information from the user. A sketch of how such a module plugs into io-asr follows this list.
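
The module interfaces themselves are defined by the ASR framework headers and aren't reproduced here. As a rough illustration of the callback-based design described above, the following sketch uses hypothetical names (asr_module_if, media_on_result(), and so on) rather than the actual io-asr API:

    #include <stdio.h>

    /* Hypothetical callback table: io-asr calls these entry points; the
     * module never calls another module directly. (Illustrative only,
     * not the actual io-asr structure or field names.) */
    typedef struct {
        const char *name;
        int  (*init)(const char *config_section);  /* called once at startup */
        int  (*on_result)(const char *utterance);  /* recognized text passed in */
        void (*shutdown)(void);
    } asr_module_if;

    /* A toy "conversation" module that interprets recognized text. */
    static int media_init(const char *cfg)
    {
        printf("media conversation module: init with section %s\n", cfg);
        return 0;
    }

    static int media_on_result(const char *utterance)
    {
        /* Interpret the text command and hand off to the multimedia
         * subsystem, or request another ASR pass for more information. */
        printf("interpreting: \"%s\"\n", utterance);
        return 0;
    }

    static void media_shutdown(void)
    {
    }

    asr_module_if media_conversation_module = {
        .name      = "media",
        .init      = media_init,
        .on_result = media_on_result,
        .shutdown  = media_shutdown,
    };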

ASR startup

When the target system boots, System Launch and Monitor (SLM) starts ASR and passes it the path to the configuration file (${QNX_TARGET}/etc/asr-car.cfg). ASR begins to run as a daemon, reads the configuration file, and loads it into memory. It then loads the modules specified in the configuration file by calling dlopen() for the associated DLLs. Each module has a constructor function that registers the module with ASR by calling the connect function for its module type. ASR then invokes each registered module's initialize callback function, which initializes any private or module-specific data. At this point, ASR is ready to handle voice commands.
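
The following is a rough sketch of that loading sequence, not the actual io-asr source: the registry (modules[]) and load_and_init_modules() are hypothetical names used only to illustrate the dlopen()-constructor-initialize pattern described above.

    #include <dlfcn.h>
    #include <stdio.h>

    /* Hypothetical registry entry; the real io-asr keeps its own list of
     * modules, populated when each module's constructor calls the connect
     * function for its module type. */
    typedef struct {
        const char *name;
        int (*initialize)(void);   /* module-specific setup callback */
    } registered_module;

    static registered_module modules[16];
    static int module_count;

    /* Load each module DLL named in the configuration file, then invoke
     * every registered module's initialize callback. */
    static int load_and_init_modules(const char *const *dll_paths, int n)
    {
        for (int i = 0; i < n; i++) {
            void *handle = dlopen(dll_paths[i], RTLD_NOW);
            if (handle == NULL) {
                fprintf(stderr, "dlopen %s: %s\n", dll_paths[i], dlerror());
                return -1;
            }
            /* By the time dlopen() returns, the DLL's constructor has run
             * and registered the module (appended it to modules[]). */
        }
        for (int i = 0; i < module_count; i++) {
            if (modules[i].initialize != NULL && modules[i].initialize() != 0) {
                fprintf(stderr, "module %s failed to initialize\n", modules[i].name);
                return -1;
            }
        }
        return 0;
    }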

PPS integration

ASR uses PPS to communicate with the HMI. See /pps/services/asr/control in the PPS Objects Reference for more information.
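
As a minimal sketch of how an HMI might drive the control object from C, the following writes one attribute using standard file I/O (PPS objects appear as pathnames). The attribute shown, strobe::on, is an assumption; check the /pps/services/asr/control entry in the PPS Objects Reference for the attributes and values your image actually supports.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        /* PPS attributes are written as "name::value" lines. The attribute
         * name used here is hypothetical; consult the PPS Objects Reference. */
        const char *msg = "strobe::on\n";
        int fd = open("/pps/services/asr/control", O_WRONLY);

        if (fd == -1) {
            perror("open /pps/services/asr/control");
            return 1;
        }
        if (write(fd, msg, strlen(msg)) == -1) {
            perror("write");
            close(fd);
            return 1;
        }
        close(fd);
        return 0;
    }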