Automatic Speech Recognition (ASR)

The platform includes an ASR subsystem that provides speech-recognition and text-to-speech services to other system components and third-party applications.

To start an ASR session, tap the Push-to-talk tab on the taskbar, then wait for the audible cue before you say a command. Shorter commands have lower success rates.

For a listing of commands you can use, see "Supported voice commands".

For more information about using ASR for different tasks, see the task-specific pages (e.g., "Media Player"). For more information about the ASR modules, see the following:

"Automatic speech recognition" in the Architecture Guide
Automatic Speech Recognition Developer's Guide

ASR grammars

You can modify the grammars specified in the /etc/asr-car.cfg file to define keys (synonyms) for the supported speech commands. The grammars reside in the car-control.cfg files that are listed in the localized-assets section of asr-car.cfg for each module. For example, the grammar for the car-media module is located at $(locale-dir)/car-media/car-control.cfg.

Recognition latency

Several factors affect the latencies of voice-command recognition:

End of Speech (EOS) detection: Too much ambient noise might prevent the ASR service from detecting EOS. In this case, the service uses the max_utterance_seconds setting to limit the audio capture. You can change this setting in the /etc/asr-car.cfg file.

To confirm that EOS detection is the issue, connect to the target and run sloginfo -w to determine when the audio capture completes. The time from completion of audio capture to response is the offboard recognition latency.

You can change the EOS detection parameters in the asr-car.cfg file. To find the relevant settings, search for eos. Tune the EOS detector for your environment by adjusting the eos-* values in steps of 50. After each change, slay and restart io-asr-generic, and then test how quickly the ASR service detects EOS. Note that EOS detection is poor if the signal is not very dynamic. Before you start driving, make sure that the microphone has a fresh battery and that it's pointed in the right direction.

ASR server congestion: Server usage might be higher than usual. Run sloginfo -w to determine whether the latency is on the recognition server.
Text-to-speech (TTS) latency: A slow TTS response can affect the perceived responsiveness of the system. A delay in the message that announces what is being done, or that reports an unrecognized command, might look to a user like a voice-recognition problem. In this case, the output of sloginfo -w should give you a good sense of the TTS latency as well. The service logs the message to be spoken before sending the request to the ASR service.
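
To separate these factors, it can help to note a few timestamps for one command and subtract them. The following Python sketch only illustrates that arithmetic. The CommandTimeline class, its field names, and the idea of recording times in seconds are assumptions made for this example; they are not part of the ASR service or of the sloginfo output format.

```python
from dataclasses import dataclass


@dataclass
class CommandTimeline:
    """Times, in seconds, noted by hand for one voice command (hypothetical)."""
    capture_done: float     # audio capture completed (from the log)
    result_received: float  # recognition response arrived (from the log)
    prompt_logged: float    # message to be spoken was logged
    prompt_audible: float   # spoken prompt was first heard


def latency_breakdown(t: CommandTimeline) -> dict:
    """Split the delay the user perceives into recognition and TTS parts."""
    return {
        # Completion of audio capture to response.
        "recognition": t.result_received - t.capture_done,
        # Logging of the prompt text to the prompt being heard.
        "tts": t.prompt_audible - t.prompt_logged,
        # Everything the user waits for after they stop talking.
        "total": t.prompt_audible - t.capture_done,
    }


if __name__ == "__main__":
    timeline = CommandTimeline(10.0, 11.5, 11.6, 12.4)
    for name, seconds in latency_breakdown(timeline).items():
        print(f"{name}: {seconds:.2f} s")
```

If the recognition part dominates, look at EOS detection and server congestion first; if the TTS part dominates, the recognizer is probably not the cause of the perceived delay.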

Determining unrecognized commands

If you say a command that the system doesn't recognize, there are a number of ways you can get more information about how the command was interpreted. You can:

See the interpreted command on the screen.
Examine the system log. To search the log for a particular command, run the following:
sloginfo -w | grep utterance

Examine the /pps/services/asr/control object to see what ASR understood and what intents it extracted from the command. For example, the command "Switch to media player" results in the following update to the PPS object:

@control
result:json:{"confidence":925,
             "recognizer":"io-asr-nlal",
             "status":"result_ok",
             "type":"intent",
             "action":"launch",
             "utterance":"Switch to media player",
             "intents":[{"field":"application","value":"media player"}]}
speech::handled
state::idle
strobe::on
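
If you capture the contents of this object as text, a short script can pull out the fields of interest. The following Python sketch assumes each attribute has the name:encoding:value form shown above and that indented lines continue the previous attribute, as in the wrapped listing. The parse_pps_object function is a hypothetical helper written for this example, not part of the platform.

```python
import json

SAMPLE = '''@control
result:json:{"confidence":925,
             "recognizer":"io-asr-nlal",
             "status":"result_ok",
             "type":"intent",
             "action":"launch",
             "utterance":"Switch to media player",
             "intents":[{"field":"application","value":"media player"}]}
speech::handled
state::idle
strobe::on
'''


def parse_pps_object(text: str) -> dict:
    """Parse PPS object text into a dict, decoding json-encoded attributes."""
    raw = {}     # attribute name -> (encoding, value text)
    last = None  # most recent attribute, for continuation lines
    for line in text.splitlines():
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and the @object-name line
        if line[0].isspace() and last is not None:
            # Indented line: treat it as a continuation of the previous value.
            encoding, value = raw[last]
            raw[last] = (encoding, value + line.strip())
            continue
        name, encoding, value = line.split(":", 2)
        raw[name] = (encoding, value)
        last = name
    return {
        name: json.loads(value) if encoding == "json" else value
        for name, (encoding, value) in raw.items()
    }


if __name__ == "__main__":
    attrs = parse_pps_object(SAMPLE)
    result = attrs["result"]
    print(result["utterance"], result["action"], result["confidence"])
    for intent in result["intents"]:
        print(intent["field"], "=", intent["value"])
```

Comparing the utterance field with what you actually said shows whether the problem is in recognition, while the action and intents fields show how the recognized text was interpreted.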