Recognition module

The recognition module converts a spoken command (i.e., an utterance) to text. It collects the audio sample, passes it to the vendor's speech-to-text service for processing, and converts the vendor-specific result data to the format required by ASR. The recognition module returns a dictation result (in the asr_result type), which is strictly a speech-to-text conversion.
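
At a high level, the module performs three steps: capture the audio, hand it to the vendor recognizer, and convert the vendor-specific output into the common result format. The C sketch below illustrates this flow; every name in it is a hypothetical stand-in, not part of the io-asr or vendor APIs.

    /* Hypothetical sketch of the recognition flow. None of these names
     * come from the io-asr API; they only illustrate the three steps:
     * capture audio, call the vendor recognizer, convert the result. */
    #include <stdio.h>
    #include <string.h>

    typedef struct {
        char text[256];    /* speech-to-text transcription */
        int  confidence;   /* recognizer confidence score  */
    } dictation_sketch_t;

    /* Stand-in for audio capture: pretend we recorded an utterance. */
    static size_t capture_audio(char *buf, size_t len)
    {
        strncpy(buf, "<pcm audio bytes>", len - 1);
        buf[len - 1] = '\0';
        return strlen(buf);
    }

    /* Stand-in for the vendor's speech-to-text call: return canned text. */
    static int vendor_recognize(const char *audio, size_t len,
                                dictation_sketch_t *out)
    {
        (void)audio;
        (void)len;
        snprintf(out->text, sizeof(out->text), "search media for hero");
        out->confidence = 900;   /* illustrative score */
        return 0;
    }

    int main(void)
    {
        char audio[1024];
        dictation_sketch_t result;

        size_t n = capture_audio(audio, sizeof(audio));
        if (n == 0 || vendor_recognize(audio, n, &result) != 0) {
            fprintf(stderr, "recognition failed\n");
            return 1;
        }
        printf("dictation: \"%s\" (confidence %d)\n",
               result.text, result.confidence);
        return 0;
    }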

The io-asr service passes the dictation result to the Natural Language Adaptation Layer (NLAL), which extracts intent information (asr_intent) and adds it to the original result structure. To do this, the NLAL analyzes the dictation result against the BNF grammars provided with the conversation modules, extracting its meaning and producing intent results.
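
For instance, a grammar covering the media-search command used in the example below might look like this generic BNF sketch (the rule names are made up, and this is not the exact grammar format shipped with the conversation modules):

    ; illustrative BNF-style rules, not the shipped grammar format
    <search-media> ::= search media for <search-term>
    <search-term>  ::= <word> | <word> <search-term>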

In some cases, the vendor's speech-to-text service can itself extract salient information from the utterance to construct an intent result. However, the dictation result must always be included so that ASR can perform its own natural language processing on the utterance.

For example, the utterance "search media for Hero" could result in the following dictation result from the recognizer (shown here as a sketch reusing the hypothetical dictation_sketch_t from above; the actual asr_result contents are vendor-specific):
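
    /* Hypothetical values laid out with the sketch type from above;
     * the real asr_result contents are vendor-specific. */
    dictation_sketch_t dictation = {
        .text       = "search media for hero",   /* raw transcription     */
        .confidence = 900,                       /* recognizer confidence */
    };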

The NLAL would then analyze this dictation result to create an intent result along the following lines (again a hypothetical sketch; the real intent fields are defined by asr_intent):
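
    /* Hypothetical layout: the real intent fields are defined by
     * asr_intent. Note that the dictation result is retained. */
    typedef struct {
        dictation_sketch_t dictation;    /* original speech-to-text result */
        struct {
            const char *action;          /* what to do       */
            const char *domain;          /* where to do it   */
            const char *search_term;     /* what to look for */
        } intent;
    } intent_result_sketch_t;

    intent_result_sketch_t result = {
        .dictation = { .text = "search media for hero", .confidence = 900 },
        .intent    = { .action = "search", .domain = "media",
                       .search_term = "hero" },
    };

The dictation portion is carried along unchanged, which is why the recognizer must always include it in its output.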

The final recognition result (with intent information) is passed to the conversation modules to be interpreted and acted upon. The intent fields are vital for the conversation modules—the conversation can't take place if the system can't understand the meaning of an utterance.
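
To see why the intent fields matter, consider a minimal sketch of how a conversation module might dispatch on them (the interface shown is hypothetical; the real conversation-module API is defined by io-asr):

    /* Hypothetical dispatch on intent fields, mirroring the sketch
     * above; the real conversation-module interface differs. */
    #include <stdio.h>
    #include <string.h>

    typedef struct {
        const char *action;
        const char *domain;
        const char *search_term;
    } intent_sketch_t;

    /* Act on the intent fields extracted by the NLAL; without them the
     * module cannot tell what the utterance means. */
    static void handle_intent(const intent_sketch_t *in)
    {
        if (in->action && strcmp(in->action, "search") == 0 &&
            in->domain && strcmp(in->domain, "media") == 0) {
            printf("searching media library for \"%s\"\n", in->search_term);
        } else {
            printf("utterance not understood; no action taken\n");
        }
    }

    int main(void)
    {
        intent_sketch_t hero = { "search", "media", "hero" };
        handle_intent(&hero);
        return 0;
    }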

The recognition module is tightly integrated with the third-party ASR vendor's service. If you require changes or want to use a different vendor, see the AT&T Watson reference application for guidance, or contact QNX technical support.