System-wide audio management

Updated: April 19, 2023

The audio framework uses the System-Wide Audio Management (SWAM) layer to share audio-management information between the hypervisor host and guests. This information gives applications awareness of audio conditions external to their OS so they can perform audio ducking, suspending, or pausing based on the host's and the guest-local audio policies.

These actions provide system-wide coordination of audio management between different OSs. This ensures that end users, such as vehicle drivers and occupants, don't experience interfering music playback and can hear important sounds, such as warning chimes, when necessary.

Note: This software layer is considered mature because the specification that defines its interface is no longer changing.

Key capabilities

SWAM provides these key capabilities:
  • Pausing audio in a guest or the host when another media player begins playing, whether the new player is in the same or a different OS.
  • Suspending and resuming audio when transient audio is played in the same or a different OS.
  • Making an OS aware of ducking being applied by the host's audio management policy. This allows OS-local audio management policies or applications to decide what to do while their audio is being ducked.
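
The three capabilities above can be pictured as state changes on a single audio channel. The following is a minimal sketch, not SWAM code: the event names, the ChannelState values, and the Channel class are all hypothetical. It also assumes that a paused channel stays paused until something outside this sketch resumes it, which the text above doesn't specify.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ChannelState(Enum):
    PLAYING = auto()
    DUCKED = auto()
    SUSPENDED = auto()
    PAUSED = auto()


class AudioEvent(Enum):
    # Hypothetical event names chosen for illustration; not SWAM identifiers.
    MEDIA_PLAYER_STARTED = auto()  # another media player began playing
    TRANSIENT_STARTED = auto()     # transient audio (e.g., a chime) began
    TRANSIENT_ENDED = auto()       # transient audio finished
    DUCK_APPLIED = auto()          # host policy started ducking this channel
    DUCK_REMOVED = auto()          # host policy stopped ducking this channel


@dataclass
class Channel:
    name: str
    state: ChannelState = ChannelState.PLAYING

    def handle(self, event: AudioEvent) -> ChannelState:
        # Assumption: a paused channel ignores further events in this sketch.
        if self.state is ChannelState.PAUSED:
            return self.state
        if event is AudioEvent.MEDIA_PLAYER_STARTED:
            self.state = ChannelState.PAUSED
        elif event is AudioEvent.TRANSIENT_STARTED:
            self.state = ChannelState.SUSPENDED
        elif (event is AudioEvent.TRANSIENT_ENDED
              and self.state is ChannelState.SUSPENDED):
            self.state = ChannelState.PLAYING
        elif (event is AudioEvent.DUCK_APPLIED
              and self.state is ChannelState.PLAYING):
            self.state = ChannelState.DUCKED
        elif (event is AudioEvent.DUCK_REMOVED
              and self.state is ChannelState.DUCKED):
            self.state = ChannelState.PLAYING
        return self.state
```

In a real system, the application or the OS-local audio policy decides what to do in each state (for example, lowering its own volume while ducked); the sketch only tracks which state applies.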

Understanding the SWAM capabilities and design requires familiarity with several key terms, which are explained in Audio terminology.

Architecture

The SWAM architecture is based on SWAM processes that run in the host and guests and that interact with the native (OS-local) audio services. There are two kinds of SWAM processes: the host audio management agent (HAMA) and the guest audio management agent (GAMA). The HAMA learns about audio events from the host's audio service, manipulates the host's audio environment, and communicates audio information to guests. The GAMA or an equivalent audio-management service communicates the guest's audio information to the HAMA and manipulates the guest's audio environment in response to the host-reported information.

The diagram below shows the interaction between the SWAM processes, guest applications, sound drivers used by the guest, host audio service (io-audio), and audio hardware.


Figure 1. SWAM architecture

Note: The SWAM layer requires additional services for its operation. Currently, io-audio with audio management enabled is needed for backend support in the host. For the HAMA to communicate with the Android Audio HAL, a virtio-net or virtio-vsock vdev is also needed. These vdevs provide the host-guest communication channels (sockets or vsockets) that the HAMA uses to reach Android guest audio-management services through gRPC and other mechanisms. They must be configured in both the guest's VM and the hypervisor host. See the Virtual Socket chapter for details.

In an Android guest, the Android Audio HAL offers remote audio-management capabilities. The HAMA agent can send audio event information to the HAL (through a GAMA module it loads), so no GAMA agent is required in the guest. Based on the information it receives, the HAL notifies the guest applications about audio events so they can duck, suspend, or pause their audio channels as needed. This activity affects the discrete audio streams that these applications send through the VirtIO sound driver to the virtio-snd vdev. There can be one or many vdev instances, depending on the audio types configured for the guest; for details, see HAMA configuration. The vdev instances then send the audio data (e.g., navigation prompts, music) to io-audio.

Conversely, in a QNX Neutrino or Linux guest, a GAMA agent is required to exchange audio-management information with the host and update the audio environment on the guest side.

Note: To use shared audio and perform audio management in QNX Neutrino or Linux guests, you must work with QNX Engineering Services to implement the front ends for audio sharing and SWAM.

These types of guests would use some kind of VirtIO audio device (not shown in the diagram, for simplicity) to send audio data to io-audio in the host. The guest applications can use audio channels of various types, provided that those types are enabled for the VirtIO audio device. If the native audio service doesn't support routing audio based on type, the audio channels used must be of the default type.

The audio management master for SWAM is io-audio on the host. The io-audio service reads the audio policy configuration file (e.g., audio-policy.conf) to learn about the various audio types defined for the system and their attributes, such as priority and preemptability. In a virtualization frameworks system, this file must define all audio types needed by applications in the host or in any guest that will be run. For an explanation of the contents of this file, see Syntax of the audio policy configuration file in the Audio Developer's Guide.
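
The requirement that the policy file cover every audio type used by the host and all guests can be checked before deployment. The sketch below is only an illustration of that check: it doesn't parse audio-policy.conf, and the AudioType class, the missing_audio_types function, and the type and OS names in the usage example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioType:
    # Hypothetical in-memory model of one audio type from the policy file.
    name: str
    priority: int
    preemptable: bool


def missing_audio_types(policy, required_by_os):
    """Map each OS to the audio types it needs that the policy doesn't define.

    policy: iterable of AudioType entries defined for the system.
    required_by_os: dict of OS name -> iterable of audio type names it uses.
    OSs whose needs are fully covered are left out of the result.
    """
    defined = {audio_type.name for audio_type in policy}
    missing = {}
    for os_name, needed in required_by_os.items():
        gaps = sorted(set(needed) - defined)
        if gaps:
            missing[os_name] = gaps
    return missing


# Usage with made-up type names:
policy = [
    AudioType("multimedia", priority=10, preemptable=True),
    AudioType("navigation", priority=50, preemptable=False),
]
required = {
    "host": ["multimedia"],
    "android-guest": ["multimedia", "navigation", "phone"],
}
print(missing_audio_types(policy, required))
```

An empty result means that every type the host and guests rely on is defined; any entry in the result names a type that must be added to the policy file before that OS is run.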

The HAMA agent monitors the audio zones (i.e., playback regions) containing the devices used by the host and guests, and receives information about audio events affecting these zones from io-audio. Based on this information and on its configuration, the HAMA agent sends along any event information that affects guest audio channels to the corresponding guests. If the events affect audio channels used by host applications (which aren't shown in the diagram for simplicity), the agent updates the host's audio environment by ducking, suspending, or pausing channels.
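
The routing decision described above can be sketched as a lookup from an audio zone to the OSs that use it. This is a simplified illustration rather than the HAMA's actual logic: the route_event function, the zone_users mapping, and the zone and OS names are hypothetical, and the real agent also takes its own configuration into account.

```python
def route_event(zone, zone_users, host="host"):
    """Decide who must react to an audio event in the given zone.

    zone_users: dict of zone name -> set of OS names whose devices are in it.
    Returns (host_affected, guests_to_notify), where host_affected is True
    when the host's own audio environment must be updated, and
    guests_to_notify is a sorted list of guests to send the event to.
    """
    users = zone_users.get(zone, set())
    guests = sorted(name for name in users if name != host)
    return (host in users, guests)


# Usage with made-up zones and guests:
zone_users = {
    "cabin": {"host", "android-guest"},
    "rear-seat": {"linux-guest"},
}
print(route_event("cabin", zone_users))      # (True, ['android-guest'])
print(route_event("rear-seat", zone_users))  # (False, ['linux-guest'])
```

An event in a zone that no monitored OS uses yields (False, []), so nothing is forwarded and the host's audio environment is left unchanged.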