Controlling voice conversion

Updated: April 19, 2023

Configuration of the libasound library is based on the maximum number of voices supported in hardware.

The snd_pcm_plugin_params() function instantiates a voice converter in the following scenarios:
  • The source and destination voices do not use the same channel mappings.
  • The number of source and destination voices is different.

If the number of source voices and their mappings match those of the destination, the converter isn't invoked.
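
The rule above can be sketched in C as follows. This is only an illustration of the decision, not libasound's implementation; the chan_pos_t enum and needs_voice_converter() are hypothetical names introduced for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical channel-position tags, invented for this sketch.
 * They are not identifiers from libasound. */
typedef enum {
    POS_FRONT_LEFT,
    POS_FRONT_RIGHT,
    POS_FRONT_CENTER,
    POS_LFE,
    POS_REAR_LEFT,
    POS_REAR_RIGHT
} chan_pos_t;

/* Returns true when a voice converter would be required: the voice counts
 * differ, or at least one voice is mapped to a different position. */
bool needs_voice_converter(const chan_pos_t *src, size_t nsrc,
                           const chan_pos_t *dst, size_t ndst)
{
    if (nsrc != ndst) {
        return true;
    }
    for (size_t i = 0; i < nsrc; i++) {
        if (src[i] != dst[i]) {
            return true;
        }
    }
    return false;
}
```

With this sketch, a stereo source feeding a stereo destination that has the same mappings needs no converter, while a mono source feeding that destination does.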

Default channel mappings

The following channel mappings are used for application playback streams. If the destination uses different mappings, the voice converter is invoked to map the source channels to appropriate destination channels.

Number of channels Default mapping
1 Front left OR mono
2
  • 1 — Front left
  • 2 — Front right
4
  • 1 — Front left
  • 2 — Front right
  • 3 — Rear left
  • 4 — Rear right
6
  • 1 — Front left
  • 2 — Front right
  • 3 — Front center
  • 4 — Low-frequency effects
  • 5 — Rear left
  • 6 — Rear right
8
  • 1 — Front left
  • 2 — Front right
  • 3 — Front center
  • 4 — Low-frequency effects
  • 5 — Rear left
  • 6 — Rear right
  • 7 — Surround left
  • 8 — Surround right

If the stream has a number of voices other than the ones above, the voice converter does not perform any remapping by default. The application must query the driver channel map (chmap) and configure the voice matrix to do any required mapping of the channels. See Overriding the default voice conversion.

Destination has more voices (upmixing)

When the destination has more voices than the source, the default voice conversion behavior is as follows:

Mono to stereo
  • Replicate channel 1 (left) to channel 2 (right).
Mono to 4-channel
  • Replicate channel 1 to the front left, front right, rear left, and rear right.
  • If the hardware's channel map has only some of these mappings, channel 1 is replicated on those channels and the remaining ones are silent.
  • If the hardware's channel map has none of these mappings, channel 1 is mapped to the first available channel and the remaining channels are silent.
Stereo to 4-channel
  • Replicate channel 1 to front left and rear left, and channel 2 to front right and rear right.
  • If the hardware's channel map has only some of these mappings, replication happens for the available mappings only. Any remaining channels are silent.
  • If the hardware's channel map has none of these mappings, channel 1 is mapped to the first available channel and channel 2 to the next available one. The remaining channels are silent.
Mono to more than 4 channels
  • Replicate channel 1 to the front left, front right, rear left, and rear right. All remaining channels are silent.
  • If the hardware's channel map has only some of these mappings, channel 1 is replicated on those channels and the remaining ones are silent.
  • If the hardware's channel map has none of these mappings, channel 1 is mapped to the first available channel. The remaining channels are silent.
Stereo to more than 4 channels
  • Replicate channel 1 to front left and rear left, and channel 2 to front right and rear right. All remaining channels are silent.
  • If the hardware's channel map has only some of these mappings, replication happens for the available mappings only. Any remaining channels are silent.
  • If the hardware's channel map has none of these mappings, channel 1 is mapped to the first available channel and channel 2 to the next available one. All remaining channels are silent.
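
The upmixing rules for mono and stereo sources can be sketched as a small routine that computes, for each source channel, a bitmask of the destination channels it is replicated to (bit i stands for hardware channel i). This is an illustrative reading of the rules, not libasound's code; chan_pos_t, columns_for(), and default_upmix_rows() are hypothetical names, and the hardware channel order is supplied by the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical position tags for this sketch (not libasound identifiers). */
typedef enum {
    POS_FRONT_LEFT,
    POS_FRONT_RIGHT,
    POS_FRONT_CENTER,
    POS_LFE,
    POS_REAR_LEFT,
    POS_REAR_RIGHT,
    POS_UNKNOWN
} chan_pos_t;

/* Bitmask of the hardware channels whose position equals pos. */
static uint32_t columns_for(const chan_pos_t *hw, size_t nhw, chan_pos_t pos)
{
    uint32_t mask = 0;
    for (size_t i = 0; i < nhw && i < 32; i++) {
        if (hw[i] == pos) {
            mask |= UINT32_C(1) << i;
        }
    }
    return mask;
}

/* Fills rows[0] (and rows[1] for stereo) with the destination channels each
 * source channel is replicated to. Only mono and stereo sources are handled;
 * destination channels that appear in no row stay silent. */
void default_upmix_rows(unsigned app_voices, const chan_pos_t *hw, size_t nhw,
                        uint32_t rows[2])
{
    uint32_t left  = columns_for(hw, nhw, POS_FRONT_LEFT) |
                     columns_for(hw, nhw, POS_REAR_LEFT);
    uint32_t right = columns_for(hw, nhw, POS_FRONT_RIGHT) |
                     columns_for(hw, nhw, POS_REAR_RIGHT);

    rows[0] = 0;
    rows[1] = 0;

    if (app_voices == 1) {
        rows[0] = left | right;
        if (rows[0] == 0 && nhw > 0) {
            rows[0] = 0x1;              /* none present: first available channel */
        }
    } else if (app_voices == 2) {
        rows[0] = left;
        rows[1] = right;
        if ((left | right) == 0) {      /* none present: first two channels */
            if (nhw > 0) rows[0] = 0x1;
            if (nhw > 1) rows[1] = 0x2;
        }
    }
}
```

For a 6-channel destination ordered front left, front right, front center, LFE, rear left, rear right, a mono source yields the mask 0x33, and a stereo source yields 0x11 for channel 1 and 0x22 for channel 2.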

Destination has fewer voices (downmixing)

When there are fewer voices on the destination, the default voice conversion behavior is as follows:

Stereo to mono
  • Average channels 1 and 2.
4-channel to mono
  • Average front-left, front-right, rear-left, and rear-right channels (or whichever of these are present). Any other mappings are ignored.
4-channel to stereo
  • Average front left and rear left to front left, and front right and rear right to front right. Any other mappings are ignored.
  • If no mappings are available, the first two channels of the source are mapped to the first two channels of the destination. The remaining two source channels are ignored.
More than 4 channels to mono
  • Average front-left, front-right, rear-left, and rear-right channels (or whichever of these are present). All other mappings are ignored.
  • If no mappings are available, the first channel of the source is mapped to the destination channel and all other source channels are ignored.
More than 4 channels to stereo
  • Average front left and rear left to front left, and front right and rear right to front right. All other mappings are ignored.
  • If no mappings are available, the first two channels of the source are mapped to the two destination channels and the additional source channels are ignored.
Any number greater than the destination to other than stereo or mono (e.g., 5.1 surround sound (6 channels))
  • The first channel of the source is mapped to the first destination channel, the second source channel to the second destination channel, and so on, until all the destination channels are mapped to. The remaining source channels are ignored.

Note: In QNX Neutrino 6.5.0, libasound converted stereo to mono by simply dropping the right channel. In QNX Neutrino 6.6.0 or later, libasound averages the left and right channels to generate the mono stream.
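
To make the difference between the two stereo-to-mono behaviors concrete, here is a minimal sketch of each. It assumes interleaved signed 16-bit samples and integer division for the average; the function names are hypothetical, and libasound's internal arithmetic (rounding, supported sample formats) may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Averages the left and right samples of each interleaved stereo frame,
 * as described for QNX Neutrino 6.6.0 or later. The sum is computed in
 * 32 bits so that it cannot overflow. */
void stereo_to_mono_average(const int16_t *in, int16_t *out, size_t frames)
{
    for (size_t i = 0; i < frames; i++) {
        int32_t sum = (int32_t)in[2 * i] + (int32_t)in[2 * i + 1];
        out[i] = (int16_t)(sum / 2);
    }
}

/* Keeps the left sample and discards the right one, as described for
 * QNX Neutrino 6.5.0. */
void stereo_to_mono_drop_right(const int16_t *in, int16_t *out, size_t frames)
{
    for (size_t i = 0; i < frames; i++) {
        out[i] = in[2 * i];
    }
}
```

A frame whose channels carry opposite signals, such as (10, -10), averages to 0 but yields 10 when the right channel is dropped, so content that is out of phase between the channels sounds different under the two behaviors.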

Overriding the default voice conversion

You can use the voice conversion API to override the default conversion behavior and place any source channel in any destination channel slot:

snd_pcm_plugin_get_voice_conversion()
  Get the current voice conversion structure for a channel.
snd_pcm_plugin_set_voice_conversion()
  Set the current voice conversion structure for a channel.

The actual conversion is controlled by the snd_pcm_voice_conversion_t structure, which is defined as follows:

typedef struct snd_pcm_voice_conversion
{       
   uint32_t     app_voices;
   uint32_t     hw_voices;
   uint32_t     matrix[32];
} snd_pcm_voice_conversion_t;

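The matrix member forms a 32-by-32 bit array that specifies how to convert the voices. Each element matrix[i] is the row for application voice i, with voice 0 first. Within a row, each bit is a column that represents one hardware voice: bit 0 (the least significant bit) is the first hardware voice, and higher hardware voices occupy successively higher bits, increasing from right to left. Setting bit j in matrix[i] routes application voice i to hardware voice j.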
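
The row-and-bit layout can be wrapped in two small helpers. This is a sketch under stated assumptions: the structure below is a local mirror of the definition above so that the code compiles without the QNX headers, and route_voice() and is_routed() are hypothetical helpers, not part of libasound.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of the structure shown above; real code would use
 * snd_pcm_voice_conversion_t from the QNX audio headers instead. */
typedef struct {
    uint32_t app_voices;
    uint32_t hw_voices;
    uint32_t matrix[32];
} voice_conversion_sketch_t;

/* Routes application voice `app` (the row) to hardware voice `hw` (the bit
 * position within that row). Returns false if either index is out of range. */
bool route_voice(voice_conversion_sketch_t *vc, unsigned app, unsigned hw)
{
    if (app >= 32 || hw >= 32 || app >= vc->app_voices || hw >= vc->hw_voices) {
        return false;
    }
    vc->matrix[app] |= UINT32_C(1) << hw;
    return true;
}

/* Reports whether application voice `app` is routed to hardware voice `hw`. */
bool is_routed(const voice_conversion_sketch_t *vc, unsigned app, unsigned hw)
{
    if (app >= 32 || hw >= 32) {
        return false;
    }
    return ((vc->matrix[app] >> hw) & 1u) != 0;
}
```

Routing application voice 0 to hardware voices 0 and 3 with these helpers leaves matrix[0] equal to 0x9.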

For example, consider a mono application stream directed to a 4-voice hardware device. A bit array of:

matrix[0] = 0x1;  //  00000001

causes the sound to be output on only the first hardware channel. A bit array of:

matrix[0] = 0x9;   // 00001001

causes the sound to appear on the first and last hardware channel.

Another example would be a stereo application stream to a 6-channel (5.1) output device. A bit array of:

matrix[0] = 0x1;  //  00000001
matrix[1] = 0x2;  //  00000010

causes the sound to appear on only the front two channels, while:

matrix[0] = 0x5;  //  00000101
matrix[1] = 0xA;  //  00001010

causes the stream signal to appear on the first four channels (likely the front and rear pairs), but not on the center or low-frequency effects (LFE) channels. The bitmap used to describe the hardware (i.e., the columns) depends on the hardware, so you need to know the channel order of the actual device you'll be running on to map the channels properly.
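
Because the column order depends on the device, it is safer to derive each row from the hardware's channel order than to hard-code bit values. The sketch below shows the idea with two made-up channel orders; chan_pos_t and row_for_positions() are hypothetical names, and on a real system the order would come from the driver's channel map.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical position tags for this sketch (not libasound identifiers). */
typedef enum {
    POS_FRONT_LEFT,
    POS_FRONT_RIGHT,
    POS_FRONT_CENTER,
    POS_LFE,
    POS_REAR_LEFT,
    POS_REAR_RIGHT
} chan_pos_t;

/* Builds the matrix row that routes one application voice to every hardware
 * channel whose position appears in wanted[]. Positions that the hardware
 * does not provide are skipped. */
uint32_t row_for_positions(const chan_pos_t *hw, size_t nhw,
                           const chan_pos_t *wanted, size_t nwanted)
{
    uint32_t row = 0;
    for (size_t i = 0; i < nhw && i < 32; i++) {
        for (size_t j = 0; j < nwanted; j++) {
            if (hw[i] == wanted[j]) {
                row |= UINT32_C(1) << i;
            }
        }
    }
    return row;
}
```

For a device ordered front left, front right, rear left, rear right, front center, LFE, requesting the front-left and rear-left positions gives 0x5. The same request on a device ordered front left, front right, front center, LFE, rear left, rear right gives 0x11, which is why a fixed bit pattern cannot be assumed to be portable.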

If you call snd_pcm_plugin_get_voice_conversion() before the voice conversion plugin has been instantiated, the function fails and returns -ENOENT. In QNX Neutrino 6.6 or later, snd_pcm_plugin_set_voice_conversion() instantiates the plugin if it doesn't already exist.