Playing and Capturing Audio Data

This chapter describes the major steps required to play back and capture (i.e., record) sound data.

Handling PCM devices

The software processes for playing back and capturing audio data are similar. This section describes the common steps.

Opening your PCM device

The first thing you need to do in order to play back or capture sound is open a connection to a PCM playback or capture device. The API calls for opening a PCM device are:

snd_pcm_open_name()
Use this call when you want to open a specific hardware device, and you know its name.
snd_pcm_open()
Use this call when you want to open a specific hardware device, and you know its card and device number.
snd_pcm_open_preferred()
Use this call to open the user's preferred device.

Using snd_pcm_open_preferred() makes your application more flexible, because you don't need to know the card and device numbers; the function can pass back to you the card and device that it opened.

These API calls set a PCM connection handle that you'll use as an argument to all other PCM API calls. This handle is analogous to a file stream handle. It's a pointer to a snd_pcm_t structure, which is an opaque data type.

These functions, like others in the QSA API, work for both capture and playback channels. They take as an argument a channel direction, which is either SND_PCM_OPEN_CAPTURE or SND_PCM_OPEN_PLAYBACK.

This code fragment from the wave.c example in the appendix uses these functions to open a playback device:

if (card == -1)
{
    if ((rtn = snd_pcm_open_preferred (&pcm_handle,
                  &card, &dev,
                  SND_PCM_OPEN_PLAYBACK)) < 0)
        return err ("device open");
}
else
{
    if ((rtn = snd_pcm_open (&pcm_handle, card, dev,
                  SND_PCM_OPEN_PLAYBACK)) < 0)
        return err ("device open");
}

If the user specifies a card and a device number on the command line, this code opens a connection to that specific PCM playback device. If the user doesn't specify a card, the code creates a connection to the preferred PCM playback device, and snd_pcm_open_preferred() stores the card and device numbers in the given variables.

Configuring the PCM device

The next step in playing back or capturing the sound stream is to inform the device of the format of the data that you're about to send it or want to receive from it. You can do this by filling in a snd_pcm_channel_params_t structure, and then calling snd_pcm_channel_params() or snd_pcm_plugin_params(). The difference between the functions is that the second one uses the plugin converters (see PCM plugin converters in the Audio Architecture chapter) if required.

If the device can't support the data parameters you're setting, or if all the subchannels of the device are currently in use, both of these functions fail.

The API calls for determining the current capabilities of a PCM device are:

snd_pcm_plugin_info()
Use the plugin converters. If the hardware has a free subchannel, the capabilities returned are extensive because the plugin converters make any necessary conversion.
snd_pcm_channel_info()
Access the hardware directly. This function returns only what the hardware capabilities are.

Note: Both of these functions take as an argument a pointer to a snd_pcm_channel_info_t structure. You must set the channel member of this structure to the desired direction (SND_PCM_CHANNEL_CAPTURE or SND_PCM_CHANNEL_PLAYBACK) before calling the functions. The functions fill in the other members of the structure.

It's the act of configuring the channel that allocates a subchannel to the client. Stated another way, hundreds of clients can open a handle to a PCM device with only one subchannel, but only one can configure it. After a client allocates a subchannel, it isn't returned to the free pool until the handle is closed. One result of this mechanism is that, from moment to moment, the capabilities of a PCM device change as other applications allocate and free subchannels. Additionally, configuring a subchannel (and thereby allocating it) changes its state from SND_PCM_STATUS_NOTREADY to SND_PCM_STATUS_READY.

If the API call succeeds, all parameters specified are accepted and are guaranteed to be in effect, except for the frag_size parameter, which is only a suggestion to the hardware. The hardware may adjust the fragment size, based on hardware requirements. For example, if the hardware can't deal with fragments crossing 64-kilobyte boundaries, and the suggested frag_size is 60 kilobytes, the driver will probably adjust it to 64 kilobytes.

Another aspect of configuration is determining how big to make the hardware buffer. This determines how much latency the application experiences when sending data to the driver or reading data from it. The hardware buffer size is the frag_size multiplied by the max_frags parameter, so for the application to know the buffer size, it must determine the actual frag_size that the driver is using.

You can do this by calling snd_pcm_channel_setup() or snd_pcm_plugin_setup(), depending on whether or not your application is using the plugin converters. Both of these functions take as an argument a pointer to a snd_pcm_channel_setup_t structure that they fill with information about how the channel is configured, including the true frag_size.
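The buffer arithmetic described above can be sketched as follows. This is a minimal illustration, not library code: the helper names are hypothetical, and the sample rate, voice count, and sample width are passed in as plain integers on the assumption that the application already knows them. In a real application, the fragment size would be the true value reported by snd_pcm_channel_setup() or snd_pcm_plugin_setup().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: total hardware buffer size in bytes,
 * given the true fragment size and the number of fragments. */
static uint64_t buffer_bytes(uint32_t frag_size, uint32_t max_frags)
{
    return (uint64_t)frag_size * max_frags;
}

/* Hypothetical helper: the amount of audio, in microseconds, that a
 * full buffer of interleaved PCM data represents. */
static uint64_t buffer_latency_us(uint32_t frag_size, uint32_t max_frags,
                                  uint32_t rate_hz, uint32_t voices,
                                  uint32_t bytes_per_sample)
{
    uint64_t bytes_per_second = (uint64_t)rate_hz * voices * bytes_per_sample;

    assert(bytes_per_second != 0);
    return buffer_bytes(frag_size, max_frags) * 1000000u / bytes_per_second;
}
```

For example, four 4096-byte fragments of 16-bit stereo data at 44.1 kHz make a 16384-byte buffer, which holds roughly 93 milliseconds of audio.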

Controlling voice conversion

The libasound library supports devices with up to eight voices; configuration is based on the maximum number of voices supported in hardware. If the numbers of source and destination voices are different, then snd_pcm_plugin_params() instantiates a voice converter.

The default voice conversion behavior is as follows:

| From | To | Conversion |
|---|---|---|
| Mono | Stereo | Replicate channel 1 (left) to channel 2 (right) |
| Stereo | Mono | Remove channel 2 (right) |
| Mono | 4-channel | Replicate channel 1 to all other channels |
| Stereo | 4-channel | Replicate channel 1 (front left) to channel 3 (rear left), and channel 2 (front right) to channel 4 (rear right) |

Note: Previous versions of libasound converted stereo to mono by averaging the left and right channels to generate the mono stream. Now by default, the right channel is simply dropped.

You can use the voice conversion API to configure the conversion behavior and place any source channel in any destination channel slot:

snd_pcm_plugin_get_voice_conversion()
Get the current voice conversion structure for a channel
snd_pcm_plugin_set_voice_conversion()
Set the current voice conversion structure for a channel

The actual conversion is controlled by the snd_pcm_voice_conversion_t structure, which is defined as follows:

typedef struct snd_pcm_voice_conversion
{       
   uint32_t     app_voices;
   uint32_t     hw_voices;
   uint32_t     matrix[32];
} snd_pcm_voice_conversion_t;

The matrix member forms a 32-by-32-bit array that specifies how to convert the voices. The array is ranked with rows representing application voices, voice 0 first; the columns represent hardware voices, with the low voice being LSB-aligned and increasing right to left.
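The row-and-bit layout described above can be sketched with two small helpers. The helper names and the MAX_VOICES macro are hypothetical and exist only for this illustration; the code operates on a plain array of 32 words laid out like the matrix member, and doesn't call the library.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_VOICES 32   /* 32 rows (application voices), 32 bits per row */

/* Route application voice app_voice to hardware voice hw_voice by
 * setting the matching bit in that application voice's row. */
static void route_voice(uint32_t matrix[MAX_VOICES],
                        unsigned app_voice, unsigned hw_voice)
{
    assert(app_voice < MAX_VOICES && hw_voice < MAX_VOICES);
    matrix[app_voice] |= (uint32_t)1 << hw_voice;
}

/* Return nonzero if app_voice is routed to hw_voice. */
static int is_routed(const uint32_t matrix[MAX_VOICES],
                     unsigned app_voice, unsigned hw_voice)
{
    assert(app_voice < MAX_VOICES && hw_voice < MAX_VOICES);
    return (int)((matrix[app_voice] >> hw_voice) & 1u);
}
```

Starting from an all-zero matrix, routing application voice 0 to hardware voices 0 and 3 leaves row 0 holding 0x9 and every other row empty.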

For example, consider a mono application stream directed to a 4-voice hardware device. A bit array of:

matrix[0] = 0x1;  //  00000001

causes the sound to be output on only the first hardware channel. A bit array of:

matrix[0] = 0x9;   // 00001001

causes the sound to appear on the first and last hardware channels.

Another example would be a stereo application stream to a 6-channel (5.1) output device. A bit array of:

matrix[0] = 0x1;  //  00000001
matrix[1] = 0x2;  //  00000010

causes the sound to appear on only the front two channels, while:

matrix[0] = 0x5;  //  00000101
matrix[1] = 0xa;  //  00001010

causes the stream signal to appear on the first four channels (likely the front and rear pairs, but not on the center or LFE channels). The bitmap used to describe the hardware (i.e., the columns) depends on the hardware, and you need to be mindful of the actual hardware you'll be running on to properly map the channels. For example:


Note: If the number of source voices matches the number of destination voices, the converter isn't invoked, so you won't be able to reroute the channels. If you're playing a stereo file on stereo hardware, you can't use the voice matrix to swap the channels because the voice converter isn't used in this case.

If you call snd_pcm_plugin_get_voice_conversion() or snd_pcm_plugin_set_voice_conversion() before the voice conversion plugin has been instantiated, the functions fail and return -ENOENT.

Preparing the PCM subchannel

The next step in playing back or capturing the sound stream is to prepare the allocated subchannel to run. Do this by calling one of snd_pcm_capture_prepare(), snd_pcm_channel_prepare(), snd_pcm_playback_prepare(), or snd_pcm_plugin_prepare().

This step and the SND_PCM_STATUS_PREPARED state may seem unnecessary, but they're required to correctly handle underrun conditions when playing back, and overrun conditions when capturing. For more information, see "If the PCM subchannel stops during playback" and "If the PCM subchannel stops during capture," later in this chapter.

Closing the PCM subchannel

When you've finished playing back or capturing audio data, you can close the subchannel by calling snd_pcm_close(). This call releases the subchannel and closes the handle.

Playing audio data

Once you've opened and configured a PCM playback device and prepared the PCM subchannel (see "Handling PCM devices," above), you're ready to play back sound data.

There's a complete example of playback in the wave.c example in the appendix. You may wish to compile and run the application now, and refer to the running code as you progress through this section.


Note: If your application has the option to produce playback data in multiple formats, choosing a format that the hardware supports directly will reduce the CPU requirements.

Playback states

The state diagram for a PCM device during playback is shown below.

[Figure: State diagram for PCM devices during playback.]

The transition between SND_PCM_STATUS_* states is the result of executing an API call, or the result of conditions that occur in the hardware:

| From | To | Cause |
|---|---|---|
| NOTREADY | READY | Calling snd_pcm_channel_params() or snd_pcm_plugin_params() |
| READY | PREPARED | Calling snd_pcm_channel_prepare(), snd_pcm_playback_prepare(), or snd_pcm_plugin_prepare() |
| PREPARED | RUNNING | Calling snd_pcm_write() or snd_pcm_plugin_write() |
| RUNNING | PAUSED | Calling snd_pcm_channel_pause() or snd_pcm_playback_pause() |
| PAUSED | RUNNING | Calling snd_pcm_channel_resume() or snd_pcm_playback_resume() |
| RUNNING | UNDERRUN | The hardware buffer became empty during playback |
| RUNNING | CHANGE | The stream changed |
| RUNNING | ERROR | A hardware error occurred |
| UNDERRUN, CHANGE, or ERROR | PREPARED | Calling snd_pcm_channel_prepare(), snd_pcm_playback_prepare(), or snd_pcm_plugin_prepare() |

For more details on these transitions, see the description of each function in the Audio Library chapter.
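The playback transitions listed above can be modeled as a small transition function, which can be handy when reasoning about or unit-testing application logic. This is only a sketch: the enum and function names are hypothetical (the library's own constants are the SND_PCM_STATUS_* values), and the model covers just the transitions given in this section, treating any other combination as "no change."

```c
#include <assert.h>

/* Hypothetical state and event names for this sketch. */
typedef enum {
    ST_NOTREADY, ST_READY, ST_PREPARED, ST_RUNNING,
    ST_PAUSED, ST_UNDERRUN, ST_CHANGE, ST_ERROR
} pb_state_t;

typedef enum {
    EV_PARAMS, EV_PREPARE, EV_WRITE, EV_PAUSE, EV_RESUME,
    EV_BUFFER_EMPTY, EV_STREAM_CHANGED, EV_HW_ERROR
} pb_event_t;

/* Return the next state, or the same state if no transition is
 * listed for this (state, event) pair. */
static pb_state_t pb_next_state(pb_state_t s, pb_event_t e)
{
    switch (s) {
    case ST_NOTREADY:
        return (e == EV_PARAMS) ? ST_READY : s;
    case ST_READY:
        return (e == EV_PREPARE) ? ST_PREPARED : s;
    case ST_PREPARED:
        return (e == EV_WRITE) ? ST_RUNNING : s;
    case ST_RUNNING:
        switch (e) {
        case EV_PAUSE:          return ST_PAUSED;
        case EV_BUFFER_EMPTY:   return ST_UNDERRUN;
        case EV_STREAM_CHANGED: return ST_CHANGE;
        case EV_HW_ERROR:       return ST_ERROR;
        default:                return s;
        }
    case ST_PAUSED:
        return (e == EV_RESUME) ? ST_RUNNING : s;
    case ST_UNDERRUN:
    case ST_CHANGE:
    case ST_ERROR:
        /* Only preparing the channel again leaves these states. */
        return (e == EV_PREPARE) ? ST_PREPARED : s;
    }
    assert(!"unreachable: unknown state");
    return s;
}
```

Note that in this model a write event in the underrun state leaves the state unchanged, which mirrors the rule that the channel must be prepared again before playback can resume.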

Sending data to the PCM subchannel

You can send data to the subchannel by calling either one of the following, depending on whether or not you're using plugin converters:

snd_pcm_write()
The number of bytes written must be a multiple of the fragment size, or the write will fail.
snd_pcm_plugin_write()
The plugin accumulates partial writes until a complete fragment can be sent to the driver.
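To make the difference between the two calls concrete, here is a minimal sketch of the accumulation idea: bytes of any length are accepted, but data is handed on only in whole fragments. It illustrates the general technique under simplifying assumptions and isn't the plugin's actual implementation; the type, the function names, and the tiny fragment size are all hypothetical, and send_fragment() merely counts fragments instead of talking to a driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FRAG_SIZE 8   /* tiny fragment size, for illustration only */

typedef struct {
    unsigned char pending[FRAG_SIZE]; /* bytes waiting for a full fragment */
    size_t        used;               /* number of valid bytes in pending */
    size_t        frags_sent;         /* complete fragments handed on */
} frag_accum_t;

/* Stand-in for passing one complete fragment to the driver. */
static void send_fragment(frag_accum_t *acc)
{
    acc->frags_sent++;
    acc->used = 0;
}

/* Accept any number of bytes; forward data only in whole fragments. */
static void accum_write(frag_accum_t *acc, const void *data, size_t len)
{
    const unsigned char *src = data;

    assert(acc != NULL && (data != NULL || len == 0));
    while (len > 0) {
        size_t room = FRAG_SIZE - acc->used;
        size_t n = (len < room) ? len : room;

        memcpy(acc->pending + acc->used, src, n);
        acc->used += n;
        src += n;
        len -= n;

        if (acc->used == FRAG_SIZE)
            send_fragment(acc);
    }
}
```

With an 8-byte fragment, two 5-byte writes produce one complete fragment and leave 2 bytes pending until more data arrives.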

A full nonblocking write mode is supported if the application can't afford to be blocked on the PCM subchannel. You can enable nonblocking mode when you open the handle or by calling snd_pcm_nonblock_mode().


Note: This approach results in a polled operation mode that isn't recommended.

Another method that your application can use to avoid blocking on the write is to call select() (see the QNX Neutrino Library Reference) to wait until the PCM subchannel can accept more data. This is the technique that the wave.c example uses. It allows the program to wait on user input while at the same time sending the playback data to the PCM subchannel.

To get the file descriptor to pass to select(), call snd_pcm_file_descriptor().


Note: With this technique, select() returns when there's space for frag_size bytes in the subchannel. If your application tries to write more data than this, it may block on the call.
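The select()-based wait can be sketched with standard POSIX calls only. In the sketch below, wait_writable() and demo_with_pipe() are hypothetical helper names, and the write end of a pipe stands in for the descriptor that snd_pcm_file_descriptor() would return; an application would pass the PCM descriptor instead, and would typically add its user-input descriptor to a read set in the same select() call.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait until fd can accept data or timeout_ms elapses.
 * Returns 1 if writable, 0 on timeout, -1 on error. */
static int wait_writable(int fd, int timeout_ms)
{
    fd_set wfds;
    struct timeval tv;
    int rc;

    assert(fd >= 0 && fd < FD_SETSIZE && timeout_ms >= 0);

    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    rc = select(fd + 1, NULL, &wfds, NULL, &tv);
    if (rc < 0)
        return -1;
    return (rc > 0 && FD_ISSET(fd, &wfds)) ? 1 : 0;
}

/* Try the helper on the write end of an empty pipe, which stands in
 * for the PCM file descriptor. Returns the result of wait_writable(). */
static int demo_with_pipe(void)
{
    int fds[2];
    int rc;

    if (pipe(fds) != 0)
        return -1;
    rc = wait_writable(fds[1], 100);
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

Using a timeout rather than blocking indefinitely is a design choice for this sketch; it lets the caller notice a stalled device and decide what to do next.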

If the PCM subchannel stops during playback

When playing back, the PCM subchannel stops if the hardware consumes all the data in its buffer. This can happen if the application can't produce data at the rate that the hardware is consuming data. A real-world example of this is when the application is preempted for a period of time by a higher-priority process. If this preemption continues long enough, all data in the buffer may be played before the application can add any more.

When this happens, the subchannel changes state to SND_PCM_STATUS_UNDERRUN. In this state, it doesn't accept any more data (i.e., snd_pcm_write() and snd_pcm_plugin_write() fail) and the subchannel doesn't restart playing.

The only ways to move out of this state are to close the subchannel or to reprepare the channel as you did before (see "Preparing the PCM subchannel," earlier in this chapter). This forces the application to recognize and take action to get out of the underrun state; this is primarily for applications that want to synchronize audio with something else. Consider the difficulties involved with synchronization if the subchannel simply were to move back to the SND_PCM_STATUS_RUNNING state from underrun when more data became available.

Stopping the playback

If the application wishes to stop playback, it can simply stop sending data and let the subchannel underrun as described above, but there are better ways.

If you want your application to stop as soon as possible, call one of the drain functions, snd_pcm_playback_drain() or snd_pcm_plugin_playback_drain(), to remove any unplayed data from the hardware buffer.

If you want to play out all data in the buffers before stopping, call snd_pcm_playback_flush() or snd_pcm_plugin_flush().

Synchronizing with the PCM subchannel

QSA provides some basic synchronization functionality: your application can find out where in the stream the hardware play position is. The resolution of this position is entirely a function of the hardware driver; consult the specific device driver documentation for details if this is important to your application.

The API calls to get this information are snd_pcm_channel_status() and snd_pcm_plugin_status().

Both of these functions fill in a snd_pcm_channel_status_t structure. You'll need to check the following members of this structure:

scount
The hardware play position, in bytes relative to the start of the stream since the last time the channel was prepared. The act of preparing a channel resets this count.
count
The play position, in bytes relative to the total number of bytes written to the device.

Note: The count member isn't used if the mmap plugin is used. To disable the mmap plugin, call snd_pcm_plugin_set_disable().

For example, consider a stream where 1,000,000 bytes have been written to the device. If the status call sets scount to 999,000 and count to 1000, there are 1000 bytes of data in the buffer remaining to be played, and 999,000 bytes of the stream have already been played.
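The arithmetic in this example can be sketched as two helpers. The function names are hypothetical, and converting a byte position to time assumes the application knows the stream's sample rate, voice count, and sample width; the 16-bit stereo, 44.1 kHz figures used in the note after the code are an assumption for illustration, not something the example above specifies.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes written to the device but not yet played, given the total
 * written since the channel was prepared and the scount position. */
static uint64_t bytes_queued(uint64_t total_written, uint64_t scount)
{
    assert(scount <= total_written);
    return total_written - scount;
}

/* Convert a byte position in an interleaved PCM stream to elapsed
 * milliseconds of audio. */
static uint64_t position_ms(uint64_t bytes, uint32_t rate_hz,
                            uint32_t voices, uint32_t bytes_per_sample)
{
    uint64_t bytes_per_second = (uint64_t)rate_hz * voices * bytes_per_sample;

    assert(bytes_per_second != 0);
    return bytes * 1000u / bytes_per_second;
}
```

With 1,000,000 bytes written and scount at 999,000, bytes_queued() returns 1000. If the stream were 16-bit stereo at 44.1 kHz, a position of 999,000 bytes would correspond to about 5.66 seconds of played audio.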

Capturing audio data

Once you've opened and configured a PCM capture device and prepared the PCM subchannel (see "Handling PCM devices," above), you're ready to capture sound data.

There's a complete example of capturing audio data in the waverec.c example in the appendix. You may wish to compile and run the application now, and refer to the running code as you progress through this section.

Selecting what to capture

Most sound cards allow only one analog signal to be connected to the ADC. Therefore, in order to capture audio data, the user or application must select the appropriate input source. Some sound cards allow multiple signals to be connected to the ADC; in this case, make sure the appropriate signal is one of them. There's an API call, snd_mixer_group_write(), for controlling the mixer so that the application can set this up directly; it's described in the Mixer Architecture chapter. If you're using the waverec.c example, just use the Photon mixer application to select the input.

Capture states

The state diagram for a PCM device during capture is shown below.

[Figure: State diagram for PCM devices during capture.]

The transition between SND_PCM_STATUS_* states is the result of executing an API call, or the result of conditions that occur in the hardware:

| From | To | Cause |
|---|---|---|
| NOTREADY | READY | Calling snd_pcm_channel_params() or snd_pcm_plugin_params() |
| READY | PREPARED | Calling snd_pcm_capture_prepare(), snd_pcm_channel_prepare(), or snd_pcm_plugin_prepare() |
| PREPARED | RUNNING | Calling snd_pcm_read() or snd_pcm_plugin_read(), or calling select() against the capture file descriptors |
| RUNNING | PAUSED | Calling snd_pcm_capture_pause() or snd_pcm_channel_pause() |
| PAUSED | RUNNING | Calling snd_pcm_capture_resume() or snd_pcm_channel_resume() |
| RUNNING | OVERRUN | The hardware buffer became full during capture; snd_pcm_read() and snd_pcm_plugin_read() fail |
| RUNNING | CHANGE | The stream changed |
| RUNNING | ERROR | A hardware error occurred |
| OVERRUN, CHANGE, or ERROR | PREPARED | Calling snd_pcm_capture_prepare(), snd_pcm_channel_prepare(), or snd_pcm_plugin_prepare() |

For more details on these transitions, see the description of each function in the Audio Library chapter.

Receiving data from the PCM subchannel

You can receive data from the subchannel by calling either one of the following, depending on whether or not you're using plugin converters:

snd_pcm_read()
The number of bytes read must be a multiple of the fragment size, or the read fails.
snd_pcm_plugin_read()
The plugin reads an entire fragment from the driver and then fulfills requests for partial reads from that buffer until another full fragment has to be read.

A full nonblocking read mode is supported if the application can't afford to be blocked on the PCM subchannel. You can enable nonblocking mode when you open the handle or by using the snd_pcm_nonblock_mode() API call.


Note: This approach results in a polled operation mode that isn't recommended.

Another method that your application can use to avoid blocking on the read is to use select() (see the QNX Neutrino Library Reference) to wait until the PCM subchannel has more data. This is the technique that the waverec.c example uses. It allows the program to wait on user input while at the same time receiving the capture data from the PCM subchannel.

To get the file descriptor to pass to select(), call snd_pcm_file_descriptor().


Note: With this technique, select() returns when there are frag_size bytes in the subchannel. If your application tries to read more data than this, it may block on the call.

If the PCM subchannel stops during capture

When capturing, the PCM subchannel stops if the hardware has no room for additional data left in its buffer. This can happen if the application can't consume data at the rate that the hardware is producing data. A real-world example of this is when the application is preempted for a period of time by a higher-priority process. If this preemption continues long enough, the data buffer may be filled before the application can remove any data.

When this happens, the subchannel changes state to SND_PCM_STATUS_OVERRUN. In this state, it won't provide any more data (i.e., snd_pcm_read() and snd_pcm_plugin_read() fail) and the subchannel doesn't restart capturing.

The only ways to move out of this state are to close the subchannel or to reprepare the channel as you did before. This forces the application to recognize and take action to get out of the overrun state; this is primarily for applications that want to synchronize audio with something else. Consider the difficulties involved with synchronization if the subchannel simply were to move back to the SND_PCM_STATUS_RUNNING state from overrun when space became available; the recorded sample would be discontinuous.
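The recovery pattern described above (a failed read, an explicit prepare, then a retry) can be sketched against a mock device. Everything in this block is hypothetical and exists only to show the control flow: mock_read() and mock_prepare() stand in for the library's read and prepare calls, and the mock simply fails a fixed number of reads to simulate overruns.

```c
#include <assert.h>
#include <stddef.h>

/* Mock capture device for this sketch; not part of any real API. */
typedef enum { CAP_PREPARED, CAP_RUNNING, CAP_OVERRUN } cap_state_t;

typedef struct {
    cap_state_t state;
    int         overruns_left;  /* how many reads will hit an overrun */
} mock_dev_t;

/* Mock read: returns len on success, or -1 if the device is in
 * (or has just entered) the overrun state. */
static int mock_read(mock_dev_t *dev, size_t len)
{
    if (dev->state == CAP_OVERRUN)
        return -1;
    if (dev->overruns_left > 0) {
        dev->overruns_left--;
        dev->state = CAP_OVERRUN;
        return -1;
    }
    dev->state = CAP_RUNNING;
    return (int)len;
}

/* Mock prepare: the only way out of the overrun state. */
static void mock_prepare(mock_dev_t *dev)
{
    dev->state = CAP_PREPARED;
}

/* Read nfrags fragments, preparing the device again after each
 * overrun. Returns the number of discontinuities in the recording. */
static int capture_fragments(mock_dev_t *dev, int nfrags, size_t frag_size)
{
    int gaps = 0;
    int done = 0;

    assert(dev != NULL && nfrags >= 0);
    while (done < nfrags) {
        if (mock_read(dev, frag_size) < 0) {
            gaps++;              /* data was lost; record the gap */
            mock_prepare(dev);   /* explicit recovery, then retry */
            continue;
        }
        done++;
    }
    return gaps;
}
```

Counting the gaps gives an application that synchronizes audio with something else a point at which to resynchronize, which is the reason the recovery has to be explicit.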

Stopping the capture

If your application wishes to stop capturing, it can simply stop reading data and let the subchannel overrun as described above, but there's a better way.

If you want your application to stop capturing immediately and delete any unread data from the hardware buffer, call one of the flush functions, snd_pcm_capture_flush() or snd_pcm_plugin_flush().

Synchronizing with the PCM subchannel

QSA provides some basic synchronization functionality: an application can find out where in the stream the hardware capture position is. The resolution of this position is entirely a function of the hardware driver; consult the specific device driver documentation for details if this is important to your application.

The API calls to get this information are snd_pcm_channel_status() and snd_pcm_plugin_status().

Both of these functions fill in a snd_pcm_channel_status_t structure. You'll need to check the following members of this structure:

scount
The hardware capture position, in bytes relative to the start of the stream since you last prepared the channel. The act of preparing a channel resets this count.
count
The capture position as bytes in the hardware buffer.

Note: The count member isn't used if the mmap plugin is used. To disable the mmap plugin, call snd_pcm_plugin_set_disable().