Capability ID 0x11 (MSI-X)

Updated: May 06, 2022

The MSI-X capability module enables the use of Extended Message Signalled Interrupts. You should configure the MSI-X capability with the APIs listed below before enabling it. Specifically, you should check the number of device-supported interrupts, and then select the number that you intend to support in the driver. The number of driver-supported interrupts must be less than or equal to the number of device-supported interrupts.

You must enable the MSI-X capability (see pci_device_cfg_cap_enable()) before you call pci_device_read_irq(). When the capability is enabled, bit 2 (Bus Master) of the Command Register is also set. This bit isn't automatically cleared when the capability is disabled.

The MSI and MSI-X capabilities are mutually exclusive. If you want to switch to using MSI-X on a device that has MSI enabled, you must first disable the MSI capability (see pci_device_cfg_cap_disable()). Failure to do this results in an error of PCI_ERR_MSI_ENABLED.

The MSI-X capability supports the following APIs as defined in <pci/cap_msix.h>.

The following APIs allow software to obtain the number of device-supported interrupt sources and to specify the number to actually use. With MSI-X, the ability to set the disposition of an interrupt source is also possible. Disposition changes take effect when the capability is enabled, and are discussed in more detail below.

uint_t cap_msix_get_nirq(pci_cap_t cap)
uint_t cap_msix_get_nirq_isr(pci_cap_t cap)
Returns the number of interrupts supported by the device, or 0 on any error. You can call the _isr-suffixed version from an ISR.
pci_err_t cap_msix_set_nirq(pci_devhdl_t hdl, pci_cap_t cap, uint_t nirq)
pci_err_t cap_msix_set_irq_entry(pci_devhdl_t hdl, pci_cap_t cap, uint_t irq_entry, int_t disposition)
The cap_msix_set_nirq() function doesn't have the same utility as it does for MSI. Instead, you should use cap_msix_set_irq_entry() to modify the number of independent IRQs that the device will use. See the “Interrupt disposition” discussion below.

The following API obtains a read-only pointer to the Pending Bits Array (PBA):

cap_msix_pba_t *cap_msix_get_pba_ptr( pci_devhdl_t hdl, pci_cap_t cap )

The following APIs let you set or clear interrupt masks for each device interrupt source. ISR-safe versions of these functions are provided; they function identically but are suffixed with _isr:

pci_err_t cap_msix_mask_irq_entry( pci_devhdl_t hdl, pci_cap_t cap, uint_t irq_entry) 
pci_err_t cap_msix_mask_irq_entry_isr( pci_devhdl_t hdl, pci_cap_t cap, uint_t irq_entry) 

pci_err_t cap_msix_unmask_irq_entry( pci_devhdl_t hdl, pci_cap_t cap, uint_t irq_entry)
pci_err_t cap_msix_unmask_irq_entry_isr( pci_devhdl_t hdl, pci_cap_t cap, uint_t irq_entry) 

In addition, you can mask or unmask all interrupts using the MSI-X defined “function mask” control. These functions don't affect individual entry masks, but rather set or clear the function mask bit (bit 14) of the Message Control register. They return PCI_ERR_EALREADY if the requested state is already set, so that you have an indication of the previous state. Any value other than PCI_ERR_OK indicates that the operation couldn't be performed. The ISR-safe versions of these functions are suffixed with _isr:

pci_err_t cap_msix_mask_irq_all( pci_devhdl_t hdl, pci_cap_t cap) 
pci_err_t cap_msix_mask_irq_all_isr( pci_devhdl_t hdl, pci_cap_t cap) 

pci_err_t cap_msix_unmask_irq_all( pci_devhdl_t hdl, pci_cap_t cap) 
pci_err_t cap_msix_unmask_irq_all_isr( pci_devhdl_t hdl, pci_cap_t cap) 
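The "previous state" behaviour described above can be illustrated with a small stand-alone model. This is only a sketch: it operates on a plain uint16_t standing in for the Message Control register, and the names model_mask_all(), model_unmask_all(), MODEL_OK, and MODEL_EALREADY are hypothetical stand-ins, not part of <pci/cap_msix.h>. The only details taken from the text are that the function mask is bit 14 and that an "already" result is reported when the requested state is already in effect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Function mask is bit 14 of the MSI-X Message Control register. */
#define MSIX_CTRL_FUNCTION_MASK (1u << 14)

/* Hypothetical result codes for this model only (not the real pci_err_t values). */
typedef enum { MODEL_OK, MODEL_EALREADY } model_err_t;

/* Set the function mask; report MODEL_EALREADY if it was already set. */
model_err_t model_mask_all(uint16_t *msg_ctrl)
{
    assert(msg_ctrl != NULL);
    if (*msg_ctrl & MSIX_CTRL_FUNCTION_MASK) {
        return MODEL_EALREADY;
    }
    *msg_ctrl = (uint16_t)(*msg_ctrl | MSIX_CTRL_FUNCTION_MASK);
    return MODEL_OK;
}

/* Clear the function mask; report MODEL_EALREADY if it was already clear. */
model_err_t model_unmask_all(uint16_t *msg_ctrl)
{
    assert(msg_ctrl != NULL);
    if (!(*msg_ctrl & MSIX_CTRL_FUNCTION_MASK)) {
        return MODEL_EALREADY;
    }
    *msg_ctrl = (uint16_t)(*msg_ctrl & ~MSIX_CTRL_FUNCTION_MASK);
    return MODEL_OK;
}
```

With this shape, a caller can remember whether the mask call reported the "already" result and, after its critical section, unmask only if it was the one that set the function mask. The individual entry masks are untouched throughout.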

The cap_msix_mask_irq_entry() and cap_msix_unmask_irq_entry() APIs allow driver software to mask and unmask each of the supported device interrupt sources. There's no direct relationship between a device's interrupt entry or the assigned MSI-X vector and the assigned IRQs returned from pci_device_read_irq(). InterruptMask() and InterruptUnmask() mask and unmask an IRQ, but cap_msix_mask_irq_entry() and cap_msix_unmask_irq_entry() (if supported) mask and unmask the interrupt sources associated with the MSI-X interrupt entry. Some or all of these entries may be configured to share the same IRQ. You can obtain the number of interrupt sources from cap_msix_get_nirq().

Interrupt disposition

The cap_msix_set_irq_entry() function allows driver software to control which MSI-X entries are unused and which are shared. By default there's no sharing. You can use this feature to:

By default, unless modified by calling cap_msix_set_irq_entry(), each device interrupt source is assigned a unique MSI-X vector (notwithstanding the availability of MSI-X vectors when the capability is enabled) and hence a unique IRQ. Although a unique IRQ is assigned, this doesn't necessarily mean the IRQ isn't shared from a system perspective. This depends on the interrupt controller, and hence is platform-dependent.

This configuration is established when you initially read the capability with pci_device_read_cap(). If software modifies the disposition of one or more interrupt sources and then enables the capability, the disposition of each source will be as configured. If you subsequently disable the capability and then reenable it, the disposition is maintained. If you want to reset the disposition, you can either:

You can set the disposition of an interrupt source by specifying the irq_entry to be operated on, along with a disposition parameter:

Interrupt entries marked as unused (-1) are masked and can't be unmasked. Otherwise the mask state of the entry isn't altered, but the entry will have been masked when the capability was disabled in order to make disposition changes.

When you're setting the disposition of interrupt sources to shared, the disposition argument must always be less than or equal to irq_entry. That is, if you wish interrupt sources 2 and 4 to share the same MSI-X vector and hence IRQ, the calls to establish this should be as follows (error handling omitted):

cap_msix_set_irq_entry(hdl, msix_cap, 2, 2);
cap_msix_set_irq_entry(hdl, msix_cap, 4, 2);

Attempts to do the reverse are rejected.
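A driver might pre-check its arguments against this rule before making the call. The helper below is a hypothetical sketch, not a library function: disposition_ok() and ENTRY_UNUSED are names invented for illustration, and the bound check against nirq assumes that valid entries run from 0 to cap_msix_get_nirq() minus 1.

```c
#include <stdbool.h>

#define ENTRY_UNUSED (-1)   /* disposition value for an unused entry, per the text */

/*
 * Hypothetical pre-check: returns true if (irq_entry, disposition) obeys the
 * rule that a shared disposition is less than or equal to the entry itself.
 */
bool disposition_ok(unsigned irq_entry, int disposition, unsigned nirq)
{
    if (irq_entry >= nirq) {
        return false;               /* assumption: entries are 0..nirq-1 */
    }
    if (disposition == ENTRY_UNUSED) {
        return true;                /* marking an entry unused is always allowed here */
    }
    if (disposition < 0) {
        return false;               /* no other negative value is described */
    }
    return (unsigned)disposition <= irq_entry;
}
```

For the example above, disposition_ok(4, 2, nirq) holds, while the reversed pairing disposition_ok(2, 4, nirq) does not.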

When you call pci_device_read_irq() to retrieve the list of assigned IRQs, the list is ordered to correspond with each of the n entries (starting from 0) that aren't marked as unused. For example, suppose that a device supports 256 MSI-X interrupts as identified by cap_msix_get_nirq(). Then suppose that only 64 interrupts were allocated, as identified by pci_device_read_irq(). This could be because of system limitations or configuration, or because the driver chose to use only this many by setting the disposition of various sources to shared. In this case, the 64 IRQs returned (0 through 63) correspond to MSI-X entries 0 through 63, respectively. If the number of requested and allocated IRQs is the same as the number of supported IRQs, this 1:1 relationship is exactly as you'd expect.

If, prior to calling pci_device_cfg_cap_enable(), the driver called cap_msix_set_irq_entry() to mark entries 0, 5, 6 as unused, entries 13 and 14 as shared, and 22 and 23 as shared, then the following IRQ to entry assignments will exist after a successful call to pci_device_cfg_cap_enable():


unused --> entry 0
irq[0] --> IRQ for entry 1
irq[1..3] --> IRQ for entries 2, 3, 4 respectively
unused --> entries 5 and 6
irq[4..9] --> IRQ for entries 7, 8, 9, 10, 11, 12 respectively
irq[10] --> IRQ for entries 13/14 (shared)
irq[11..17] --> IRQ for entries 15, 16, 17, 18, 19, 20, 21 respectively
irq[18] --> IRQ for entries 22/23 (shared)
irq[19..63] --> IRQ for entries 24 through 68 respectively
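The assignment shown above can be reproduced with a small stand-alone model, which may help when working out which index of the IRQ list belongs to a given entry. This is a sketch of the ordering rule as described in this section, not the PCI server's implementation; model_irq_map() and ENTRY_UNUSED are hypothetical names, and the model only considers the entries passed to it (0 through 68 for this example).

```c
#include <assert.h>
#include <stddef.h>

#define ENTRY_UNUSED (-1)   /* disposition value for an unused entry, per the text */

/*
 * Model only. For each MSI-X entry e, compute the index into the IRQ list
 * that the entry would map to, given a per-entry disposition table:
 *   disposition[e] == e             entry e gets its own IRQ
 *   disposition[e] == s, s < e      entry e shares the IRQ of entry s
 *   disposition[e] == ENTRY_UNUSED  entry e has no IRQ (irq_index[e] = -1)
 * Returns the number of distinct IRQs, or -1 if the table is inconsistent.
 */
int model_irq_map(const int *disposition, size_t nentries, int *irq_index)
{
    int next_irq = 0;

    assert(disposition != NULL && irq_index != NULL);

    for (size_t e = 0; e < nentries; e++) {
        int d = disposition[e];

        if (d == ENTRY_UNUSED) {
            irq_index[e] = -1;
        } else if (d == (int)e) {
            irq_index[e] = next_irq++;          /* next slot in the IRQ list */
        } else if (d >= 0 && d < (int)e && irq_index[d] >= 0) {
            irq_index[e] = irq_index[d];        /* share an earlier entry's IRQ */
        } else {
            return -1;                          /* reversed or dangling share */
        }
    }
    return next_irq;
}
```

Filling the table with disposition[e] = e for entries 0 through 68, then marking entries 0, 5, and 6 as unused and pointing entry 14 at 13 and entry 23 at 22, yields 64 IRQs with the same entry-to-index pairing as the listing above.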

The cap_msix_get_pba_ptr() function returns a read-only pointer to the Pending Bits Array (PBA), or NULL if an error occurred. The PBA allows driver software to query any pending interrupts. It's organized with bit 0 corresponding to interrupt source/entry 0 and bit n corresponding to interrupt source/entry n, where the highest valid n is the number of MSI-X interrupts the device supports minus 1.
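Testing a single pending bit might look like the following sketch. The layout of cap_msix_pba_t isn't described here, so the helper assumes the PBA can be viewed as an array of 64-bit words with bit 0 of the first word belonging to entry 0; that view, and the name pba_entry_pending(), are assumptions for illustration and should be checked against <pci/cap_msix.h>.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if the pending bit for irq_entry is set.
 * Assumes a 64-bit-word view of the PBA; nirq is the number of interrupts
 * the device supports (for example, from cap_msix_get_nirq()).
 */
bool pba_entry_pending(const volatile uint64_t *pba, unsigned nirq, unsigned irq_entry)
{
    if (pba == NULL || irq_entry >= nirq) {
        return false;   /* no PBA available, or entry out of range */
    }
    return ((pba[irq_entry / 64] >> (irq_entry % 64)) & 1u) != 0;
}
```

The pointer is declared const because the PBA is read-only, and volatile because the device can change the bits at any time.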

When you disable the MSI-X capability, all interrupt sources are automatically masked. You must disable the capability before making any changes to the interrupt disposition.

When you enable the MSI-X capability with pci_device_cfg_cap_enable(), the reqType parameter has the following meanings for the specific capability APIs. Any capability module APIs not listed here aren't affected by the reqType.

cap_msix_set_irq_entry()
  • pci_reqType_e_MANDATORY — a unique MSI-X vector and hence IRQ must be allocated for each of the interrupt sources not marked as unused, or the capability won't be enabled, and pci_device_cfg_cap_enable() fails with PCI_ERR_CAP_NIRQ.
  • pci_reqType_e_ADVISORY — an attempt to satisfy the required number of IRQs will be made, however a lower number may be assigned down to the minimum required. The pci_device_cfg_cap_enable() call may fail with PCI_ERR_IRQ_NOT_AVAIL if this condition can't be met.
  • pci_reqType_e_UNSPECIFIED — behaves the same as pci_reqType_e_ADVISORY.

Errors that may be returned by pci_device_cfg_cap_enable() from the MSI-X capability module include:

A note about the ISR-safe functions

In certain circumstances, a device being managed by driver software may become temporarily unavailable. This may be due to a bus segment reconfiguration or reset, or a hot plug removal. Normally these events are conveyed to the driver before they occur, but previously initiated transactions may still raise an interrupt that causes the ISR to run. In this circumstance, the _isr-suffixed calls return PCI_ERR_DEV_NOT_AVAIL, indicating the temporary unavailability of the device. The ISR must detect this specific error (in addition to any others) and handle it accordingly. Access to the device's registers isn't possible in this condition.