Collation routines

Updated: April 19, 2023

Collation routines order the results from a SELECT statement. You can define your own routines and tell QDB to use them by providing the COLLATE keyword in the ORDER BY clause.

In your code, you must define a collation routine by providing both a setup function and a sorting function. You must also define a qdb_collation structure that references these functions. Consider this sample collation that provides a setup function that allows the caller to reverse the sorting order (by passing in the string "reverse"), and then uses memcmp() in its sorting function:

typedef struct my_sort_cfg_s {
    unsigned rev_sort;
} my_sort_cfg_t;

static ssize_t
my_sort_cfg_sort(void *arg, int l1, const void *s1, 
                            int l2, const void *s2)
{
    my_sort_cfg_t *my_sort_cfg = (my_sort_cfg_t *)arg;
    int ret = memcmp(s1, s2, min(l1, l2));
    return my_sort_cfg->rev_sort?-ret:ret;
}

static int 
my_sort_cfg_setup(void *arg, const void *data, 
                  int nbytes, char **errmsg)
{
    my_sort_cfg_t *my_sort_cfg = (my_sort_cfg_t *)arg;
    int  ret=EOK;
    char *buf;

    if (!nbytes) {
        return ret;
    }
    buf=alloca(nbytes+1);
    if (!buf) {
        return ret;
    }

    snprintf(buf, nbytes+1, "%s", (char*)data);
    if (strcmp(buf, "reverse")) {
        my_sort_cfg->rev_sort=1;
    }
    return(ret);
}

struct qdb_collation my_sort_runtime_cfg = { 
    .name="my_sort_runtime_cfg", 
    .encoding=SQLITE_UTF8, 
    .arg=&my_sort_cfg_data, 
    .compare=my_sort_cfg_sort, 
    .setup=my_sort_cfg_setup 
};

In this case, the tag value for the structure is my_sort_runtime_cfg, the collation name as visible to SQL is also my_sort_runtime_cfg, the C function used for setup is my_sort_cfg_setup(), and the C function used for comparison is my_sort_cfg_sort(). Each function has the my_sort_cfg_data value passed in as its arg value. For more information on defining SQLite collation sequences, see the SQLite docs on sqlite3_create_collation().

You can define multiple collation sequences (in the same or different DLLs), but each must have a Collation entry in the configuration object as well as a struct qdb_collation object with a unique name describing the routine.

The qdb_collation structure has these members:

struct qdb_collation {
    char  *name;
    int   encoding;
    void  *arg;
    int   (*compare)(void *, int, const void *, int, const void *);
    int   (*setup)(void *, const void *, int, char **);
};
name
The name used for this function in SQL statements. This is limited to 255 bytes, exclusive of the null-terminator, and it can't contain any special tokens or start with a digit. Any attempt to create a function with an invalid name results in an SQLITE_ERROR error.
encoding
The character encoding of strings passed to your function. Can be one of:
  • SQLITE_UTF8
  • SQLITE_UTF16
  • SQLITE_UTF16BE
  • SQLITE_UTF16LE
arg
An arbitrary pointer to user data that is passed as the first argument to your function each time it's invoked.
compare
A pointer to your comparison function.
This entry must not be NULL because all collations need a function that tells them how to compare two items. The signature of this function looks like this:
ssize_t sort(void *arg, int l1, const void *s1, 
                        int l2, const void *s2);
The function parameters are:
arg
The context pointer. This is copied from arg in the qdb_collation structure.
l1
The size of the first data item, in bytes.
s1
A pointer to the first data item.
l2
The size of the second data item, in bytes.
s2
A pointer to the second data item.
setup
A pointer to a setup function that allows you to configure the sorting order at runtime.
If a collation has no dynamic configuration, you can specify NULL for its setup entry. Note that this entry can't be changed at runtime. When the entry is non-NULL, the referenced function is invoked at startup and at each qdb_collation() call. The signature of the setup function looks like this:
int (*setup)(void *arg, const void *data, int nbytes, char **errmsg);
The function parameters are:
void *arg
The context pointer. This is copied from the arg field in the qdb_collation structure.
const void *data
The data used to configure the sorting. When the routine is invoked at startup, this value is NULL. At runtime, it refers to the data provided to the qdb_collation() function. Note that the DLL must cooperate with the caller to exchange data of a known format.
int nbytes
The data size, in bytes. At startup, this value is 0. At runtime, it contains the size of the data given to qdb_collation().
char **errmsg
A pointer to an error message string that is available to qdb_geterrmsg() and is displayed on failure. At startup, QDB fails the setup function; at runtime, QDB fails qdb_collation() and makes this string available through qdb_geterrmsg().
The function returns either EOK (if it succeeds) or a POSIX errno value (if it fails).