Accelerating queries of set data types with GIN, GiST, and custom indexing extensions
University of Heidelberg, software developer
Markus Nullmeier has been working in diverse fields such as low-temperature physics and image processing, as well as numerical simulation of combustion, polymerisation, and fluid dynamics. Nowadays, he gets into back-end matters of astronomical databases running on PostgreSQL, working in the context of the International Virtual Observatory Alliance. With respect to PostgreSQL, his main interests are custom data types and their indexing.
Sets are apparently a useful data type for many kinds of applications. While PostgreSQL offers no built-in set data type, sets may be emulated to some degree with its built-in array and JSONB data types. Also, acceleration of respective containment (subset) queries is readily available as a built-in feature of the GIN index type.
Starting with the above, we will then explore the performance gains
enabled by custom set data types, and especially by customisation code
in C ("operator classes") for the GIN and GiST index types.
One such extension is RUM from Postgres Professional -- an improved implementation of GIN. We will present our modifications to RUM, called "OUZO", where the handling of the key (i. e., set element) data type allows for run-length encoding. This enables useful indexing of the above compressed-set data type. We will also discuss the property of GIN, RUM, and OUZO (for the time being) that key items are never deleted, in view of its performance implications for OUZO.