PgConf.Russia 2017 talks

Accelerating queries of set data types with GIN, GiST, and custom indexing extensions

Markus Nullmeier
University of Heidelberg, software developer

Markus Nullmeier has been working in diverse fields such as low-temperature physics and image processing, as well as numerical simulation of combustion, polymerisation, and fluid dynamics. Nowadays, he gets into back-end matters of astronomical databases running on PostgreSQL, working in the context of the International Virtual Observatory Alliance. With respect to PostgreSQL, his main interests are custom data types and their indexing.

Sets are apparently a useful data type for many kinds of applications. While PostgreSQL offers no built-in set data type, sets may be emulated to some degree with its built-in array and JSONB data types. Also, acceleration of respective containment (subset) queries is readily available as a built-in feature of the GIN index type.

Starting with the above, we will then explore the performance gains enabled by custom set data types, and especially by customisation code in C ("operator classes") for the GIN and GiST index types. Then, we will introduce a specialised set data type implementation, where subsequent integers are compressed via run-length encoding, being motivated by spatial tessellations used in astronomy. There, neither GIN nor GiST as such may be used for efficient indexing. However, since PostgreSQL 9.6, it is possible to write your own full-featured index types, which may be plugged into the system as a regular PostgreSQL extension.

One such extension is RUM from Postgres Professional -- an improved implementation of GIN. We will present our modifications to RUM, called "OUZO", where the handling of the key (i. e., set element) data type allows for run-length encoding. This enables useful indexing of the above compressed-set data type. We will also discuss the property of GIN, RUM, and OUZO (for the time being) that key items are never deleted, in view of its performance implications for OUZO.