title

text

Aleksander Kuzmenkov
Aleksander Kuzmenkov Postgres Professional
: December
45 мин

Towards more efficient query plans: PostgreSQL 11 and beyond

A major responsibility of a database engine is to convert a declarative SQL query to an efficient execution plan, employing various methods to scan and join the relations. There is always a development effort to improve this area. What clever execution plans can PostgreSQL generate, what's new in version 11 and what is in development? To name a few things, the joins are optimized by removing unneeded outer and inner joins, and reducing joins from outer and semi to inner. There is work to enable merge joins on inequality and range overlap, and to improve join selectivity estimates with multi-column statistics. When it comes to scanning a single relation, covering indexes allow to use index-only scans more often. Incremental sort and more precise estimation of sorting costs help generate better paths when sorted output is required, e.g. when using GROUP BY and ORDER BY or performing merge joins. This talk aims to give an overview of such optimizations that already exist and that are being developed now.

Слайды

Видео

Другие доклады

  • Alexander Kukushkin
    Alexander Kukushkin Zalando SE
    45 мин

    How not to screw up when building HA cluster

    You just set up your first PostgreSQL cluster, created a database schema, loaded some data, did some fine tuning of configuration. Now you want to make your cluster highly available. Unfortunately, PostgreSQL doesn't offer built-in automatic failover, but luckily for us, there are plenty of external tools for that. As a next logical step you start choosing a tool, and... you already doing it wrong, because first you have to define SLA, RTO, and RPO. In this talk I am going to cover most of the common mistakes people do when setting up a highly available cluster.

  • Miroslav Šedivý
    Miroslav Šedivý solute GmbH
    45 мин

    Bitemporality: Tracking Reproducible Revisions in PostgreSQL Using RANGE Types

    So you finally have your database model for your application and you fill it in with current data. How do you keep it up to date? While INSERT may still be transparent, UPDATE and DELETE will overwrite your previous data, so you won't be able to reproduce them. Cloning the whole huge content for each minor update is not an option. For rich and complex data about hundreds of thousands of power generators in Germany and worldwide, I built a model using range data types in recent PostgreSQL which allows me to insert, update and delete data while granting the full access to the whole state of the database at any historical moment. I'll present a very simplified version of the database so the audience will be immediately able to apply it for their cases. I'll also show a few tricks in Python and Psycopg2 that will allow a whole team to prepare, review, and deploy all revisions to this database without merge conflicts. And I'll give a few ideas on how to retrieve this data efficiently.

  • Esteban Zimányi
    Esteban Zimányi ULB
    45 мин

    MobilityDB: A PostgreSQL extension for mobility data management

    We will be presenting MobilityDB, a PostgreSQL extension that extends the type system of PostgreSQL and PostGIS with abstract data types for representing moving object data. These types can represent the evolution on time of values of some element type, called the base type of the temporal type. For instance, temporal integers may be used to represent the evolution on time of the number of employees of a department. In this case, the data type is “temporal integer” and “integer” is the base type. Similarly, a temporal float may be used to represent the evolution on time of the temperature of a room. As another example, a temporal point may be used to represent the evolution on time of the location of a car, as reported by GPS devices. Temporal types are useful because representing values that evolve in time is essential in many applications, for example in mobility applications.

    The temporal types in MobilityDB are based on the bool, int, float, and text base types provided by PostgreSQL, and on the geometry and geography base types provided by PostGIS (restricted to 2D or 3D points). MobilityDB follows the ongoing OGC standards on Moving Features (http://www.opengeospatial.org/standards/movingfeatures), and in particular the OGC Moving Features Access, which specifies operations that can be applied to time-varying geometries.

    A rich set of functions and operators is available to perform various operations on temporal types. In general there are three classes:

    • Lifed functions and operators: the operators on the base types (such as arithmetic operators and for integers and floats, spatial relationships and distance for geometries) are intuitively generalized when the values evolve in time. Spatiotemporal functions in MobilityDB generalize spatial functions provided by PostGIS for both "geometry" and "geography" types, for instance the "ST_Intersection". Basically, MobilityDB takes care of the temporal aspects and delegates the spatial processing to PostGIS.
    • Temporal functions and operators: they process the temporal dimension of the value which can be an instant, a range, an array of instant, or an array of ranges. Examples are the atperiods function that restricts a temporal type to a given array of time ranges, and the duration function that extracts the definition time of a value.
    • Spatiotemporal functions and operators: all remaining functions fall into this category. Examples are speed(tgeompoint/tgeogpoint), azimuth(tgeompoint/tgeogpoint), maxValue(tfloat/tint), twAvg(tfloat) a time weighted average, etc.

    Both GiST and SP-GiST have been extended to support the temporal types. The GiST index implements an R-tree for temporal alphanumeric types and a TB-tree for temporal point types. The SP-GiST index implements a Quad-tree for temporal alphanumeric types and an Oct-tree for temporal point types. The approach used for developing SP-GIST indexes for MobilityDB allowed us to add SP-GIST indexes for 2-dimensional, 3-dimensional and n-dimensional geometries in PostGIS.

    Two types of numeric aggregate functions are available. In addition to the traditional functions min, max, count, sum, and avg, there are window (also known as cumulative) versions of them. Given a time interval w, the window aggregate functions compute the value of the function at an instant t by considering the values during the interval [t − w, t]. In contrast to standard aggregation, temporal aggregation may return a result which is of a bigger size than the input. For this reason, the temporal aggregate functions have been extremely optimized in order to perform efficiently.

    MobilityDB has a preliminary implementation of the statistic collector functions and the selectivity functions for the temporal types.

    In terms of size, the extension is made of 67k lines of C code, 19k lines of SQL code, 67k lines of SQL unit tests. It defines 40 types, 2300 functions, and 1350 operators.

    The talk will illustrate the spatiotemporal concepts and the data model of the temporal types. It will briefly describe the components of MobilityDB: indexing, aggregations, functions and operators, and the SQL interface. Query examples and uses cases will be illustrated allover the talk. The current status of MobilityDB and the planned development will also be presented.

    The talk shall be given by: - Esteban Zimányi: Professor and Director of the Department of Computer and Decision Engineering of the Universite Libre de Bruxelles. - Mahmoud SAKR: Postdoc researcher in the Universite Libre de Bruxelles.

  • T
    Tatsuro Yamada NTT Comware
    22 мин

    Auto plan tuning using feedback loop

    As is often seen in OLAP and batch processing workloads, the more complex a query (containing many joins, filters, aggregates), the more there is a possibility of row count estimation errors, which leads to planner choosing an inefficient execution plan.

    To address that problem, I developed a tool called pg_plan_advsr as a PostgreSQL extension, which corrects the estimation errors by repeatedly feeding back the information collected during query execution to the planner.

    The tool has three features:

    1. Automatic plan tuning by repeatedly feeding execution information to planner
    2. Preserve all plans generated during plan tuning in a history table
    3. Create and store optimizer hints to be able to reproduce plans generated during tuning process

    I verified the effectiveness of pg_plan_advsr by enabling it when running the join order benchmark (JOB) against PG 10.4 and observed its execution time shortening to 50% of the original. Therefore, it is useful for user who would like to do plan tuning for OLAP and batch processing.

    I will talk about the following things in this presentation:

    • Principles behind pg_plan_advsr and its architecture
    • Detailed information about the measurements done with JOB
    • Possible future enhancements
    • Using aqo and pg_plan_advsr together (experimental)