Talks and tutorials
When working with PostgreSQL, we regularly encounter queries that take suspiciously long to be executed. What if it's too late to enable statistics or auto explain, and cancellation of the query doesn't look sensible, but you still need to find out what caused the issue or see at what stage execution the query is? pg_query_state is an open source PostgreSQL extension, which can give you access to the query execution plan in real time. It also makes available statistics for each node, including the number of processed rows.
When PostgreSQL was at its formation stage, its transaction ID was chosen to be 32-bit. Back then, no one could imagine that someday we may need more than 4 billion transactions. However, ongoing technical progress and digitization pushed some Postgres instances towards their transaction ID limit. The Postgres community reacted to this with a wraparound of the transaction ID counter. However, constantly growing data volumes exposed PostgreSQL to new challenges. In my presentation, I will cover these challenges and explain how they can be solved with a 64-bit transaction ID, what the consequences will be like, why now it is a good time to implement 64x IDs, and why this hasn't been done previously.
Optimisation of queries can have a significant impact on database performance. In my talk, I will review the tools that we use in our practice and discuss some use-cases. This talk is of interest to database administrators or developers who would like to optimise their database performance and learn new query optimisation techniques.
In May 2019, Postgres Professional launched the PostgreSQL certification program. I have been working in this company since March 2020, and in a year I have successfully taken four tests on the DBA1, DBA2, DBA 3, QPT courses. In this talk, I will share my experience of preparing for the exams and passing these tests.
Mamonsu is an active monitoring agent for PostgreSQL based on Zabbix. Mamonsu 3.0 Major Update includes new metrics and more possibilities for data visualization via Zabbix screens. It also introduces the improved functionality of Mamonsu tools for interacting with the DBMS. In addition to the new functionality overview, I will also share the nuances of upgrading to the 3.0 version.
What can be more important in the data load process than speed? Data migration is one of the most requested features in DBeaver. So the performance issue was highly important for us, in regard to PostgreSQL, as well as Greenplum, Redshift and other Postgres-based databases. We are ready to share our tiny secrets about 10x, 100x, 1000x, and even 10,000x performance improvements for data imports without any magic.
This talk is about postgres extension pg_profile - simple historic database workload profiler. I'll talk about its new features and new statistics available in Postgres 14. There is a branch of pg_profile named pgpro_pwr, designed to run in PostgresPro Enterprise Edition and PostgresPro Standard Edition databases. It is using extended performance statistics of those databases providing some valuable benefits. We'll see what's new in pgpro_pwr extension. Also I'll talk about some problems in postgres performance monitoring and their possible solutions in the future.
Everyone knows very well what a failover cluster PostgreSQL is and how such a cluster protects against failures within the same data center. However, recently, more and more enterprises have placed increased demands on their services, these requirements include disaster tolerance. We call such clusters a GEO-Cluster (KUK). In the report, I will talk about the varieties, principles and approaches to building GEO-Clusters PostgreSQL based on the Corosync/Pacemaker cluster software.
We are surrounded by myths, so we are constantly under delusion in regard to politics, economy, history and... space. Let's tackle the most pressing issues related to astronomy and space research. Perhaps we will start to better understand the world around us, avoid exaggerated sensations and spot really interesting and attractive aspects of scientific research.
PostgreSQL could easily be named the master of indexes - there is no other database that could compete with Postgres for the variety of indexes. How developers can use this advantage effectively in their day to day work? How do you know which index to create? When the created index might slow our application’s performance? We will try to get our head around this topic with some use cases and live examples. This talk would be of interest for any developer who wants to expand their knowledge of indexes in Postgres and secure the best performance for their application.
Narrow way to four nines. Postgresql clusterin in virtual environment - why's and how's. Why Corosync/Pacemaker? What about backups? Difficulties, problems and how to avoid them.
Postgres has been a vibrant project for decades, and probably will be popular for decades to come. However, as with any complex endeavor, challenges are always lurking. This talk explores the many project, competitive, and technical challenges in the future that could derail its success. By exploring these challenges, we will be better able to avoid them.
Finding the nearest neighbor can be required for various tasks. For example, when you need to find the closest object to a given point on the map. This task looks trivial to non-programmer (a person can easily cope with it if they have a map). In a software developer's reality, this task doesn't have a common solution available to everyone. To get rid of this headache, programmers often create ad hoc solutions also known as "crutches". These workarounds don't look nice and often ruin the mood of a creative programmer who needs to go to a beer pub to cope with the cognitive dissonance :)
Indeed, while a person has a typical field of view and a map with a certain scale, the programmer has only one given point and a huge number of other points (i.e. billions of stars). This multitude of points gets a lot of incoming requests, including the write requests, not just read ones. You can write a perfect query in SQL, however, the real-world query execution plan will be depressingly long. To find the closest neighbor, you will have to read the entire table, compute all the distances from the given point and return the given number of good enough results. Indexing doesn't help in this case, as you will have to fully scan the search tree and read the entire table in random order. This will take much longer than simple table reading. In reality, tasks, where you need to efficiently find nearest neighbors, aren't limited to spatial search. It can also be used for classification tasks, finding typos, data clustering, and deduplication. All such tasks will benefit from efficient nearest neighbor search in DBMSs that are now a de facto standard for storing the data. What do we mean by "efficient search"? It means that our search is fast, concurrent, scalable, and supports various data types (most likely, non-standard ones). We implemented such KNN search in PostgreSQL 11 years ago. I will cover its implementation, today's state and share some use cases for KNN.
JSON is now the de facto standard for startup developers. Why is this happening and what should be done? Should we teach application developers how to properly design a database according to the canons of relational theory (which Postgres complies very well with) or make the DBMS more JSON-friendly?
TimescaleDB extension allows to turn good old Postgres into a real distributed cluster for storing time series data while maintaining the relational model, convenient SQL and a time-tested ecosystem. And additional features such as continuous materialized views and data compression allow to build truly powerful telematic hubs.
Data is getting bigger, disks are getting faster, the DBMS optimizer is getting smarter, but the problem of " bloating " remains extremely relevant. I want to share my experience and approach to dealing with this effect on a large number of tables and data in them.
The development of secure system software (like an operating system or a DBMS) and ensuring trust in it are complicated scientific and technical tasks. In our presentation, we will demonstrate how we resolved these tasks for the Astra Linux OS certified to comply with the highest security classes. We will discuss the key directions of our activities, from the formation of national standards to the development and verification of the formal access model, which is the basis for the attack surface protection mechanism on the OS's level. We will also share how we apply the methods and technologies of dynamic and static analysis of the program code and, finally, we will explain how we collect and analyze the data we get from analyzing the OS code. We will also discuss how to fix the code errors in the project with continuous integration.
In OZON, we work on PostgreSQL as a service. Currently, we have thousands of clusters. We know about the important questions software developers ask, and the most crucial ones are as follows: how to understand the database's workload and why this query is getting slow? In this presentation, I will explain how we help our developers answer these questions by simply having a look at our dashboard.
We'll discuss our approaches to picking technical solutions for the systems we design. We'll consider their advantages and limitations. We'll share the facts about the changes in our engineering culture when import substitution requirements came into effect. We migrated the system with a classical three-tier architecture from Oracle to Postgres Pro DBMS. In my presentation I'll pay attention to data migration tools and peculiarities of development Java/Kotlin apps and SQL interoperability between two DBMSs.
PostgreSQL protocol compression has been discussed in the community for a long time. Many different hypotheses have been expressed and tested, and the compression patch has received a large number of changes and improvements. I will review the various approaches tested during the implementation of protocol compression, and will highlight the current status.
Multiple data leaks can occur during data migration. How to protect your sensitive data and stay compliant with the requirements of regulatory institutions? This is what you will learn from my presentation.
In 2019 we migrated distributed railway application from Oracle 11g SE to vanilla PostgreSQL 11.9. Almost 2 years have passed, the system is working good. The report focuses on how we migrated, what problems we faced during this migration and after migration, as well as what we would have done differently today experience.
Running PostgreSQL in Kubernetes became a common theme with the rise of the operators. In this talk, we will show you how easy it is to deploy a highly available PostgreSQL cluster in Kubernetes with Operator. We will also show how to prepare for the failure of a data center by running a hot-standby cluster on a DR site.
Creation of a completely import-substituted component of the state information system using the example of the PUR KS - the "Electronic Budget" subsystem, the key state integrated information system for managing public finances. The presentation presents the subsystem architecture, performance optimization history and monitoring description. The speaker will talk about the advantages and difficulties of using an import-substituted system for the public sector, as well as about the nuances that departments should take into account when implementing open source solutions and solutions from Russian developers.
PostgreSQL maintenance without backups is unacceptable. The constant growth of PostgreSQL clusters in maintenance creates new issues and bottlenecks for the chosen backup scenario. In this presentation, we will explain how to efficiently use pg_probackup under such circumstances.
In this presentation, we'll reveal the process and peculiarities of our Oracle to PostgreSQL migration project. We have moved our ELDOKA corporate ECM platform to PostgreSQL, and during this talk we will explain how we ensured object-based and role-based access to data, which features were missing in the community-developed PostgreSQL, how we organized our work with spatial data and changed approach to storing our file content. We will also share our experience in time and resource savings, as well as data replication between our nodes. Performance test results will also be shared with the audience.
The pgpro_scheduler extension has an interesting but little-known feature - one-time jobs. Despite its simplicity, this feature can be used for complex transaction processing. On the one hand, it helps to reliably execute tasks taking a very long time, on the other hand, it ensures app scaling if certain conditions are met.
From the very beginning of 2021, I'm fixing a bug in CREATE INDEX CONCURRENTLY. This bug manifests extremely seldom. I could not find any traces of it in mailing lists ever since the introduction of the feature in 8.2. Yet the bug affected our systems, so I decided to fix it. This is how the9i detective bug-hunting story started. In this story, I'll share details of the implementation of transactions and lock mechanics.
Throughout the release cycle of version 14, I had the opportunity to describe the accepted patches in articles dedicated to each of the five release commitfests. The first two articles on Postgres 15 also have some remarks on PostgreSQL 14. If you have read them, you can skip my presentation :-)). On September 30, 2021, PostgreSQL 14 was officially released. Nothing will change in this major release, so it's the right time for a final overview. However, presenting each and every new feature in a 45-min. talk isn't possible. That's why I will be sharing with you my personal top of version 14 innovations.
Crew Resource Management is a field which has become critical in many fields including aviation and nuclear power. However in IT, this field has not been well implemented. In this talk we will discuss what crew resource management is, what kinds of problems it solves, and why all database teams should implement it.
Several PostgreSQL-related case studies will be discussed and both technical and human factors will be part of the case studies.
We maintain large PostgreSQL databases with the total data volume exceeding 180 TB. Every instance in the cluster gets a workload of 15 thousand requests per second. These circumstances certainly impose limitations on the classical approach to updating database structures. The cost of a DBA error gets higher, as even a small error can lead to a sleepless night) In my talk, I will cover our existing limitations for changes in the production environment, non-classical DBMS behavior under heavy workloads, and will discuss PostgreSQL in general.
I will not teach you database design or query optimization. This talk will cover Python drivers, PostgreSQL usage in Python, and various tools from sync and async world enabling such usage, and how this ecosystem evolved. We'll discuss Python libraries including aiopg, psycopg2, asyncpg, pgbouncer and how they interact with Postgres.
The gist is as follows: the JS code that is executed on the DB side can run on the developer's local computer by using node.js, which works on the same v8 interpreter. The plv8.execute special function is replaced by a function from the pg-native npm library, which calls the outer DBMS. I will demonstrate a set of tools chosen by me that enables the following:
- writing JS code in the IDE you prefer and enjoy syntax highlight;
- code debugging in real time (with breakpoint, watch, etc.);
- writing of auto-tests (unit-tests), with a variety of options: Postgres, SQLite, mocks;
- deploying your code in the DBMS;
- usage of additional npm packages (the issue is that all code of the plv8-based function should be included in the function's body, in one file).
You can use this tool regardless of the programming language that you use for the backend. However, it becomes more flexible if you use languages with static typing (like Java, C#, etc.). For the tasks where the backend is an intermediate layer between the frontend and DBMS, logic (or its part) can be placed in plv8/js with dynamic typing, which will simplify the development process.
In addition to the development of new functions on plv8, the framework provides a set of ready-made functions for CRUD operations. These functions are universal, they aren't tied to a certain database structure, and they can work in any project. If you use them, you can do less backend development, in some projects to a very significant extent.
The installation of the plv8 extension is the most complicated part of working with it. However, I have good news: my colleagues prepared Docker files and Docker images for PostgreSQL 13 with pre-installed plv8! So your start with plv8 will be super simple: you need to deploy the Docker container using just one command!
Docker file: PostgreSQL 13 + plv8 v2.13.15
Demo project for you to participate in the tutorial
For the tutorial, you need to install the following:
Node.js (the most desirable is LTS)
IDE for JS (i.e., free Visual Studio Code)
The built-in partitioning mechanism in PostgreSQL has been developing for several years. However, there is no magic button that would transform an ordinary table into a partitioned one. I will explain how to implement partitioning in a production system, how to prepare your table and apps for partitioning, and what pitfalls can make a DBA's life harder. We will also take a deep dive into various techniques of transferring data between partitions, also considering their pros, cons and limitation.