Talks
PG_QUERY_STATE: Everything about the inner life of your queries
When working with PostgreSQL, we regularly encounter queries that take suspiciously long to be executed. What if it's too late to enable statistics or auto explain, and cancellation of the query doesn't look sensible, but you still need to find out what caused the issue or see at what stage execution the query is? pg_query_state is an open source PostgreSQL extension, which can give you access to the query execution plan in real time. It also makes available statistics for each node, including the number of processed rows.
Slides
Video
PostgreSQL certification: personal experience of taking four tests
In May 2019, Postgres Professional launched the PostgreSQL certification program. I have been working in this company since March 2020, and in a year I have successfully taken four tests on the DBA1, DBA2, DBA 3, QPT courses. In this talk, I will share my experience of preparing for the exams and passing these tests.
Slides
Video
How to speed up data load up to 10,000 times?
What can be more important in the data load process than speed? Data migration is one of the most requested features in DBeaver. So the performance issue was highly important for us, in regard to PostgreSQL, as well as Greenplum, Redshift and other Postgres-based databases. We are ready to share our tiny secrets about 10x, 100x, 1000x, and even 10,000x performance improvements for data imports without any magic.
Slides
Video
Mamonsu 3.0: New Features
Mamonsu is an active monitoring agent for PostgreSQL based on Zabbix. Mamonsu 3.0 Major Update includes new metrics and more possibilities for data visualization via Zabbix screens. It also introduces the improved functionality of Mamonsu tools for interacting with the DBMS. In addition to the new functionality overview, I will also share the nuances of upgrading to the 3.0 version.
Slides
Working with queries - DBA perspective
Optimisation of queries can have a significant impact on database performance. In my talk, I will review the tools that we use in our practice and discuss some use-cases. This talk is of interest to database administrators or developers who would like to optimise their database performance and learn new query optimisation techniques.
Slides
Video
Why do we need 64-bit transaction IDs?
When PostgreSQL was at its formation stage, its transaction ID was chosen to be 32-bit. Back then, no one could imagine that someday we may need more than 4 billion transactions. However, ongoing technical progress and digitization pushed some Postgres instances towards their transaction ID limit. The Postgres community reacted to this with a wraparound of the transaction ID counter. However, constantly growing data volumes exposed PostgreSQL to new challenges. In my presentation, I will cover these challenges and explain how they can be solved with a 64-bit transaction ID, what the consequences will be like, why now it is a good time to implement 64x IDs, and why this hasn't been done previously.
Slides
Video
PostgreSQL monitoring: sampling plan and average active sessions with your own hands
In OZON, we work on PostgreSQL as a service. Currently, we have thousands of clusters. We know about the important questions software developers ask, and the most crucial ones are as follows: how to understand the database's workload and why this query is getting slow? In this presentation, I will explain how we help our developers answer these questions by simply having a look at our dashboard.
Slides
Video
Future Postgres Challenges
onlinePostgres has been a vibrant project for decades, and probably will be popular for decades to come. However, as with any complex endeavor, challenges are always lurking. This talk explores the many project, competitive, and technical challenges in the future that could derail its success. By exploring these challenges, we will be better able to avoid them.
Slides
Video
Several production cases during one night
We maintain large PostgreSQL databases with the total data volume exceeding 180 TB. Every instance in the cluster gets a workload of 15 thousand requests per second. These circumstances certainly impose limitations on the classical approach to updating database structures. The cost of a DBA error gets higher, as even a small error can lead to a sleepless night) In my talk, I will cover our existing limitations for changes in the production environment, non-classical DBMS behavior under heavy workloads, and will discuss PostgreSQL in general.
Slides
Video
New features of pg_profile/pgpro_pwr - historical workload profiler
This talk is about postgres extension pg_profile - simple historic database workload profiler. I'll talk about its new features and new statistics available in Postgres 14. There is a branch of pg_profile named pgpro_pwr, designed to run in PostgresPro Enterprise Edition and PostgresPro Standard Edition databases. It is using extended performance statistics of those databases providing some valuable benefits. We'll see what's new in pgpro_pwr extension. Also I'll talk about some problems in postgres performance monitoring and their possible solutions in the future.
Slides
Video
Unequal battle with "bloating" in the realities of 1C databases
Data is getting bigger, disks are getting faster, the DBMS optimizer is getting smarter, but the problem of " bloating " remains extremely relevant. I want to share my experience and approach to dealing with this effect on a large number of tables and data in them.
Slides
How to ensure trust in system software using Astra Linux as an example
The development of secure system software (like an operating system or a DBMS) and ensuring trust in it are complicated scientific and technical tasks. In our presentation, we will demonstrate how we resolved these tasks for the Astra Linux OS certified to comply with the highest security classes. We will discuss the key directions of our activities, from the formation of national standards to the development and verification of the formal access model, which is the basis for the attack surface protection mechanism on the OS's level. We will also share how we apply the methods and technologies of dynamic and static analysis of the program code and, finally, we will explain how we collect and analyze the data we get from analyzing the OS code. We will also discuss how to fix the code errors in the project with continuous integration.
Slides
Video
Elegant K-nearest neighbor (KNN) searching in PostgreSQL
Finding the nearest neighbor can be required for various tasks. For example, when you need to find the closest object to a given point on the map. This task looks trivial to non-programmer (a person can easily cope with it if they have a map). In a software developer's reality, this task doesn't have a common solution available to everyone. To get rid of this headache, programmers often create ad hoc solutions also known as "crutches". These workarounds don't look nice and often ruin the mood of a creative programmer who needs to go to a beer pub to cope with the cognitive dissonance :)
Indeed, while a person has a typical field of view and a map with a certain scale, the programmer has only one given point and a huge number of other points (i.e. billions of stars). This multitude of points gets a lot of incoming requests, including the write requests, not just read ones. You can write a perfect query in SQL, however, the real-world query execution plan will be depressingly long. To find the closest neighbor, you will have to read the entire table, compute all the distances from the given point and return the given number of good enough results. Indexing doesn't help in this case, as you will have to fully scan the search tree and read the entire table in random order. This will take much longer than simple table reading. In reality, tasks, where you need to efficiently find nearest neighbors, aren't limited to spatial search. It can also be used for classification tasks, finding typos, data clustering, and deduplication. All such tasks will benefit from efficient nearest neighbor search in DBMSs that are now a de facto standard for storing the data. What do we mean by "efficient search"? It means that our search is fast, concurrent, scalable, and supports various data types (most likely, non-standard ones). We implemented such KNN search in PostgreSQL 11 years ago. I will cover its implementation, today's state and share some use cases for KNN.
Slides
Video
CREATE INDEX CONCURRENTLY implementation details
From the very beginning of 2021, I'm fixing a bug in CREATE INDEX CONCURRENTLY. This bug manifests extremely seldom. I could not find any traces of it in mailing lists ever since the introduction of the feature in 8.2. Yet the bug affected our systems, so I decided to fix it. This is how the9i detective bug-hunting story started. In this story, I'll share details of the implementation of transactions and lock mechanics.
Slides
From Oracle to Postgres using Russian rails. Non-obvious nuances
We'll discuss our approaches to picking technical solutions for the systems we design. We'll consider their advantages and limitations. We'll share the facts about the changes in our engineering culture when import substitution requirements came into effect. We migrated the system with a classical three-tier architecture from Oracle to Postgres Pro DBMS. In my presentation I'll pay attention to data migration tools and peculiarities of development Java/Kotlin apps and SQL interoperability between two DBMSs.
Slides
Video
Using indexes for better application performance
PostgreSQL could easily be named the master of indexes - there is no other database that could compete with Postgres for the variety of indexes. How developers can use this advantage effectively in their day to day work? How do you know which index to create? When the created index might slow our application’s performance? We will try to get our head around this topic with some use cases and live examples. This talk would be of interest for any developer who wants to expand their knowledge of indexes in Postgres and secure the best performance for their application.
Slides
Video
Building a GEO-Cluster
Everyone knows very well what a failover cluster PostgreSQL is and how such a cluster protects against failures within the same data center. However, recently, more and more enterprises have placed increased demands on their services, these requirements include disaster tolerance. We call such clusters a GEO-Cluster (KUK). In the report, I will talk about the varieties, principles and approaches to building GEO-Clusters PostgreSQL based on the Corosync/Pacemaker cluster software.
Slides
Video
JSON or not JSON. Pros and cons of JSON usage
JSON is now the de facto standard for startup developers. Why is this happening and what should be done? Should we teach application developers how to properly design a database according to the canons of relational theory (which Postgres complies very well with) or make the DBMS more JSON-friendly?
Slides
TimescaleDB 2.0 - Time-series data in TimescaleDB distributed cluster on top of PostgreSQL ORDBMS.
TimescaleDB extension allows to turn good old Postgres into a real distributed cluster for storing time series data while maintaining the relational model, convenient SQL and a time-tested ecosystem. And additional features such as continuous materialized views and data compression allow to build truly powerful telematic hubs.
Slides
Video
Basic Astronomy for Database Engineers
We are surrounded by myths, so we are constantly under delusion in regard to politics, economy, history and... space. Let's tackle the most pressing issues related to astronomy and space research. Perhaps we will start to better understand the world around us, avoid exaggerated sensations and spot really interesting and attractive aspects of scientific research.
Slides
Video
An overview of new Intel hardware features
This talk is a brief overview of Intel technologies: CPU development, including features for AI algorithms. The memory speed/volume pyramid > PMem space, encryption development, software tools to improve parallel computing performance. I will also share a couple of cases where PG and Intel were implemented.
Slides
Video
Current status of the PostgreSQL protocol compression
PostgreSQL protocol compression has been discussed in the community for a long time. Many different hypotheses have been expressed and tested, and the compression patch has received a large number of changes and improvements. I will review the various approaches tested during the implementation of protocol compression, and will highlight the current status.
Slides
Video
Peculiarities of ECM platform migration to PostgreSQL
In this presentation, we'll reveal the process and peculiarities of our Oracle to PostgreSQL migration project. We have moved our ELDOKA corporate ECM platform to PostgreSQL, and during this talk we will explain how we ensured object-based and role-based access to data, which features were missing in the community-developed PostgreSQL, how we organized our work with spatial data and changed approach to storing our file content. We will also share our experience in time and resource savings, as well as data replication between our nodes. Performance test results will also be shared with the audience.
Slides
Video
A subsystem of Russia's "Electronic Budget" on fully domestic software
Creation of a completely import-substituted component of the state information system using the example of the PUR KS - the "Electronic Budget" subsystem, the key state integrated information system for managing public finances. The presentation presents the subsystem architecture, performance optimization history and monitoring description. The speaker will talk about the advantages and difficulties of using an import-substituted system for the public sector, as well as about the nuances that departments should take into account when implementing open source solutions and solutions from Russian developers.
Slides
Video
PostgreSQL 14. The Final Overview
Throughout the release cycle of version 14, I had the opportunity to describe the accepted patches in articles dedicated to each of the five release commitfests. The first two articles on Postgres 15 also have some remarks on PostgreSQL 14. If you have read them, you can skip my presentation :-)). On September 30, 2021, PostgreSQL 14 was officially released. Nothing will change in this major release, so it's the right time for a final overview. However, presenting each and every new feature in a 45-min. talk isn't possible. That's why I will be sharing with you my personal top of version 14 innovations.
Slides
Video
PostgreSQL Disaster Recovery in Kubernetes
onlineRunning PostgreSQL in Kubernetes became a common theme with the rise of the operators. In this talk, we will show you how easy it is to deploy a highly available PostgreSQL cluster in Kubernetes with Operator. We will also show how to prepare for the failure of a data center by running a hot-standby cluster on a DR site.
Slides
Video
PostgreSQL on new Xeon CPUs and Optane Persistent Memory
Intel® Xeon® Scalable Gen 3 - new commands have been added to speed up the database: vector bit manipulation instructions to enable data compression without losses, vector instructions to make TLS protocols faster and SGX enclaves for secure code execution. We will also discuss the new generation of persistent memory Intel® Optane™ PMem 200 series. We will explain what these new technologies including oneAPI tools can give to the PostgreSQL project community
Slides
Video
PostgreSQL for Python Software Developers
I will not teach you database design or query optimization. This talk will cover Python drivers, PostgreSQL usage in Python, and various tools from sync and async world enabling such usage, and how this ecosystem evolved. We'll discuss Python libraries including aiopg, psycopg2, asyncpg, pgbouncer and how they interact with Postgres.
Slides
Video
Migration from Oracle PL/SQL to Postgres pl/pgSQL: Two Years Later
In 2019 we migrated distributed railway application from Oracle 11g SE to vanilla PostgreSQL 11.9. Almost 2 years have passed, the system is working good. The report focuses on how we migrated, what problems we faced during this migration and after migration, as well as what we would have done differently today experience.
Slides
Video
Why Crew Resource Management in Database Teams
Crew Resource Management is a field which has become critical in many fields including aviation and nuclear power. However in IT, this field has not been well implemented. In this talk we will discuss what crew resource management is, what kinds of problems it solves, and why all database teams should implement it.
Several PostgreSQL-related case studies will be discussed and both technical and human factors will be part of the case studies.
Slides
Video
Reliable implementation of complex business logic with pgpro_scheduler
The pgpro_scheduler extension has an interesting but little-known feature - one-time jobs. Despite its simplicity, this feature can be used for complex transaction processing. On the one hand, it helps to reliably execute tasks taking a very long time, on the other hand, it ensures app scaling if certain conditions are met.