Talks and tutorials
Tasks for migrations from the classic commercial DBMSs to open-source solutions are still relevant nowadays. Tools used to migrate Oracle applications to PostgreSQL has already proved to be efficient. But what if you have SQL Server? We want to introduce you to Babelfish, a OSS solution based on PostgreSQL 13. Babelfish supports TDS network protocol, T-SQL language, and SQL extensions specific to SQL Server. However, it isn't that simple. In our talk, we'll cover the features ensuring compatibility with SQL Server, existing problems and ways to solve them. Bonus: we'll also share the story on how we added the Babelfish support in DBeaver using the JDBC driver from Microsoft.
The volume of data stored in Postgres Pro databases is growing rapidly. In some organizations, such databases have already exceeded tens of terabytes in size. The existing approaches to backups don't ensure timely backup and restore of data in such cases. We have supplemented the well-known and well-tested Postgres Pro backup tools with functionality that will allow users to backup and restore large databases. In addition to that, we have created our solution with the convenience of DBAs in mind.
Usage of temporary tables in PostgreSQL causes additional issues like high server resources consumption and low query performance. However, sometimes you need to rely on them especially when you need to migrate your code from MS SQL, and your initial code employed them, and you had some logic implemented at the database level. This talk covers issues related to usage of temporary tables when migrating from MS SQL, and the ways to resolve them using built-in PostgreSQL features based on the scenarios contained in the code.
SPQR is a lightweight OLTP sharding solution written in Go. In this talk, I'm going to discuss design decisions that led us through FDW-based sharding, CustomNode-based sharding, C-implemented lightweight query routing, and finally to SPQR design.
Previously, we have explained the internals of Avito, discussed, where and how we store your classifieds, and how they appear on the search results. In the recent 3-4 a lot has changed in Avito. We got rid of logical replication, stopped using standby servers for reads, removed nearly all stored procedures and our custom failover solution, migrated all our classifieds into a sharded DBMS, switched from the monolith app to microservices. I'll explain why we made such decisions, list some of the problems we encountered and describe the current state of our development process.
A modern database should be capable of storing big values. Storing itself is not a big deal, however, operations with big values or fields are a non-trivial task. PostgreSQL has several options for storing big values, but none of them is perfect. How do we respond to this challenge? Our presentation answers this question, let's see how to store big and complex values in Postgres properly, and how operate with them.
A book. Typically a reader sees only the final product, printed on paper or opened on a computer screen. In this talk I invite you to take a look at the dark side of my recently published “PostgreSQL Internals” book. Come if you are curious why the author would want to write scripts, modify the PostgreSQL source code, and program pictures.
The release cycle for version 15 is nearing completion. At the time of the presentation, the complete list of changes will not be known. But for sure there will be something to talk about.
Our talk reviews the modern tools for administering PostgreSQL. It also covers the DBMS's performance bottlenecks and provides approaches to resolving the related issues. The presentation tackles the problems that occur while Oracle to Postgres migrations including the following: performance tuning tools (the analogues of AWR and ASH); monitoring tools (the analogues of Cloud Control); ensuring high availability and reliability of the database (grid infrastructure analogues); known "database performance bottlenecks"; an overview of options for technical support options SLAs.
Many books end with a wedding, but the reader has no idea about the future life of the heroes except that they lived happily ever after. In 2019, we successfully migrated distributed our railway application from Oracle 11g SE to vanilla PostgreSQL 11.9. But our story did not end with this successful migration - life went on, and sometimes we got startled because of "surprises". We encountered a number of problems, some of which were solved by reorganizing the data, some disappeared after we changed our stored procedures, and some got resolved after tuning the PostgreSQL parameters. Solving our problems would be impossible without the logging and profiling system built into our DB application. Our talk covers the examples of successful detection and resolving of the performance issues that occurred in our PostgreSQL-based application.
As opposed to in-memory DBMSs, traditional disc database management systems have an inherent physical limitation. It's their storage system which is by several orders of magnitude slower than CPU operation memory, even in the case of All-Flash NVM. It becomes especially critical for the development of business analytics apps and OLAP scenarios on disc DBMS. We will consider a number how-tos for improving the performance of traditional DBMSs, and then offer a new storage-centric approach to hardware-based speed-up for disc DBMSs.
In this presentation, we'll talk about the newest SQL/JSON features committed to PostgreSQL 15. We'll explain how to use them, where they can help and why they make a difference. Is PostgreSQL JSON implementation fully compliant with SQL:2016? In the meantime, the next generation of the SQL standard is being developed. What will the new standard include, and how will Postgres respond to it?
PostgreSQL has a number of peculiarities that you need to take into account not only while maintaining your database but also when designing your database schema. Experienced PostgreSQL are well aware of vacuuming process. On the web one can find tons of materials covering its internals, configurations and monitoring. Many valuable talks about vacuum were given at numerous conferences. However, we still face the common wraparound problem when the maximum possible number of transactions (xid) is reached. It happens even on databases that are relatively small in size. In my presentation, I will share a customer case that looks interesting to me. A chain of mistakes made at different stages of the database's life cycle once caused a disaster. The database fully stopped for one week, we detected a wraparound and spotted corrupted blocks. Maintenance was problematical, and we spent sleepless nights in search of a solution. We managed to achieve a local win as we finally restored the database, but it's not the end of this story, which makes it even more interesting.
My talk covers PostgreSQL infrastructure in Ozon: - how do we tune virtual machines (KVM); - which version control system we have created; - which drivers we make and how it relates to fault tolerance and load balancing; - how do we make "hot" upgrades for the parameters of our virtual machines.
How the State Air Traffic Management Corporation of the Russian Federation migrated from Oracle to Postgres ProTalk
This talk covers the Oracle-to-Postgres Pro migration project for the billing system of the Federal State Unitary Enterprise “State Air Traffic Management Corporation of the Russian Federation” employed to collect the air navigation charges for the use of airspace.
Mamonsu is an active monitoring agent for PostgresSQL based on Zabbix. We are actively developing the agent: there are new unique metrics and visualization capabilities. But in addition to metrics collecting mechanism, Mamonsu has other useful features. I am going to briefly describe these features also known as "Mamonsu tools", the agent's advantages and the installation process.
The plv8 extension is very underestimated, though it can help simplify development for a variety of tasks. It's been a year since we started using this extension in our projects. The main reason for its low popularity is the complexity of building. Moreover, building plv8 requires a lot of traffic and disc space for temporary files. That's why we decided to build plv8 and provide ready-made binaries freely to anyone interested, not just as an offering for our customers. First, we made the binaries available on GitHub. Then we created a Docker image for Postgres with pre-installed plv8 based on Debian. It looked too heavy to us, and we decided to check Alpine, however, building it for Alpine appeared to be more complicated than we expected. I'll explain what difficulties we faced and how we managed to overcome them. I will also list the tasks that can be solved with plv8 and provide advice on how to make the development process more convenient. I will also share the links to our Docker images for the latest Postgres 13 and Postgres 14 releases, that are compact and can be used for any types of tasks.
During my career, I've seen a lot of code, and very often I faced inaccuracies in handling dates and times. Sometimes the parties got discrepancies in the monthly reports due to such inaccuracies; or daily reports were different for Moscow and SFO, etc. I wouldn't call it a serious problem, but it's annoying and time-consuming. In all cases, such issues occur because of neglectful handling of dates and times. In my presentation, I will discuss how we can avoid it.
I am going to explain a sorting method added in Postgres 14 that allows to create GiST index much faster. We'll also talk about the disadvantages of this sorting method discovered during implementation of its support in PostGIS and how it will be improved in the future. Also, new features and improvements included in PostGIS 3.2 will be reviewed.
Every DBMS stores its data on a hard disc, so you may face a situation when your data on disc gets corrupted. This can happen due to a controller failure, logical or physical data corruption; there are also other reasons. If it is just the index file that gets corrupted, the index recreation command will enable you to restore data consistency in your DBMS. If a table file or a file of a system section gets corrupted, data restore is impossible. In this case, you need to invent workarounds. You can try to restore this data from backup files, copy it from the corrupted table, or find another way to solve this problem. In this talk, we'll tackle several cases of data corruption on disc and describe the options for restoring the data from the corrupted tables.
Disaster-tolerant cluster implies minimal data loss in the event of the main data center disaster and switching to the backup data center. The data loss takes place due to the asynchronous replication between the main data center and the backup one. However, there's a solution that can resolve the existing problem and ensure zero data loss in the case of the main data center disaster. My talk covers this solution for zero data loss.
In June 2020, AlfaStrakhovaniye was obliged to use RSA 2.0 government services along with other insurance companies. It was the only way to sell car insurance in Russia. AlfaStrakhovaniye is a leader in car insurance, and its applications degraded under the pressure of calls. It wasn't planned. Postgres used a huge amount of CPU and memory. Nobody could tell what was going on. Degradation started and stopped for no reason. Dev Team and Postgres Pro experts solved that problem. MyBatis in Java services was the killer. Our story is about the development of applications for heavy workloads and technical ignorance while using frameworks.
Replication is one of the important mechanisms designed to provide database fault tolerance and scalability. In our practice, we regularly encounter the problem of low replication performance. This prompted us to investigate the factors that affect the speed of physical replication. In this presentation, I will talk about our findings. I will also demonstrate the differences in replication performance for various versions of PostgreSQL.
The more tables are involved with the query, the more difficult it is for the scheduler to choose a suitable execution plan (both time and memory usage increase). How can we "tell" the planner that it is better to connect this pair of tables first, and the rest can be connected later? What if we see that a part of our query can be improved, but the optimizer does not do this. In my presentation, I will talk about managing the order of connections. I will explain how we can influence the formation of a query plan using the standard "vanilla" PostgreSQL methods.
Logical decoding and replication slots introduced in PostgreSQL 9.4 (released in 2014) created a solid foundation for implementing built-in core logical replication in version 10 (released in 2017). Unfortunately, there are a few limitations that make logical replication not very useful in real-world scenarios. Logical decoding currently isn’t supported on the standby server, and PostgreSQL allows creating logical replication slots only on the primary server. Or in other words, logical slots are lost on failover/switchover.
Postgres hackers made many attempts to address the problem, and most of them resulted in not too much success. Although, there is one little function introduced in PostgreSQL 11 that made it possible to implement failover of logical replication slots externally.
In my talk I will tell a story of how Patroni solves the problem of logical replication slots failover without using invasive third-party extensions, dig down into some of the Postgres internals in order to prove why this approach is safe, and finally, we will discuss limitations and potential downsides of this solution.
Experience in using Live Universal Interface (LUI) and PostgreSQL in creating an analytical reporting systemTalk
The talk covers the use of PostgreSQL, LUI and LUI4ORA2PG for building an analytical reporting system.
The talk tackles the following topics:
- migration from an Oracle environment;
- application of JSON functions;
- how the temporary tables helped us;
- our own means for load testing and bottlenecks detection;
- how to make beautiful GeoJSON format maps to display diagrams on them;
- installation and testing of the system on an "Elbrus" computer;
- what became an obstacle or was missing when using PostgreSQL.
The history of growth of the Live Universal Interface (LUI) web application development tool and the LUI4ORA2PG migration tool can be found in our previous presentations at PGConf conferences:
A cloud provider operates with lots of hardware resources and network entities. It's one thing when a company runs dozens of Postgres Pro instances in an isolated loop for internal tasks. And it’s quite a different story when a cloud provider renders services to B2B users and maintains solutions for 500+ large companies.
Vitaliy will tell how the VK Cloud Solutions cloud provider currently employs Managed Postgres Pro. He will also demonstrate the solution's architecture and cover the challenges faced while creating the Postgres Pro solution in our cloud.
Implementation of ETL is one of the first tasks that any PostgreSQL user needs to solve. There is a misconception that low-code tools implementing ETL procedures reduce requirements for the user, but due to sacrificing performance and/or flexibility to this goal. The presentation will demonstrate that it is possible to design ETL procedures quickly and easily without losing productivity. We will explain how high speed is ensured, how PostgreSQL features are used and how the combination of PostgreSQL and low-code platforms allows you to get rid of one of the most frequent user pains.
The myths and misconceptions of software import substitution. The acute threat of non-working DBMS servers. The basic plans for import substitution and problems of the transition period. DBMS data protection and the image of an ideal protection system. Crypto BD is the cryptographic data protection system. How does it work?
About 7 years ago, I moved from Windows-only development lead by an irresistible desire to add some parts to PostgreSQL. In this talk, I would like to cover things that were not obvious to me when I started working with the source code, the PG build and testing system. I'm going to talk about the simplest issues - routine IDE tasks, navigation, build and release process, and similar stuff. What I will share isn't the one and only truth. Some topics might relate to dummy-level problems :) I would be glad if other developers also shared the secrets of their development life. We can arrange a discussion of grep vs IDE :)