Recovery use cases for Logical Replication in PostgreSQL 10

Konstantin Evteev

Avito

Head of Database Development Team

Mikhail Tyurin

Independent entrepreneur in the field of data technology and predictive analytics

Independent entrepreneur in the field of data technology and predictive analytics; former Chief System Architect of Avito.

Avito is the biggest classified site of Russia, and the third largest classified site in the world (after Craigslist of USA and 58.com of China). In Avito, ads are stored in PostgreSQL databases. At the same time, for many years already the logical replication is actively used. With its help, the following issues are successfully solved: the growth of data volume and growth of number of requests to it, the scaling and distribution of the load, the delivery of data to the DWH and the search subsystems, inter-base and internetwork data synchronization etc. But nothing happens "for free" - at the output we have a complex distributed system. Hardware failures can happen - it is natural - you need to be always ready for it. There is plenty of samples of logical replication configuration and lots of success stories about using it. But with all this documentation there is nothing about samples of the recovery after crashes and data corruptions, moreover there are no ready-made tools for it. Over the years of constantly using PgQ replication, we have gained extensive experience, rethought a lot, implemented our own add-ins and extensions to restore and synchronize data after crashes in distributed data processing systems. In this report, we would like to show how our experience can be shifted to a new logical replication subsystem in 10th version of PostgreSQL. In the current implementation, these are only non-trivial solutions - there is a number of issues for the community, that come down to implementing simple recovery mechanisms - as simple as configuring the replication in 10th version.