Talks and tutorials
There's a common delusion that constraints should never be used as they affect performance in a negative way, interfere with regular work, and are, all in all, useless. The database is commonly perceived as just a storage without any logic. I'll explain why it isn't so and what this careless approach may lead to.
The report will talk about a new tool for migrating application systems from the Oracle environment to the Postgres environment. The tool is developed on the basis of the ora2pg tool (by Gill Darold) and the domestic LUI application development tool. Talks on LUI were given at past PGConfs in 2019 and 2020:
Noisia is the result of my past attempts to gather all the tools for reproducing various incidents with Postgres.
Noisia is an utility for easy creation of artificial incidents in a Postgres database. This talk will describe the reason for artifical incidents creation, what Noisia does and what is it's possible use. Also I will share the future development roadmap.
Postgres is a well-known database for high load web applications. Such apps require stress/load testing itself to run properly in production. Besides obvious difficulties in preparation a test environment identical to production, generating proper traffic there is another one issue - database preparation for the test environment. And it seems it is not good to use the database from production in the testing environment in the era of personal data protection (GDPR, HIPAA). Data obfuscation is the rescue.
There are few instruments for data obfuscation in Postgres. During this session, I will tell you which of them we've selected and why what type of issues we faced, and if our solution was successful. You will know if it is possible to get an identical response on the test database without real data from production, we will observe some restriction on obfuscation, I'll present our utility which simplify things.
Artificial intelligence, machine learning, and deep learning are intertwined capabilities that attempt to solve problems that defy traditional computational solutions — problems include fraud detection, voice recognition, and search result recommendations. While they defy simple computation, they are computationally expensive, involving computation of perhaps millions of probabilities and weights. While these computations can be done outside of the database, there are specific advantages of doing machine learning inside the database, close to where the data is stored. This presentation explains how to do machine learning inside the Postgres database.
Fast text search queries can be made even faster with indexing on lexemes inside compound records of tsvector format. RUM index is an open-source PostgreSQL extension. It represents a big improvement of GIN index and it can index lexemes with additional information e.g. tsvector lexeme weight-mark. So it can support tsvector capabilities more.
Until recently it was needed to recheck results of weight-containing queries by table. My modification (2020) is to make the processing of this kind of queries index-only and therefore much faster.
Also, I will describe and provide benchmarks for different usage cases for fast text search. We'll see how RUM index can improve the performance and compare it with PostgreSQL internal GIN index.
Once at Adjust we faced a problem of impending xid wraparound in a very central database due to a long-stalled autovacuum run. Because we spotted warnings 5 hours before impending disaster we were able to minimize the customer impact. Come learn how we used the time to prepare, and what lessons this has for others facing unusual problems in large databases.
Ansible is an open-source configuration management and deployment tool, which can be used to manage servers and software installations. This talk will briefly cover Ansible itself, and then explain how Ansible is used to install and configure PostgreSQL on a server. Examples will round up the talk.
It's a well-known fact, that although the database performance is great, and each query is executed in milliseconds, the overall application response time may be slow, making the users wait for a response for an extended period of time. We know that the problem is not the database, but the way the application developers communicate with the database. Specifically, we are talking about ORMs - Object-Relational Mappers. Database developers hate them, but application developers love them because they allow developing applications without any knowledge of database internals. As a result, the system performance is often unacceptably slow.
The only way to change this behavior is to provide application developers with a tool, which is as easy to use, as an ORM, but which will allow escaping the common ORM pitfalls. That's why we developed NORM - No-ORM Framework. During this presentation, we will go over examples of code from https://github.com/hettie-d/NORM repo and learn how to build "transport objects" for efficient data transfer between applications and databases
JSONB is a popular data type in Postgres, and there is demand from users to improve its performance. In particular, we want to optimize a typical pattern of using jsonb as a storage for relatively short metadata and big blobs, which is currently highly inefficient. We will discuss several approaches to jsonb improvement and present the results of our experiments.
Postgres Pro Multimaster is Postgres extension (and a set of core patches) providing high availability (HA) with strong consistency and read scalability. It forms symmetric shared-nothing cluster synchronously replicating the data and automatically performing disaster recovery. During the last year we've put significant efforts ensuring and proving that consistency is preserved in all scenarios. The new version, which will be released as part of Postgres Pro Enterprise 13 uses Paxos algorithm for determining transaction outcome and custom protocol governing the recovery process; we used TLA+ model checker to verify its correctness. I'll tell how these things work and why in some cases multimaster may be an attractive alternative to the traditional streaming replication based HA deployments.
multimaster is now open source, available at https://github.com/postgrespro/mmts
To make the talk less narrow specialized and more appealing to the wide audience, in the first part I will shed some light on how generally modern DBMSs (mostly so-called NewSQL) handle fault tolerance. In particular,
- what is a strongly consistent DBMS and the associated overhead;
- what is distributed consensus, Paxos, Raft;
- how they help here;
I won't do an attempt to explain any algorithms line-by-line; it would be hardly useful given the time frames and there is a lot of literature available anyway. The goal is rather to waymark the field and get you a bit comfortable there.
An architectural approach as the basis for a sustainable solution. Old and new technologies - unity and struggle of opposites. Information security and business requirements - is there a compromise?
This talk is about postgres extension pg_profile - simple historic database workload profiler. I'll describe it's architecture, features and use cases. There is a new branch of pg_profile called pgpro_pwr, designed to run in PostgresPro Enterprise Edition and PostgresPro Standard Edition databases. It is using extended performance statistics of those databases providing some valuable benefits.
The amount of data that is handled today by Enterprises and Web companies is constantly growing. At the same time, it becomes increasingly difficult to have and synchronize several copies of data in different systems. As a result there is a demand to work with large amounts of data directly in a transactional DBMS. This requirement is often imposed by the logic of applications that need real-time results. In this talk we will consider what a universal distributed transactional DBMS can be. We will analyze such aspects as the types of load and their prioritization, dynamic resource allocation and the level of consistency. What tools in PostgreSQL can be used to build such system, what we have already done and what is still missing.
Okko is one of the largest legal online cinemas in Russia. Our catalog contains 60 000 movies, cartoons and TV series. Since its starting, the service has been visited by over 20 million users. The monthly audience is 2.8 million people. All these figures speak of a reliable high-load service.
As DBA, I will mainly talk about databases (PostgreSQL, Cassandra, Redis) that are used in the company. We'll take a closer look at PostgreSQL on the topics of high loads, monitoring, optimization, backup and recovery.
Kubernetes is an open source container orchestration platform for automating deployment, scaling and management of application containers. Nowadays, more and more applications are being deployed in containers on Kubernetes.
There are several solutions that can help us to run a PostgreSQL cluster on Kubernetes. However, these solutions don't provide query load balancing capability. In this talk, I will show you how to combine PostgreSQL Operator with Pgpool-II to deploy a PostgreSQL cluster with query load balancing capability on Kubernetes.
Monitoring is a very important part in production environments. Although Kubernetes provides a basic way to monitor the status of a PostgreSQL cluster, this is not sufficient for managing a PostgreSQL cluster in production. An important improvement of Pgpool-II 4.2 release is the ability to output more useful statistics of the PostgreSQL cluster. In this talk, I will describe how to monitor and visualize the PostgreSQL cluster statistics in Prometheus for extensive cluster monitoring.
In this presentation, we'll discuss Intel products and solutions intended for the Data Platform Group segment, such as Xeon 3rd Gen (4S Cooper Lake) server CPUs, PMEm 200 Series RAM and FPGA.
Backups are a critical part of any enterprise database solution but they are often done poorly or skipped altogether, which can lead to data loss in the event of hardware failure or some other disaster.
In this talk we'll cover database backup best practices and how to implement them with pgBackRest, including:
- WAL archiving and retention
- Backup frequency and retention
- How to meet recovery time/point objectives
- Configuration options
- Performance considerations
Our company's educational projects are intended to facilitate learning of PostgreSQL. Last year, we focused on courses for application developers: we updated DEV1 basic course and released a brand new DEV2 course. What has changed in training materials representation, how we see the further development of the courses and what else do we have besides the courses, whether the courses for DBAs will be updated and how this will affect the certification and that's what I will talk about.
MobilityDB is a moving object database extension to PostgreSQL and PostGIS. It has types and functions for storing an querying geospatial trajectories, as first class citizens. The main type is called tgeompoint (temporal geometry point). It represents a complete movement track of a geometry point, such as a car, a bird, or a person. The function speed(tgeompoint) computes the time varying speed of the object, as a tfloat (temporal float). Similar to these examples, MobilityDB has 6 temporal types, and over 300 functions. As such, it is a function-rich platform for Mobility Data Management.
In this tutorial you will:
- learn about moving object databases
- write MobilityDB SQL queries and explore a database of geospatial trajectories
- walk through the different type, indexes, and functions of MobilityDB.
Let's see, with practical examples, how to make Python and PostgreSQL talk to each other seamlessly: how to connect to a server, how to exchange data, manage transactions, passing parameters in a safe and expressive way, how to manage notifications.
We will cover psycopg2, the most used PostgreSQL adapter for Python, but also look and the up and coming psycopg3: what will remain the same, what will change, how to better organise a Python program to make the most of PostgreSQL.
PostGIS is a Spatial Extension to PostgreSQL database. This talk will look like reading the changelog aloud with some comments and real-world usage examples from the core developer.
Postgres is able to build optimal query plans for most practical cases. However, sometimes, for objective reasons, for complex queries or because of open issues in the planner itself, it can make mistakes and produce a suboptimal plan. Because of this, the execution time of such a request can increase tenfold. If the query is executed frequently, then from time to time this query takes longer than it could, and the DBMS as a whole produces a lower TPS. If the planner is able to record his mistakes and take them into account in the subsequent planning of the same query, then this will improve the characteristics of the DBMS during its operation. We present the results of the development of a PostgreSQL DBMS extension that stores the query execution history and implements the planner recommendation mechanism. We show how knowledge about previously executed queries can improve the performance of subsequent ones.
PostgreSQL relies on the system collation libraries, such glibc or ICU, for text ordering. One know caveat is that when the library change its sort order for a collation, any index created using the old order is likely to be corrupted when the new version of the library is installed.
In this talk, we'll see the improvements done in PostgreSQL 14 to keep track of the collation versions, detect and fix possible index corruption due to library upgrades and the work currently being done to further improve this area.
PostgreSQL has a long history in DWH. Netezza, Redshift, and Greenplum have turned specific PostgreSQL releases into DWH solutions. Nowadays, with the trends in PostgreSQL towards performance improvements (better partitioning, better statistics, JIT Compilation, etc.) and advanced PostgreSQL extensions, like the Swarm64 Data Accelerator, you can create a forward-looking, no lock-in, versatile, and reliable DWH. This talk will cover the PostgreSQL and DWH trends and touch on key arguments for choosing open source PostgreSQL for DWH.
Multicorn technology allows you to develop FDWs in Python, which is much easier and faster than creating FDWs in C. However, there is a downside, Multicorn FDWs work well with primitive WHERE conditions, but more complex cases cause difficulties, which I will talk about. Cases will be considered on the example of my Multicorn FDW for getting OpenStreetMap data. I will also show examples of using the same code in Multicorn FDW and plpython functions, including performance comparison. In conclusion, I will share my findings on when it is better to use plpython, and when Multicorn FDW is more preferable.
In this talk, you'll learn about Postgres Pro DBMS from the co-founder of Postgres Professional. Ivan will explain the philosophy behind this enhanced variant of PostgreSQL, reveal the differences between Postgres and Postgres Pro and provide the roadmap for the further evolution of Postgres Pro DBMS.
Cloning and transferring PostgreSQL databases & clusters often looks simple.
However, you can get confused while performing these simple operations, too. During my presentation, I will explain which pitfalls you may face while cloning and transferring PostgreSQL databases & clusters. We'll see what can be done to improve the performance of these operations and list the unexpected issues that arise while performing these seemingly simple operations.
High availability solutions have become extremely popular in the past few years. They play a critical role in building reliable systems based on affordable hardware. In this presentation, we will pay attention to some of the subtle aspects of the design and maintenance of such systems. In addition, the issues of capturing changes on a HA cluster will be addressed.
Recently landed in PostgreSQL, jsonb subscripting support doesn't look as exciting as some other improvements around jsonb. But it's user visible changes are only tip of the iceberg. How many people were involved to make it, and what decisions choices were made? How long did it take, and what are the good/bad ideas to promote a patch? These and few other questions will be our targets in this talk.
PostgreSQL databases can become corrupted for a variety of reasons, including hardware failure, software failure, and user error. In this talk, I’ll talk about some of my experiences with database corruption. In particular, I’ll mention some of the things which seem to be common causes of database corruption, such as procedural errors taking or restoring backups; some of the ways that database corruption most often manifests when it does occur, such as errors indicating inconsistencies between a table and its indexes or a table and its toast table; and a little bit about techniques that I have seen used to repair databases or recover from corruption, including some experiences with pg_resetxlog. This talk will be based mostly on my experiences working with EnterpriseDB customers; I hope that it will be useful to hackers from the point of view of thinking about possible improvements to PostgreSQL, and to end users from the point of view of helping them avoid, diagnose, and cope with corruption.
PostgreSQL is one of the most standards-compliant databases available. Talk discusses the impact that PostgreSQL has made on the SQL Standard and the features PostgreSQL has implemented from the SQL Standard, as well as upcoming features in PG14 and beyond.
Almost every business app is essentially just a SQL generator. How to easily build and compose SQL queries? I will explain the "Clojure way" of representing SQL as data (data DSL) and show how it may help you to dynamically build and compose SQL queries up to macros and query analysis.
In today's world, operational reporting and real-time analytics are becoming a basic need. There are a huge number of tools, practices, and approaches, which in turn require different expertise and resources. In this talk, I will tell you how to transform your analytics using PostgreSQL. We'll discuss pitfalls when using different schemes. We'll talk about data quality and performance. I expect this talk to be of interest to both beginners and seasoned practitioners with many years of experience (Discussions and questions after the talk will be highly appreciated). The talk outline is as follows. 1. The evolution of reporting - migration from OLTP to OLAP. 2. Data delivery to DWH and related challenges. 3. Scaling architecture in response to growing data volumes. 4. Data quality issues. 5. Maintaining stability with a large number of changes. 6. Different approaches to organizing the work of the DWH team. 7. And, finally, we'll list the challenges we have successfully responded to with various solutions (pgAgent, PGWatch, working with the file system, the new reading of postgresql.conf).
Python is today one of the most used programming languages in the world: simple to learn and to use and ready to interface to any known service and protocol. psycopg2 is the most used PostgreSQL driver for Python: it offers good performance and makes the communication between the language and the database as smooth as possible.
Python has evolved enormously in the past years and its first-class support for async programming is changing the way new programs are written. PostgreSQL has evolved too: a new generation of the driver is needed to make the most of all the features it has to offer.
psycopg3 is the new generation of the most used Python-PostgreSQL adapter: it offers a familiar interface and smooth upgrade path, but behind the scenes it is engineered to obtain the best performance from the database and the language: async programming, prepared statements, binary parameters.
psycopg3 is also experimenting with innovative JSONB support and query pipelining! Come and discover the forefront of the research between your most loved language and database!
This talk introduces GPU version of PostGIS and GiST-Index that we have developed as a new feature of PG-Strom.
Nowadays, our devices (like mobile phones) generate geolocational data time-by-time, and it is often utilized for area-marketing, push-delivery, disaster notification, and so on. People often use GIS technology to pick up users based on their current location. Even if area definitions are complicated polygons, PostGIS functions can generate right intersections, however, it is often highly computing intensive workloads.
GPU is designed for massive parallel computing workloads, with more than thousands cores per chip. And, we have developed PG-Strom extension to run a part of SQL workloads on GPU devices. At the upcoming PG-Strom v3.0, it newly supports several PostGIS functions and GiST-index for the computing intensive geolocational workloads.
In this talk, we will introduce the technology background, usage, implementation and benchmark result of GPU version of PostGIS and GiST-Index.
Native partitioning was introduced in PostgreSQL 10, and every new release comes with more features and optimizations to this area. Yet, there is still room for improvement.
This talk briefly compares the built-in PostgreSQL partitioning with third-party extensions (pg_pathman and pg_partman), to understand what we still don't have in the core. This talk also includes an overview of partitioning-related features that are currently being developed and aimed for PostgreSQL 14.
Materialized view is a feature to store the results of view definition queries in DB in order to achieve faster query response. However, the data in the view gets stale after underlying tables are modified. Therefore, view maintenance is needed to keep the contents up to date. PostgreSQL has REFRESH MATERIALIZED VIEW command for updating a materialized view, but this command needs to recompute the contents from scratch, so this is not efficient in cases where only a small part of a base table is modified.
Incremental View Maintenance (IVM) is a technique to maintain materialized views efficiently, which computes and applies only the incremental changes to the materialized views instead of recomputing. This feature is required for updating materialized views rapidly but not implemented on PostgreSQL yet.
Therefore, we developed IVM on PostgreSQL and are proposing to implement this as a core feature. The patch is now under discussion on the hackers mailing list. Our implementation allows materialized views to be updated automatically and incrementally when a underlying table is modified. You don't need to write your own trigger function for updating views. As a result of continuous development, the current implementation supports some aggregates, subqueries, self-join, outer joins, and CTEs (WITH clauses) in a view definition query. The result of performance evaluation using TPC-H queries shows that our IVM implementation can update a materialized view more than 200 times faster than re-computation by REFRESH command.
In this talk, we will describe our IVM implementation and its features.
When using PostgreSQL in production it is important to implement a strategy for High Availability. With a database service, the HA properties apply to both the service itself and of course to the data set.
In this talk we learn how to think about specific HA needs of your production environment and how to achieve your requirements with Open Source tooling when it comes to your database of choice, PostgreSQL. In particular, we dive in many options that could be implemented for Postgres to evolve its offering from being a toolset to being “batteries included”. What does it mean in the context of HA? How to achieve it?
Some OS distributions do not have a pcs configuration utility to create a high availability cluster PostgreSQL. In this case, the crm utility from the crmsh package will help us. It is more difficult to use, but powerful and effective.
In my master class, I will show how to use this utility, as well as configure a failover cluster in different configurations.
Is deploying Postgres in Kubernetes just repackaging it into a container? Can’t Postgres leverage the wide range of Cloud-Native software and integrate well with K8s? Join this journey that will cover and demonstrate, with demos running on StackGres:
- How to structure Postgres into an init-less container, plus several sidecar containers for connection pooling, backups, agents, etc.
- Defining high level CRDs as the single API to interact with the Postgres operator.
- Using K8s RBAC for user authentication of a web UI management interface.
- Using Prometheus for monitoring; bundling a node, Postgres and PgBouncer exporters together.
- Proxying Postgres traffic through Envoy. Terminate Postgres SSL with an Envoy plugin, that also exports wire protocol metrics to Prometheus.
- Using Fluentbit to capture Postgres logs and forward them to Fluentd, which stores them on a centralized Postgres database.
You will be able to follow the session on your own Kubernetes cluster, and go from zero to a Postgres hero on Kubernetes with little effort! Create in minutes your own Postgres-as-a-Service on your Kubernetes.
PostgreSQL is a mature and robust RDBMS since it has 30 years of history. Over the year, its query optimizer has been enhanced and usually produces good query plans.
However, can it always come up with good query plans? The optimization process has to use some assumptions to produce plans fast enough. Some of those assumptions are relatively easy to check (e.g. statistics are up-to-date), some harder (e.g. correct indexes are created), and some nearly impossible (e.g. making sure that the statistic samples are representative enough even for skewed data repartition). For now, given those various caveats, DBA sometimes can't always realize easily that they miss a chance to get a meaningful performance improvement.
To help DBA to get a truly good query plan, we'll present below some tools that can help to fix some of those problems by providing a missing index adviser, looking for extended statistics to create, and row estimation error correction information to get appropriate join orders with join methods automatically.
- pg_qualstats: provides a new index and extended statistics suggestions to gather many predicate statistics on the production workload.
- pg_plan_advsr: provides alternative good query plans automatically to analyze iterative query executions information to fix estimation rows error.
In this talk, we will explain how those tools work under the hood and see what can be done, how they can work together. Also, we will mention what other tools also exist for related problems. Therefore, it will be useful for DBA who are interested in improving query performance or want to check whether current settings of indexes and statistics are adequate.
pspg is unix pager specially developed for usage in psql Postgres client. Today it allows more than usual browsing data. It can work in application mode or it work as csv or tsv viewer too. I will try to show the main possibilities of this application.
In this talk we want to present how Microsoft team composed of people from two different teams approached the project and solved the migration issues using ora2pg and was able to prove that Postgres Single Server can perform equally well as Oracle Exadata. We will present our ways of working and also some main technical challenges that we faced including migration of BULK COLLECT’s, hierarchical queries, refcursors and others more complicated Oracle constructs.
The story about a challenging PoC that proved that Postgres can achieve the same performance as Oracle Exadata. The schema that was migrated wasn’t the simplest one you might see. It was quite the opposite. The code was loaded with dynamic queries, BULK COLLECT’s, nested loops, CONNECT BY statements, global variables and lot of dependencies. Ora2pg did a great job converting the schema but left a lot of work to do manually. Also estimates produced by the tool were highly inaccurate since the logic required not the migration but total re-architecture of the code. In this talk we want to present how Microsoft team composed of people from two different teams approached the project and solved the migration issues using ora2pg and was able to prove that Postgres Single Server can perform equally well as Oracle Exadata. We will present our ways of working and also some main technical challenges that we faced including:
- How estimates do (not) work
- How we handled BULK COLLECT’s
- Why we got rid of refcursors
- How we got stuck with testing of one the packages and how the help from a friend solved the problem
- How we handled hierarchical queries and drilling down the hierarchy
In Zabbix Server 5.0.1 the PostgreSQL monitoring plugin has become available for Zabbix Agent 2. It was developed by Postgres Professional in collaboration with Zabbix. In the presentation, we will talk about how the plugin works, discuss some options for its configuration, as well as how to add custom metrics to it.
What do we expect from a backup system? What makes a good backup system stand out? And, most importantly, how do you select the right tools for this process? When preparing backups, many pressing questions arise. In my talk, I will tell you a story of building a system for taking and verifying backups in one company. We will discuss the issues of choosing a tool for working with backups, adapting to changing realities, the problems of cloud storage, and the limitless possibilities of open-source collaboration. This talk will comprise the experience I gained in two years. You'll take the same journey, but much faster. Welcome aboard!
PostgreSQL has become a reality in a lot of shops today. In most cases PostgreSQL is established beside the current Oracle deployment and quite soon one question pops up regularily: How can we push data from Oracle to PostgreSQL and vice versa? Way back, in March 2001, a new extension to the SQL standard made it's way to define common APIs for managing external data: SQL/MED (ISO/IEC 9075-9:2008). The PostgreSQL community picked that up quite fast and implemented a framework for plugging in so called foreign data wrappers back in 2011 with PostgreSQL 9.1. Since then a wide range of these foreign data wrappers popped up and thanks to those, PostgreSQL today is able to integrate data from almost every external source, no matter if it is coming from flat files, other relational database systems or even unstructured sources. In this talk we will look at the foreign data wrapper for Oracle and how it can be used to get data from Oracle to PostgreSQL. But this is not a one way game: data can also be pushed from PostgreSQL to Oracle, and this might become important depending on the requirements. It is guaranteed that this talk is splitted by half: Slides and a lot of demos.
PostgreSQL provides different levels of security. This talk will cover all the available security techniques used in PostgreSQL 13. We’ll look at client-side security (LibPq, JDBC) through to server-side security. It will cover all supported authentication methods and the pros and cons of all these methods. Some of the key features of the talk are:
- Introduction to Cryptography
- SSL, TLS, GSSAPI, and OpenSSL
- Client-Side Encryption
- Securing Authentication
- Securing Data on the disk
- Securing Backup & Basebackup
- Securing Replication
- Database Roles and Privileges
It’s important to be familiar with all the security levels such as (1)network-level security (2) on-disk security (3) row-level, (4), and column level security. The talk will cover all the aspects with some real-life use cases and examples.
When it comes to performing postgres upgrades between major versions the command line utility pg_upgrade is the most popular method used today. But as with all things wonderful, there are caveats. One of the more critical issues is what to do when there's a failure. The purpose of this talk is to present those little covered "tricks" of the trade that allows one to improve and enhance the upgrade experience.
We'll start out by reviewing its basic mode operation. We'll then learn what makes it so darn fast when upgrading multi-terabyte datacluster in a matter of minutes. Finally, you will be introduced to those feared failures and how to deal with them with confidence and certainty.
Herein is a summary of the topics in this talk:
- How pg_upgrade works: the big picture
- About pgupgrade (comand line invocation)
- arguments and options
- The steps performing an upgrade
- About the replication ROLE
- with replication
- with login
- Upgrade options: COPY vs Hard Link
- about performance
- When something goes wrong at the point of no return (or is it?)
- Upgrading the REPLICA
- default method: pg_basebackup
- the cool method:
caveat; about vacuum wraparound
- - leveraging rsync
Logical replication has been there since 10.0 and with each release, it is getting better. This talk will start with the basic architecture of Logical replication in PostgreSQL and then cover the various ways in which it can be helpful to users.
One of the shortcomings of logical replication as compare to physical replication is that currently, it allows the transaction to be replicated only once it is committed. This can create a large apply lag on the subscriber side for long-running transactions. We will discuss the solution implemented for this problem for PostgreSQL.
We will also discuss the other major work being done in logical replication which is to allow the streaming of transactions at the prepare time. This will help us in implementing conflict-free logical replication. This can be used for scaling-out reads as well. Because of 2PC, we can ensure that on subscribers we have all the data committed on the master. Now, we can design a system where different nodes are owners of some set of tables and we can always get the data of those tables reliably from those nodes, and then one can have some external process that will route the reads accordingly.
In the end, this will cover the new enhancements, improvements related to Logical Replication in recent PostgreSQL releases.
Postgres Pro Enterprise has a great compression engine. The year 2020 was devoted to the study of this mechanism in the real work of 1C. We have accumulated some statistical data and of course the subtleties of the use and behavior of 1C compared to other popular DBMS, which I want to share.
How do you monitor Postgres? What information can you get out of it, and to what degree does this information help to troubleshoot operational issues? What if you want/need to log all the queries? That may bring heavy trafficked databases down.
At OnGres we’re obsessed with improving PostgreSQL’s observability. So we worked together with Tetrate folks on an Envoy’s Network Filter extension for PostgreSQL, to provide and extend observability of the traffic inout a cluster infrastructure. This extension is public and open source. You can use it anywhere you use Envoy. It allows you to capture automated metrics and to debug network traffic. This talk will be a technical deep-dive into PostgreSQL’s protocol decoding, Envoy proxy filters and will cover all the capabilities of the tool and its usage and deployment in any environment.
In a heavily-loaded project, any change implies non-zero risks of downtime or performance degradation. We constantly see how system complexity, number of database nodes, deployments per week constantly grow. How grows the level of automatization of various activities in CI/CD pipelines, containers, Kubernetes.
Meanwhile, when we look at the topic of testing of database changes–from trivial index creation to complex, almost "surgical" operations like converting int4 PK to int8 one in a multi-terabyte database under load–here we observe an obvious lag in technology and methodology development. In the best case, the changes are verified and approved visually, and here it all depends on the level of experience and tiredness of the reviewer.
In this talk, we will look at how we (Postgres.ai) solve this problem using our solution, Database Lab:
- instant provisioning of independent thin clones of multi-terabyte databases, ready to be used for testing,
- integration with existing CI/CD solutions and workflow,
- collection of metrics that are the most useful for decision support on whether or not each database migration has to be approved (and even fully automated rejection of the most dangerous actions).
CTO Postgres Professional Teodor Sigaev and 1C ERP platform expert Anton Doroshkevich will discuss the existing maintenance issues for 1C ERP on Postgres and their potential solutions.
PostgreSQL is one of the leading open-source databases. Out of the box, the default PostgreSQL configuration is not tuned for any particular workload. The default configuration is designed in such a way that PostgreSQL can run on any system using minimum resources. Consequently, a default installation of PostgreSQL does not give optimum performance on the high-performance machine because it is set up to use all available resources. PostgreSQL provides mechanisms that allow you to tune your database according to your workload and machine specification. Outside of PostgreSQL, though, we can tune the Linux kernel to allow the database load to work optimally. In this talk, we will learn how to tune some of the PostgreSQL’s parameters, and we will see the effect of that tuning, but we will focus on demonstrating how to tune Linux for better Postgres performance. As there are so many Linux kernel parameters that can be tuned to improve the performance of PostgreSQL, I will also share the results of benchmarks obtained when tuning some of the Linux parameters.
There are two types of SQL query analysis:
"Macro": analyzing the workload as a whole (three major approaches: using metrics provided by pg_stat_statements or similar, log analysis with pgBadger or similar, and sampling of pg_stat_activity)
"Micro": diving into details of single query execution (EXPLAIN command being the central tool here)
And there are huge gaps between them that become noticeable at scale. The main challenges:
- Switching between "macro" and "micro" without a huge overhead
- Verifying optimization ideas reliably
- Deploying changes risk-free
Solving these tasks at a scale requires advanced DBA experience and–sometimes–intuition. Or better tools that (fortunately!) very recently started to appear.
In this tutorial, we will learn how to establish a smooth and seamless SQL optimization process in your organization: * what tools should you choose in your particular case? * how to close the gaps mentioned above?