Talks and tutorials
pg_repack is one of the most popular tools for removing bloat from tables and indexes in Postgres. In most cases, it works perfectly. But if you use deferred constraints, using pg_repack becomes more difficult or even impossible. I will talk about how we encountered the problem and describe some workarounds, from built-in Postgres tools to a small patch for pg_repack.
Open-source maintainers face many challenges as projects grow. How do you implement more requested features, fix more issues, and still find time to review more pull requests? Using WAL-G (a backup tool for PostgreSQL) as an example, I will tell you how we solved these problems by launching an open-source development course at Ural Federal University, what we achieved, and what we will do next.
While building the infrastructure for a large blockchain ecosystem, we ran into PostgreSQL performance problems when inserting large amounts of data, so we decided to migrate the data to ClickHouse. That brought us some happy months, until the first users of the system finally arrived and it was time to serve the data back. This forced us to migrate back to PostgreSQL, this time with a much better thought-out storage scheme. In pursuit of getting the most out of each technology, we learned a lot of interesting quirks of ClickHouse and PostgreSQL and the optimization techniques relevant to each. We found PostgreSQL to be much more flexible across a broader range of load profiles.
Last year I gave a talk about unexpected PostgreSQL bottlenecks that can come as an unpleasant surprise to a user (or DBA). Feedback on the talk was very positive, and I have gathered new material over the past year. That is why I'm presenting a sequel covering new unexpected situations in which your database hangs. This time the focus will be on multicore hardware platforms, though not only on them.
In my talk, I will describe how we built a geographically distributed system for personal data storage based on open-source software and PostgreSQL. The concept of the inCountry business is to provide customers with a ready-to-use infrastructure for personal data storage. Our business customers are assured that their customers' personal data is securely stored within their country's borders. We wrote an API and SDK and built a variety of services. Our system complies with generally accepted security standards (SOC Type 1, Type 2, PCI DSS, etc.). We built our infrastructure with Consul, Nomad, and Vault, used PostgreSQL and ElasticSearch as storage systems, and used Nginx, Jenkins, Artifactory, and other tools to automate management and deployment. We have assembled our development and management teams: DevOps, Security, Monitoring, and DBA. We use both cloud providers and bare-metal servers located in different regions of the world. Developing the system architecture and ensuring the stability of the infrastructure and the consistent, secure operation of all its components are the main tasks facing our teams.
I will talk about our experience solving the fault-tolerance problem for PostgreSQL, the options we reviewed, and how we chose Patroni.
I will cover testing the solution, rolling it out to production quickly, and the issues we faced along with how we solved them.
To build a decent query plan, the optimizer has to understand the statistical characteristics of the underlying data. It is interesting to observe how the structure of the collected information has grown more complex over time: what the optimizer relied on in its early days and what is at its disposal now, with the release of version 12. We will also talk about how and when statistics are collected, how to manage this process, and whether it is necessary to think about it at all.
Any DBA needs some kind of tool for historical workload analysis. Suppose that one morning your monitoring team reports a sudden performance degradation between 2 and 3 a.m., and now you need to investigate the issue. Which activities consumed the most resources during that hour? There are several tools for solving this problem, and I'll talk about one very easy and convenient one: pg_profile. It needs only a Postgres database and a cron-like tool to run, and it will generate a workload profile report for your database whenever you need it. This report will be a good starting point for further investigation.
I'll show an example of solving the problem of finding texts "similar" to a given text in a large collection using a GiST index. The problem itself is not very important, but it is easy to understand. Using it as an example, I'll show one of many ways to adapt a GiST index to custom search problems. Perhaps this talk will help you solve other search problems.
The PostgreSQL 13 feature freeze is scheduled for April 2020. Two more commitfests are still accepting new patches. What can we say about PostgreSQL 13? It's possible that the rotation rule will hold, so that the new release won't have as many new features as PostgreSQL 12 has. Even if that is so, it would be a good evolutionary release with many medium-sized features and infrastructure changes that prepare Postgres for a new leap. In this talk I will give an overview of the new features expected in PostgreSQL 13. It should be more or less accurate, since by then only one commitfest will be left, and its results can be forecast.
Moscow public transport vehicles report their coordinates via GLONASS while moving. The collected data is used for various analyses, including timetable development, bottleneck detection, and bus lane planning. Until recently we used the PostGIS extension for this purpose, but now we are switching to MobilityDB, a new PG extension designed especially for processing geodata time series. I have compared the table size and the performance of our solution with and without MobilityDB and am happy to present the results.
Examples of using the capabilities of sequences to develop business logic implemented in stored functions.
A high-availability PostgreSQL cluster managed by Patroni for 1C. A single entry point is provided by Consul DNS on Windows.
200 databases, several clusters, several terabytes of data. We will share our experience of setting up and using a Patroni cluster, with the DBMS cluster on Linux and the 1C server on Windows. We use a PostgreSQL build for 1C, Patroni, Consul, Consul DNS, Commvault, and Ansible. A Vagrant file and an Ansible playbook with roles are attached.
Sometimes just establishing a connection to your remote database can be very tricky, especially if your workplace is in a highly secured corporate environment and your database server is in a highly secured remote cloud.
We will talk about getting through SSH, SSL, jump servers, proxy servers, VPNs, and various SSO (Kerberos/SSPI/LDAP/Active Directory) authentication systems in order to access your PostgreSQL database from a console or a rich UI and over JDBC. We will describe different use cases, from trivial and typical to insane and over-secured.
Can an old dog be taught new tricks? The simple and reliable pg_dump has never been as convenient as it is now. This is the story of how good old pg_dump became a powerful tool for production and a great assistant for the DBA: new features, their benefits, and cases for their use.
In this talk I will describe how Zabbix appeared in our company, how we lived with Oracle, why we decided to switch to PostgreSQL, what problems we had during the transition, and what software seemed like a good choice for working with PostgreSQL.
In this talk, we want to share our experience of migrating one large system from Oracle to PostgreSQL. The system was built on a PHP + Oracle stack; its distinguishing feature was that all business logic was implemented in PL/SQL code. The DBMS holds more than 3000 packages with 4-10 functions (procedures) each. The PHP side has more than 10,000 forms with embedded anonymous blocks used to receive data, process it, and save the results in Oracle.
To tackle this enormous amount of work, we took ANTLR4 (a powerful parser generator for reading, processing, executing, or translating structured text) and a PL/SQL grammar, and created an automatic converter that turns all objects in the schema and in our system from Oracle code into working code for PostgreSQL.
One of the most widespread PostgreSQL distributions in Russia is the one for 1C. In this talk I would like to share some thoughts on what is missing in PostgreSQL from the 1C point of view.
A brief story of how a MySQL->PG migration can increase a company's efficiency tenfold:
- The codebase shrank 50-fold, and the backend team was streamlined (from 15 to 3 engineers)
- New features are now developed in days, not months
- Infrastructure costs per 1M users dropped 20-fold
- The database structure and technical documentation were simplified significantly, from 100K highly interdependent tables to just 20 simple tables
- A new level of security thanks to a complete ban on external SQL commands to the database
- Quick analytics aggregation across multiple parameters, without external analytics systems
- Last but not least: the main business kept running during the migration
Many businesses today run on database management systems like Oracle, DB2, and MS SQL that no longer serve them reliably. Moreover, the cost of maintaining these systems and their product licenses keeps increasing. As competitors migrate to the new technologies and tools available on the market, these businesses need to migrate to a new environment that is efficient, consistent, and reliable in order to stay in the market, because the technologies used in their current environment have become obsolete or no longer serve the business purpose. PostgreSQL has emerged as a top open-source RDBMS. Since there is no licensing cost associated with it, most companies are planning to migrate the databases currently running on other RDBMSs, such as Oracle, DB2, and MS SQL Server, to PostgreSQL. This report summarizes the methodologies, procedures, and techniques involved in successfully migrating data from Oracle to PostgreSQL and from DB2 to PostgreSQL. Migration is not a simple effort: it requires proper planning and testing, from database connectivity all the way to performance analysis. In this paper we cover most of the steps to consider before and after the migration: choosing the right tools for the migration, the time taken to migrate, data compatibility, code conversion, application connectivity to the database, database configuration parameters, performance analysis, replication setups, database monitoring, patching, and backup strategies.
Tracking down poor queries is an endless quest for developers who work with databases. We often assume that slow, heavy queries are to blame. But what should we do if there are no such queries between the backend and the database? What kind of queries should we be looking for? What tools should we use for the job? This talk will cover all these questions.