How to migrate a high-loaded information system based on 1С:Enterprise from MS Windows/MS SQL Server to Linux/PostgreSQL and don't face considerable pains?
We'll share our migration strategies and what was key to its success. We'll talk about projects involving migrations and pay attention to some nuances.
The story on the development of a database change management solution and what we lacked in Liquibase and FlywayTalk
We will tell you how we store database objects in the version control system. Then we compare the traditional approaches to manage database migrations and the one we preferred. We will talk about both methods advantages and disadvantages. Finally, we will present our change management solution - pgmig.
I will share the experience of migration project of an information system from a Microsoft SQL Server DBMS to Postgres PRO. The owner of the DBMS is Rosagroleasing JSC.
The volume of the DBMS database is about 6 TB (more than 300 tables, up to 4 billion records in each). The team adapted 15 ETL packages and 131 procedures with full functionality and also partial refactoring of the source code. Systems' performance has remained the same.
Why would you need the domestic DBMS? 1. Because of law requirements: only domestic DBMSes are appropriate for data protection. 2. In terms of formalization: currently, it is impossible to purchase MS in Russia legally. 3. In terms of pricing: the licensing mechanism of foreign DBMS systems significantly increases the expenditures on protection. 4. In terms of performance: it depends on the optimization of information protection tools, which are currently developed for domestic DBMS systems (in order to reveal why you may refer to the first paragraph). 5. In terms of practice: by the end of last year we’ve finished transition to the domestic databases and everything works smoothly. What else does an information security officer need?
Since PostgreSQL 9.5 logical replication has replication origin infrastructure for replication progress tracking. Now 16th version is coming with feature which made bi-directional replication setups possible using said infrastructure. In earliest reviews it got a nickname of "native multimaster" for cluster, interconnected with said replication, bearing some resemblance to PostgresPro's Multimaster. We're going to figure out degree of it and explain insides of them. Talk expects you to have a basic knowledge of WAL insides and it's role in PostgreSQL functioning.
Experience in porting the production management system database from Oracle DBMS to PostgresPro DBMS in a manufacturing enterpriseTalk
The practice of transferring structure, logic and data from Oracle DBMS to PostgresPro DBMS. Features and main difficulties of migration. Advantages of PostgresPro in terms of porting logic.
Three years ago we successfully migrated our application from Oracle DBMS to vanilla PostgreSQL. After that we faced with "childhood" diseases of our application on the new DBMS, which were successfully cured, and some "chronic diseases", which took much longer to deal with. One of the most memorable problems was the problem of DB performance degradation. The source of this problems was insufficient vacuuming of our database. The experience of understanding and solving this problem is offered to your attention in the form of practical recommendations to prevent DB tables and indices bloat and setting up PostgreSQL VACUUM/autovacuum parameters.
In this presentation, we cover multi-transactions, explain when they appear and how they can be monitored and checked. We also tackle the possible issue with multi-transaction ID wraparound and the problem of file absence in pg_multixact/offsets when restoring from a backup.
I will talk about the common mistakes when setting up a Corosync-Pacemaker failover cluster. Often these mistakes lead to fatal consequences, and, as a result, to the rejection of the chosen solution in favor of others. Would you like to get a recipe for the "right" cluster?
Experience of migration a highly loaded system from DB2 for z/OS to PostgresPro in multimaster configurationTalk
The developer shares his experience of migration a highly loaded system in the field of railway transport from IBM DB2 for z/OS to PostgresPro in a multimaster configuration. The main focus is on the nuances of working with a cluster operating in multimaster mode, what problems had to be faced and how to solve them.
The history of import substitution did not start last year after well-known events. Its launch dates back to 2014. It was from this year that state and near-state companies began to think of switching to the so called "recommended software". One of these companies approached us back in 2020 with a project to move from Oracle to PostgreSQL. This project was designed to solve the accumulated architectural problems (imperfect storage system for telemetry data, the DBMS itself worked inside a virtual machine), and optimize the use of disk space (make space in the main storage, debug saving archived data, ensure correct backup). Since the customer's system should have worked uninterrupted 24/7, it was necessary to switch from one DBMS to another "seamlessly" without downtime, with simultaneous operation of both to ensure step-by-step translation of subsystems and the ability to control the correctness of the data. And, of course, the work had to be completed as quickly as possible.
In the report we will discuss how we managed to solve this case.
PostgreSQL backup process in one of the biggest issues you face after switching to PostgreSQL. PostgreSQL has two out-of-the box options for backup and restore, they are dump/restore and pg_basebackup. Both of them have their nuances and peculiarities radically distinguish the backup and recovery system from MS SQL. Another option in the PostgreSQL ecosystem is the pg_probackup utility which is being actively developed and has a number of backup and restore variants offering different features. Each of these options maybe good or not so good for a particular scenario. In this talk, I'd like to cover the nuances, peculiarities and best practices for large databases, hundreds of midsize databases and small database setups.
As we have large 1C+PostgreSQL database setups, questions often arise, the answers to which are not so easy to find even in the documentation. I would like to share my experience in solving such issues gained during several migrations of 1C from MS SQL to PostgreSQL made for clients from the RBC500 rating. This talk will cover the following points: How in-depth your statistics should be and why it can be dangerous? How can the creation of an explicit and implicit temporary table provoke the DBMS server downtime and how can we avoid it? In what case will the DBMS process be killed by the operating system due to overuse of RAM and what to do about it? Why having one database per cluster is good and why having multiple databases on one DBMS cluster is bad? If we are stick to the "1 cluster = 1 database" approach, what about server resources for test and development environments? Backups in PostgreSQL: what backup options we have
A popular lecture with pictures illustrating the most powerful phenomena in the Universe that put space and time in an uncomfortable position and force them to reveal the secrets of the Universe to those who have an inquisitive mind and Postgres! The story of Mr. Lipunov, professor at Moscow State University, author of world-famous popular and scientific books, theorist, experimenter and creator of a network of astronomical robots.
Primary key generation is a known issue quite efficiently though not ideally solved by sequences. There are still problems with distributed generation, real uniqueness and predictability. Initially, UUID was not intended for use as a DBMS key, but its new versions (6, 7 and 8) make it quite convenient to use them as both synthetic and natural keys.
My presentation "Explaining the Postgres Query Optimizer" covers the details of query optimization, optimizer statistics, joins, and indexes. This talk covers 40 other operations the optimizer can choose to handle complex queries, large data sets, and to enhance performance. These include merge append, gather, memoize, and hash aggregate. It explains their purpose and shows queries that can generate these operations.
This is a new talk; draft slides are at https://momjian.us/main/writings/pgsql/beyond.pdf
- Scylla Charybdis of project management;
- Sirens of personal goals:
- Circe for IT professionals:
- Polyphemus of information security.
The tricky part of taking the journey in less than ten years.
Every DBA has in one way or another experienced the situation when PostgreSQL tables and indexes grow significantly in size. While looking for the reason for such behavior, we often conclude that database objects have "bloated". In this talk, we'll discuss the reasons behind bloating, create a testing environment to define the best method to reduce bloating. We'll also compare several anti-bloating utilities and get familiar with one more tool that helps us to efficiently struggle against bloating. We expect this presentation to become helpful for PostgreSQL DBAs of any experience level.
The developers of pg_probackup will explain how to work with a variety of pg_probackup commands. They will also guide you through making FULL and incremental backups in DELTA, PAGE, and PTRACK modes. Configurations of backup policies for different fields from a hundred of simple DBMSs to large e-commerce and fintech setups will also be demonstrated.
April 8, 2023 is the expected feature freeze date for PostgreSQL 16.
Some of the new features are already known, particularly, we can expect interesting changes related to security and logical replication. However, announcing all interesting patches before the end of the March CommitFest is impossible. Based on the experience of previous releases, we can assume that the most interesting patches will be committed during late March and early April..
That's why the contents of this talk will remain unknown till the very last moment, the author of this talk will also discover it in due time.
In this talk I’ll show some new abilities in postgres observability. There will be Graphana visualization of pg_profile/pgpro_pwr data, manual highlighting of interesting objects in pg_profile/pgpro_pwr reports, further development of extended vacuum statistics in PostgresPro releases and a new session and query tracing technique in pgpro_stats extension
The importance of sharding is now colossal. The size of today's databases exceeds 100 terabytes, scaling up vertically, and adding replicas containing a complete physical copy of the database is becoming difficult, especially when computational resources are scarce. Database sharding is a way to scale horizontally by partitioning data between compute nodes that are independent of each other.
In the PostgreSQL world, there are well-known scaling tools - CitusDB, and Greenplum - as well as new generation solutions - Cockroach DB, Yugabyte DB, SPQR, and Shardman.
In this talk, we will discuss the difference between these implementations, the advantages, and disadvantages of these solutions, examine the current state of implementation of sharding in vanilla PostgreSQL, and also touch on another important topic - providing data integrity and consistency guarantees across a distributed cluster.
Pgpool-II in the "Master-Master" mode or Network engineer’s overview of the PostgreSQL load balances solution.Talk
One of the main needs of high-load database projects is a right configured load balancer within the database cluster. Most of the solutions I worked with, including Pgpool-II, could not fully suit business requirements. In this article me as a network engineer with broad experience will cover tricky configurations of the Pgpool-II operating in the “Master-Master” mode, as well as consider cases in which similar solutions are indispensable.
-----------------------------------------------------------QUERY PLAN-------------------------------------------------------------- Hash Join Hash Cond: (Subject = GEQO) -> Hash Join Hash Cond: (**Optimizer task = choose the best query execution plan**) -> Seq Scan on **The number of potential plans grows exponentially as the number of tables in a query increases** -> Hash -> Seq Scan on **PostgreSQL solves this problem by using the genetic optimizer (GEQO)** -> Hash -> Seq Scan on **Topics of the report:** Filter: (**(What is GEQO)** AND **(Pros and cons)** AND **(How it works)**) (10 rows)
I'll show in PostGIS how everyone can analyze the geodata of the entire Earth and get answers to their global questions in minutes and seconds.
When you use a taxi in small towns, calling a car by phone, then with a high probability your trip is charged by the program based on OSM data. For billing, one of the routing packages is used. Through this use case, the taxi company employees put the house number and street on the buildings and contribute not only to their business, but also to OpenStreetMap.
The data analytics scenario also includes the tasks of where it is better to place an outlet so that buyers come to it. Again, data on walking distance and population of the surroundings can be extracted from geodata. You can calculate the value of real estate based on many factors related to the location of the object and its surroundings.
Scientists can build predictive models to predict epidemics, urban evolution, plan recreational areas and development of existing territories based on open geodata.
Well, you can answer any geography question that comes to your mind: calculate the area of cities and buildings, the length of roads and extract the names of cities, regions and islands. You can, for example, become a champion in the game of "Cities" or establish a new service for renting electric scooters. Everything is limited only by your imagination.
I published https://github.com/igor-suhorukov/openstreetmap_h3 - my project of high performance data loader, which allows you to perform geoanalytics on data from OpenStreetMap in PostGIS. It transform OpenStreetMap World/Region PBF dump into partitioned by H3 regions schema. Columnar storage option activate CitusDB extension in PostgreSQL to speedup aggregation queries.
ORM libraries are widely used in backend development – let's take this for granted. ORM libraries are improving, developers are learning from their own and others' mistakes, and many have learned to write code for ORM that generates quite optimal queries for typical operations. Yet, limited functionality of existing tools is another problem related to ORM. This is especially true for Postgres, which has a lot of operators and constructs (both in the core and in extensions) that go beyond the SQL standard.
With LinqToDb ORM library you can write extensions that implement any SQL operator or construct very easy, literally using just one line. Full-text search, KNN, CTE (including recursive ones), working with arrays and jsonb, array_agg and jsonb_object_agg, table functions and LATERAL JOIN - all this can be easily used in ORM, which I will demonstrate during this talk!
How does network latency affect performance when using synchronous replication? If a primary node fails, how long will our cluster not be reachable? How much data can be lost? Will there be any anomalies when reading from standby? We are designing our network architecture, give us a minimum RTT value for cluster solution! Many DBAs have to face these issues when designing and implementing cluster solutions. In this short talk we will tell how to quickly estimate RTT, RTO, RPO values, with the help of which Open Source tools one can test cluster solution and experimentally verify the design parameters, which metrics values can be used as a guide in design.
The major task of workload monitoring tools is to provide measurable statistics about most resource intensive activities in the DBMS. They can't solve any issues but the points for concentrating optimization efforts can be highlighted. We’ll talk about the extensions pg_profile and pgpro_pwr, their architecture, features and use cases.
With the rise of NoSQL databases, a number of falsehoods have flourished regarding how to choose a database engine. This talk focuses specifically on Redis and PostgreSQL, and why one might choose one or the other.
At small scales, we can often get by thinking of database servers as black boxes, but as we scale, the internals and architecture become more and more important. This talk focuses on behavior of these systems at scale and under load.
In this presentation you will learn:
- How Redis and PostgreSQL differ architecturally
- How differences in architecture affect scalability and performance
- Cases here Redis is the clear winner
- Cases where PostgreSQL is the clear winner
Additionally, some notes will be offered in terms of where PostgreSQL can improve in to compete with the sorts of workloads that generally favor Redis.
A report on our experience of implementing PostgreSQL instead of Oralce in a highly loaded system with a microservice architecture that processes several terabytes of data. Let's talk about how we went from piloting Greenplum to moving to several interacting PosgreSQL databases with different load profiles, and about the problems we encountered in the process.
Historically PostgreSQL was intended to transactional OLTP workload. This thesis is confirmed by row-based kind of storage and impossibility (or some complication) in building distributed engine of query execution based on MPP principles. However, due to extensibility of PostgreSQL core (first of all, by using of pluggable access methods) and tolerant license policy similar to BSD there were appeared new different forks and extensions allowing effective processing of big data in analytical manner.
In current talk I'm going to review the PostgreSQL fork called Greenplum and Citus and TimescaleDB extensions from system developer's perspective by comparing their common analytical engine features: column storage, data compression, distributed query execution and so on. The results of such overview will be helpful to database architects seeking PostgreSQL-based DBMS for analytical workload.
One of the main problems when migrating from Oracle to PostgreSQL is PL/SQL packages.
The PostgreSQL DBMS lacks such functionality - this leads to the need to rewrite (up to its redesign!) the Oracle PL/SQL code to PL/pgSQL.
With the new version of its flagship Postgres Pro 15 Enterpise distribution, Postgres Pro has added support for package functionality: the PL/pgSQL syntax has been extended and new mechanisms have been added to the stored procedure runtime environment. The report will focus on supporting the functionality of packages in Postgres Pro 15 Enterpise: - grouping types of variable procedures and functions into a package; - support for the package initialization section; - support for global package variables available throughout the "life" of the session.
This presentation will also discuss the features of migration of packages from Oracle to Postgres Pro Enterprise DBMS.
Heterogeneous distributed system - as a way to safely migrate from MSSQL Server to PostgreSQL, as well as reduce sanctions risksTalk
This report is primarily intended for companies, specifically their IT departments, that are running Russian systems 1C 8.x and have the ability to work with both MSSQL Server and PostgreSQL. We live in a unique time when having a heterogeneous IT system (a system with a distributed architecture where each database instance runs under different databases and/or has a different data structure) is justified both economically and in terms of potential risks. On the one hand, we store data and use databases with predictable behavior and open source code, regardless of the political situation. On the other hand, with this approach, we take advantage of all the benefits (primarily performance) of a powerful database supported by the largest vendor, albeit a hostile state to us. Now is the time to assess risks with the necessary level of paranoia. After all, data can be corrupted not only at the level of storage logic, but also by the method of additional "hidden" calls of Delete/Update constructs at the engine level of the database. Therefore, it is now relevant not only to monitor the performance of the IT system, but also to audit data and their timely integrity. It is necessary to implement data verification procedures, it is necessary to implement procedures for closing periods, procedures for fault-tolerant storage. And accordingly, to provide for various threat models in data recovery procedures. The report presents options for countering such threats and scenarios for the most seamless transition of large databases to PostgreSQL, since it is precisely for such databases that the problem of translating to a new DBMS is particularly acute.
In my opinion, every DBA has to deal with "heavy" queries. The question is whether we can do something at all to speed up such query. Is it possible to optimize such queries, or maybe it doesn't make sense? For example, rewriting such query may take much longer than getting any performance gain from this rewritten query. During this talk, I'll tackle several approaches to speeding up queries, and demonstrate some practical examples of optimizations that I use at work.
We offer to your attention our experience in data migration and the Ora2PgCopy program written in Java for high-speed data transfer from Oracle to Postgres, which is used after creating tables and transferring the program code of application systems. High data transfer speed is provided by using the Postgres command “copy”, using multithreaded Java technology for file processing, managing the nologged/logged table option, and supporting LOB and CLOB data types. According to the test results, Ora2PgCopy works noticeably faster than such analogues as: Ispirer (convertum), oracle_fdw, ora2pg, Pentaho kettle. Ora2PgCopy can function as a module as part of the LUI4ORA2PG migration automation system or independently of it. The history of the growth of the Live Universal Interface (LUI) web application development tool and the LUI4ORA2PG migration tool can be found in previous presentations at PGConf conferences: https://pgconf.ru/2019/118109 , https://pgconf.ru/201911/264095 , https://pgconf.ru/2020/262456, https://pgconf.ru/2021/288310, https://pgconf.ru/2022/316022.
Of the relational databases, PostgreSQL is fairly unique in the indexing capabilities it offers. While most of us are familiar with the use of indexes to speed performance when filtering on columns, PostgreSQL indexes can do far more than this.
This talk will focus on cases where difficult problems were solved through the creative use of indexes. Each of these cases is from an episode in my career.
In this presentation, you will learn:
- The general index access methods PostgreSQL supports out of the box
- How the planner uses indexes (only on a high level)
- Functional indexes
- Creative use of Index Only scans.
This is a talk for all audiences. The talk is primarily for beginner to intermediate users, but should have enough information for more advanced users to get some insight or inspiration from the talk as well.
There is set of Ansible roles for any Postgres configuration: a standalone instance; a streaming replica; patroni+etcd; stolon+etcd; patroni+consul; stolon+consul. In this talk, we will discuss why using this set of Ansible roles is better than other solutions.