Talks and tutorials
If you care about Postgres performance, there are a number of hardware acceleration options for different use cases. Intel Optane DC persistent memory creates a new tier in the memory and storage hierarchy, allowing developers to combine the performance of traditional memory with the capacity and persistence of block storage devices. Unlike traditional DRAM-only in-memory systems, where memory is small, expensive, and volatile, Intel Optane DC persistent memory makes it possible to run larger Postgres databases (terabytes) in memory for higher performance. FPGAs are integrated circuits that can be reprogrammed dynamically to accelerate a specific workload such as SQL execution or data compression. FPGA accelerators extend Postgres with hundreds of SQL reader and writer processes that work in parallel on the FPGA. It is similar to adding hundreds of new cores to boost parallel processing on your server.
While in motion, Moscow public transport vehicles report their coordinates via GLONASS. The collected data is used for various analyses, including timetable development, bottleneck detection, and bus lane planning. Until recently we used the PostGIS extension for this purpose, but now we are switching to a new PG extension, MobilityDB, designed especially for processing geodata time series. I have compared the table size and performance of our solution with and without MobilityDB and am happy to present the results.
When migrating data from one DBMS to another, the question arises: should you use a third-party tool or write the migration yourself? Companies trying to grow in-house competencies often choose the second option, and end up reinventing the wheel. However, powerful free data migration tools are available on the market. One such tool is Pentaho Data Integration, part of Pentaho Community Edition. The report will discuss using this package to migrate data between Oracle and PostgreSQL. Particular attention will be paid to the problems of using this tool and to testing the completeness and integrity of the migrated data.
I would like to present the main projects in the evolution of our database, how we handle administration, the problems and pitfalls we found and how we solved them, how many database clusters GitLab.com runs and how they are set up, and our plans for the future: sharding, Kubernetes, and more. Our environment is growing exponentially, with millions of users and thousands of requests per second, and we keep our platform stable and scalable. Join our session and discover how we are doing it!
Postgres has always had strong support for relational storage. However, there are many cases where relational storage is either inefficient or overly restrictive. This talk shows the many ways that Postgres has expanded to support non-relational storage, specifically the ability to store and index multiple values, even unrelated ones, in a single database field. Such storage allows for greater efficiency and simpler access, and can also avoid the drawbacks of entity-attribute-value (EAV) storage. The talk will cover many examples of multiple-value-per-field storage, including arrays, range types, geometry, full text search, XML, JSON, and records.
PostgreSQL 13 Feature Freeze is scheduled for April 2020. Two more commitfests are still accepting new patches. What can we already say about PostgreSQL 13? If the usual rotation holds, the new release may not have as many new features as PostgreSQL 12. Even so, it would be a good evolutionary release with many medium-sized features and infrastructure changes that prepare Postgres for its next leap. In this talk I will give an overview of the expected new features in PostgreSQL 13. The overview should be reasonably accurate, since only one commitfest will remain, and its results can be forecast.
To build a decent query plan, the optimizer has to understand the statistical characteristics of the underlying data. It is interesting to observe how the structure of the collected information has become more complex over time: what the optimizer relied on in its early days and what it has at its disposal now, as of version 12. We will also talk about how and when statistics are collected, how to manage this process, and whether it is necessary to think about it at all.
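As an example of how collected statistics feed into a plan, the PostgreSQL documentation ("Row Estimation Examples") shows how an equality condition is estimated from the most-common-values (MCV) list: a value in the list gets its recorded frequency, and any other value gets an even share of what remains. The Python sketch below is a simplified illustration of that idea. The function name and input format are hypothetical, and the real planner adds refinements such as clamping that are omitted here.

```python
def eq_selectivity(value, mcv, null_frac, n_distinct):
    """Simplified estimate of the fraction of rows where column = value.

    mcv: hypothetical dict mapping each most-common value to its frequency (0..1).
    null_frac: fraction of rows where the column is NULL.
    n_distinct: estimated number of distinct non-null values in the column.
    """
    if value in mcv:
        # The value is in the MCV list: use its recorded frequency directly.
        return mcv[value]
    # Otherwise, spread the non-null, non-MCV fraction of rows evenly
    # over the distinct values that are not in the MCV list.
    remaining = 1.0 - sum(mcv.values()) - null_frac
    other_distinct = n_distinct - len(mcv)
    if other_distinct <= 0:
        return 0.0
    return remaining / other_distinct


stats = {"a": 0.3, "b": 0.2}
print(eq_selectivity("a", stats, null_frac=0.0, n_distinct=7))  # MCV hit
print(eq_selectivity("z", stats, null_frac=0.0, n_distinct=7))  # (1 - 0.5) / (7 - 2)
```

The estimated row count is this selectivity multiplied by the table's estimated row count. This is why stale or too-coarse statistics can lead to a poor plan.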
Last year I gave a talk about unexpected PostgreSQL bottlenecks that can be a nasty surprise for a user (or DBA). The feedback was very positive, and over the past year I have gathered new material. That is why I am giving a sequel covering new unexpected situations in which your database hangs. This time the focus will be on multicore hardware platforms, though not exclusively.
The story of powering a 1.5 petabyte analytics application with 2,816 cores and 18.7 TB of memory in a Citus cluster at Microsoft. The Windows team measures the quality of new software builds by scrutinizing 20,000 diagnostic metrics based on data flowing in from 800 million Windows devices. At the same time, the team evaluates feedback from Microsoft engineers who are using pre-release versions of Windows updates. At Microsoft, the Windows diagnostic metrics are displayed on a real-time analytics dashboard called “Release Quality View” (RQV), which helps the internal “ship-room” team assess the quality of the customer experience before each new Windows update is released. Given the importance of Windows for Microsoft’s customers, the RQV analytics dashboard is a critical tool for Windows engineers, program managers, and execs.
Any DBA needs some kind of tool for historical workload analysis. Suppose one morning your monitoring team reports a sudden performance degradation between 2 and 3 a.m., and you need to investigate. Which activities consumed the most resources within that hour? There are several tools for solving this problem, and I will talk about one very easy and convenient tool: pg_profile. It needs only a Postgres database and a cron-like scheduler to run, and it will generate a workload profile report for your database whenever you need one. This report is a good starting point for further investigation.
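Tools of this kind rely on Postgres's cumulative statistics counters, which only ever grow. To see what happened within one hour, you compare a sample taken before the hour with one taken after it, and the scheduler's job is to take those samples regularly. The Python sketch below shows only that general snapshot-difference idea. It is not pg_profile's code, and the `Sample` structure and query identifiers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Sample:
    """Hypothetical snapshot of cumulative per-statement counters."""
    taken_at: str
    total_time_ms: dict = field(default_factory=dict)  # query id -> cumulative time (ms)


def top_consumers(start: Sample, end: Sample, limit: int = 3):
    """Rank statements by execution time spent between two samples."""
    deltas = {}
    for qid, end_total in end.total_time_ms.items():
        # A statement first seen after `start` contributes its whole counter.
        delta = end_total - start.total_time_ms.get(qid, 0.0)
        # Non-positive deltas (no activity, or a statistics reset) are skipped here.
        if delta > 0:
            deltas[qid] = delta
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:limit]


before = Sample("02:00", {"q1": 1000.0, "q2": 500.0})
after = Sample("03:00", {"q1": 1200.0, "q2": 4500.0, "q3": 300.0})
print(top_consumers(before, after))  # [('q2', 4000.0), ('q3', 300.0), ('q1', 200.0)]
```

The same subtraction works for any cumulative counter, such as blocks read, calls, or WAL generated. So a report between two samples can rank activity by whichever resource is of interest.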
MobilityDB is an open source moving object database system (https://github.com/ULB-CoDE-WIT/MobilityDB). Its core function is to efficiently store and query mobility tracks, such as vehicle GPS trajectories. It implements the Moving Features specification from the Open Geospatial Consortium (OGC). MobilityDB is built on top of PostgreSQL and PostGIS, providing spatiotemporal data management via SQL. It thus integrates with the PostgreSQL ecosystem, allowing complex architectures such as mobility stream processing and cloud deployments.
The presentation will explain the architecture of MobilityDB, its database types, indexes, and operations. We will highlight the PostgreSQL features that enable this extension, as well as the features we would like to have. This presentation will be of special interest to the PostgreSQL community and to professionals in the transportation domain.
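A moving-object database stores a track as a sequence of timestamped positions. For continuous movement, it typically answers "where was the vehicle at time t?" by linearly interpolating between the two surrounding fixes. The Python sketch below illustrates that concept on plain planar coordinates. It is not MobilityDB's implementation or API, and the function name and tuple layout are hypothetical.

```python
from bisect import bisect_right


def position_at(track, t):
    """Return the (x, y) position at time t, interpolating linearly.

    track: hypothetical list of (timestamp_seconds, x, y) tuples sorted by time.
    Returns None if t lies outside the observed time span.
    """
    if not track or t < track[0][0] or t > track[-1][0]:
        return None
    times = [p[0] for p in track]
    i = bisect_right(times, t)  # times[i - 1] <= t < times[i]
    if i == len(track):
        # t equals the last timestamp, so there is no later fix to interpolate towards.
        return track[-1][1], track[-1][2]
    t0, x0, y0 = track[i - 1]
    t1, x1, y1 = track[i]
    frac = (t - t0) / (t1 - t0)
    return x0 + frac * (x1 - x0), y0 + frac * (y1 - y0)


bus = [(0, 0.0, 0.0), (10, 100.0, 0.0), (20, 100.0, 50.0)]
print(position_at(bus, 5))   # (50.0, 0.0)
print(position_at(bus, 15))  # (100.0, 25.0)
print(position_at(bus, 25))  # None
```

A database extension does the same kind of computation inside SQL on real geometries, and supports it with indexes. That is what lets it answer such queries over millions of tracks.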
In my talk, I will describe how we built a geographically distributed personal data storage system based on open source software and PostgreSQL. The concept of the inCountry business is to provide customers with a ready-to-use infrastructure for personal data storage. Our business customers are assured that their customers’ personal data is securely stored within their country’s borders. We wrote an API and SDK and built a variety of services. Our system complies with generally accepted security standards (SOC Type 1, Type 2, PCI DSS, etc.). We built our infrastructure with Consul, Nomad, and Vault. We used PostgreSQL and Elasticsearch as storage systems, and Nginx, Jenkins, Artifactory, and other tools to automate management and deployment. We have assembled our development and management teams: DevOps, Security, Monitoring, and DBA. We use both cloud providers and bare-metal servers located in different regions of the world. Designing the system architecture, keeping the infrastructure stable, and ensuring the consistent and secure operation of all its components are the main tasks facing our teams.
This talk discusses the open source components we use at Adjust to manage a massive volume (5+ PB) of application log messages on PostgreSQL in a massively parallel way. It presents a use case for PostgreSQL in a big data (high volume/velocity/variety) environment and demonstrates the power of PostgreSQL with JSONB, GIN, and more.
This talk covers the capabilities of the components in depth, sufficient to inspire similar solutions.
Odyssey is a new PostgreSQL connection pooler designed for high-load environments. Its processing performance can be scaled significantly by configuring additional worker threads. This improves SSL/TLS performance and provides more balanced pooling control by sharing global server connection pools. In this talk I will answer 4 questions: 1. When should you use a connection pooler, and when is Odyssey the best choice? 2. How do you set up a cluster with Odyssey? 3. How can you sleep well and be sure that everything works? 4. What does Odyssey's architecture look like, and how can you implement or propose a feature you want?
Locking is critical for providing high concurrency for any database — you cannot fully utilize your hardware if locking is throttling its use. This talk explores all aspects of locking in Postgres by showing queries and their locks; covered lock types include row, table, shared, exclusive, and advisory lock types. The high concurrency provided by Multiversion Concurrency Control (MVCC) is also covered.
Slides are at https://momjian.us/main/writings/pgsql/locking.pdf
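Two table-level locks only block each other when their modes conflict. The conflict table in the PostgreSQL documentation ("Explicit Locking") can be encoded directly as a lookup, as in the Python sketch below. It models table-level modes only. Row-level and advisory locks are not represented, and the `conflicts` helper is a hypothetical name used for illustration.

```python
# Table-level lock modes, from weakest to strongest.
MODES = [
    "ACCESS SHARE", "ROW SHARE", "ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE",
    "SHARE", "SHARE ROW EXCLUSIVE", "EXCLUSIVE", "ACCESS EXCLUSIVE",
]

# For each mode, the set of modes it conflicts with (the relation is symmetric).
CONFLICTS = {
    "ACCESS SHARE": {"ACCESS EXCLUSIVE"},
    "ROW SHARE": {"EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "ROW EXCLUSIVE": {"SHARE", "SHARE ROW EXCLUSIVE", "EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "SHARE UPDATE EXCLUSIVE": {"SHARE UPDATE EXCLUSIVE", "SHARE", "SHARE ROW EXCLUSIVE",
                               "EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "SHARE": {"ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE", "SHARE ROW EXCLUSIVE",
              "EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "SHARE ROW EXCLUSIVE": {"ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE", "SHARE",
                            "SHARE ROW EXCLUSIVE", "EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "EXCLUSIVE": set(MODES[1:]),     # everything except ACCESS SHARE
    "ACCESS EXCLUSIVE": set(MODES),  # conflicts with every mode
}


def conflicts(held: str, requested: str) -> bool:
    """True if a request for `requested` must wait while `held` is granted."""
    return requested in CONFLICTS[held]
```

For example, CREATE INDEX without CONCURRENTLY takes SHARE, which conflicts with the ROW EXCLUSIVE lock taken by INSERT, UPDATE, and DELETE. Writes therefore wait while the index is built. A plain SELECT takes ACCESS SHARE and is not blocked.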
I like stored procedures; they are a great technology. But like any other technology, they allow you to write poorly optimized code. It is not easy to write optimized code and SQL statements in large, complex applications. On the other hand, there are easy-to-use tools that can help. Postgres has built-in function call tracking. There are also PLProfiler and plpgsql_check. With these tools it is easy to detect the slow parts of an application. With this knowledge, fixing performance issues is less like magic.
At the master class, we will learn how to run fast load tests of Postgres databases to optimize database configuration, data structures, indexes, OS settings, etc. To do this, we will write code, build the test infrastructure from it, and run the test. As a result, we get a flexible, code-defined tool to which you can attach any monitoring, and which does not cost a lot of money, because the environment is created in 7 minutes in an empty AWS account and destroyed after the test.
Can an old dog be taught new tricks? The simple and reliable pg_dump tool has never been as convenient as it is now. This is the story of how the good old pg_dump became a powerful production tool and a great assistant for DBAs: new features, their benefits, and their use cases.
Examples of using sequences to implement business logic built on stored functions.
I will show an example of solving the problem of finding texts "similar" to a given text in a large collection using a GiST index. The problem itself is not very important, but it is easy to understand. Using it as an example, I will show one of many ways to adapt the GiST index to custom search problems. Perhaps this talk will help you solve other search problems.
pg_repack is one of the most popular tools for removing table and index bloat in Postgres. In most cases, it works perfectly. But if you use a Postgres feature such as deferred constraints, using pg_repack becomes more difficult or even impossible. I will talk about how we encountered the problem and describe some workarounds, from Postgres's internal tools to a small patch for pg_repack.
Sometimes it can be very tricky just to establish a connection to your remote database, especially if your workplace is in a highly secured corporate environment and your database server is in a highly secured remote cloud.
We will talk about getting through SSH, SSL, jump servers, proxy servers, VPNs, and various SSO (Kerberos/SSPI/LDAP/Active Directory) authentication systems in order to access your PostgreSQL database from the console or a rich UI tool via JDBC. We will describe different use cases, from trivial and typical to insane and over-secured.
In the report I will talk about how Zabbix appeared in our company, how we lived with Oracle, why we decided to switch to PostgreSQL, what problems we had during the transition, and what software seemed like a good choice for working with PostgreSQL.
Greenplum is a horizontally scalable database based on the PostgreSQL core. It is used for OLAP workloads, and a standard task is to quickly load or unload large amounts of data. The external data sources are usually other distributed systems. In this talk I will show how Greenplum can work with external data: the architecture and performance of external and foreign tables, how to handle streaming data, and what will change in the next version. I will also give examples of developing Kafka and ClickHouse connectors.
For many businesses, database management systems like Oracle, DB2, and MS SQL Server are proving unreliable these days. Moreover, the costs of maintaining these systems and their product licenses keep increasing. As competitors migrate to the new technologies and tools available on the market, these businesses need to move to a new environment that is efficient, consistent, and reliable in order to stay in the market, especially when the technologies in their current environment have become obsolete or no longer serve the business purpose. PostgreSQL has emerged as a top open-source RDBMS. Since it has no licensing cost, many companies are planning to migrate databases currently running on other RDBMSs such as Oracle, DB2, and MS SQL Server to PostgreSQL. This report summarizes the methodologies, procedures, and techniques involved in successfully migrating data from Oracle to PostgreSQL and from DB2 to PostgreSQL. Migration is not a simple effort: it requires proper planning and testing, from database connectivity to performance analysis. In this paper we cover most of the steps to consider before and after the migration, such as choosing the right migration tools, migration time, data compatibility, code conversion, application connectivity to the database, database configuration parameters, performance analysis, replication setup, database monitoring, patching, and backup strategies.
Porting applications from Oracle to Postgres is common work today. Unfortunately, it is not without problems. In this presentation I will show the basic performance problems caused by the differences between Oracle and Postgres, and between PL/SQL and PL/pgSQL.
Join Heterogeneous Databases using PostgreSQL Foreign Data Wrappers
To Eat "the Elephant" in chunks: how we made MSSQL and Postgres friends, wrote our own replication, and moved one of the largest MISes in Russia to Postgres
This is our experience of moving the medical information system "RT MIS" from MSSQL to PostgreSQL. When moving "RT MIS", one of the largest medical information systems, to PostgreSQL became inevitable, we were truly terrified once we assessed its size: a huge number of stored procedures, functions, and SQL queries in its application code and services. All of it had to be rewritten, and this was made harder by the system's availability requirements. So the scenario "we woke up in the morning and PostgreSQL was running everywhere" was definitely impossible. That is why we chose another way and began eating "the elephant (PostgreSQL)" in chunks.
In my report, I am going to share our practical experience of the migration, the tools we used, why we wrote yet another replication solution, the problems we met, and their solutions. And finally, which turned out to be better: PostgreSQL or MSSQL.
Everyone has probably heard of a service like AWS RDS. I will talk about my experience using the AWS RDS PostgreSQL engine: the positive and negative aspects of DBA work. This talk will focus on the tools that helped me create a comfortable environment in RDS. https://www.dropbox.com/s/v7udx5x96as5gbd/pgconf2020.pdf
· Enterprise network traits, known problems and workarounds.
· Year-long experience of maintaining 24x7 1C + PostgreSQL systems
· Pros and cons compared with MS SQL from a DBA's point of view
· Stories of 1C systems migration from MS SQL to PostgreSQL
With Zabbix server 4.4, a new Zabbix agent has become available. It is written in Go, has a plugin architecture, keeps a persistent connection, and makes it possible to control metrics at runtime. We will give an overview of Zabbix agent 2 and talk about how to develop plugins for it, in particular the PostgreSQL monitoring plugin developed by Postgres Professional in collaboration with Zabbix.
While creating the infrastructure for a large blockchain ecosystem, we encountered PostgreSQL performance problems when inserting large amounts of data. So we decided to migrate the data to ClickHouse. This brought us a few happy months, until the first users of the system finally arrived and it was time to serve the data back. That forced us to migrate back to PostgreSQL, this time with a much better thought-out storage schema. In pursuit of getting the most out of each technology, we learned many interesting quirks of ClickHouse and PostgreSQL and their respective optimization techniques. We found PostgreSQL to be much more flexible across a broader range of load profiles.
Cloud storage has some unique characteristics compared to traditional storage, mainly because it is virtualized and controlled by software. One example is that AWS EBS delivers higher throughput with larger I/O sizes, up to 256 KiB, without hurting latency. Hence, a user can get only about 4 MiB/sec from a 1,000 IOPS EBS volume if the I/O request size is 4 KiB, whereas the same user can get about 250 MiB/sec if the I/O request size is 256 KiB. This is because EBS consumes one I/O from the IOPS budget for every I/O request regardless of its size (up to 256 KiB). Unfortunately, PostgreSQL cannot exploit the full potential of cloud storage, because PostgreSQL was designed without considering these characteristics.
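The throughput figures above follow from simple arithmetic: throughput ≈ IOPS × bytes credited per I/O, with each I/O credited for at most 256 KiB. The Python sketch below reproduces that calculation. The 256 KiB limit is taken from the paragraph above. The sketch ignores the separate per-volume throughput cap that real EBS volume types also have, and the function name is hypothetical.

```python
def ebs_throughput_mib_s(iops: int, io_size_kib: float, max_io_kib: float = 256) -> float:
    """Approximate throughput (MiB/s) when each request costs one I/O up to max_io_kib.

    Larger requests would be counted as several I/Os. In throughput terms that is the
    same as crediting at most max_io_kib per I/O, which is what this sketch does.
    """
    effective_kib = min(io_size_kib, max_io_kib)
    return iops * effective_kib / 1024  # KiB/s -> MiB/s


print(ebs_throughput_mib_s(1000, 4))    # 3.90625, i.e. "about 4 MiB/sec"
print(ebs_throughput_mib_s(1000, 256))  # 250.0
```

PostgreSQL's default 8 KiB pages sit at the small end of this range. That is why batching writes into larger I/O requests can change throughput so dramatically on this kind of storage.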
In this talk, I will introduce the AppOS extension, which improves the throughput of write-intensive workloads by 10x by transparently making PostgreSQL cloud storage-native. AppOS works like a storage driver that efficiently exploits the characteristics of cloud storage. These include the dependence of storage throughput and latency on I/O size, atomic write support in cloud block storage, and fast but non-durable local SSDs. To do this, AppOS comprises a Linux-compatible file I/O stack including a virtual file system, page cache, block I/O layer, and cloud storage driver. On top of the file I/O stack, a syscall module supports registering pre- and post-handlers for file I/O-related system calls, so that AppOS works transparently without modifying PostgreSQL code.
I will focus on presenting key use cases and performance results of the AppOS extension after explaining the internals. Specifically, I will show the performance results of OLTP and some batch workloads using standard benchmarking tools like pgbench and sysbench. I will also present performance results and implications on multiple clouds including AWS, GCP, and Azure.
PostgreSQL is one of the leading open-source databases. Out of the box, the default PostgreSQL configuration is not tuned for any particular workload. The default configuration is designed so that PostgreSQL can run on any system using minimal resources. Consequently, a default installation of PostgreSQL does not give optimal performance on a high-performance machine, because it is not set up to use all available resources. PostgreSQL provides mechanisms that allow you to tune your database according to your workload and machine specification. Beyond PostgreSQL itself, we can also tune the Linux kernel so that the database workload runs optimally. In this talk, we will learn how to tune some of PostgreSQL's parameters and see the effect of that tuning, but we will focus on demonstrating how to tune Linux for better Postgres performance. As there are so many Linux kernel parameters that can be tuned to improve PostgreSQL performance, I will also share benchmark results obtained when tuning some of them.
Tracking down poor queries is a never-ending quest for developers who work with databases. We often assume that slow, big queries are to blame. But what if there are no such queries between the backend and the database? Which kinds of queries should we look for? Which tools should we use? This talk will cover all these questions.
Corosync & Pacemaker is a well-known solution for creating fault-tolerant clusters. Such clusters can contain 3 working nodes, or 2 working nodes and one voting-only node. The cluster can be deployed on physical or virtual servers.
This tutorial will demonstrate the process of installing and tuning a fault-tolerant PostgreSQL cluster. You will learn that it is not as difficult as it seems at first glance.
Kubernetes is the new way of deploying software programmatically on almost any infrastructure (cloud or on-prem). But it is a complex beast. How do you get started? How do you dive deeper? What are the specific best practices and tips for Postgres DBAs dealing with Kubernetes? Join this half-day tutorial to learn, hands-on, among other topics:
- How to quickly get started with Kubernetes
- Manage storage
- Manage services, networking and ingress/egress
- How to make Postgres cloud-native in Kubernetes
- Do a demo run of existing Postgres operators, including Zalando, CrunchyData and StackGres.
Recently, I was working on a project where GraphQL was used to send requests to its .NET Core backend, and this was not a good idea. The point is, a GraphQL query is a hierarchical structure with a dynamic set of fields. It is difficult to execute such requests with a statically typed programming language and a relational database, as the available tools suggest. So I came up with the idea of using the plv8 extension to execute GraphQL queries directly on the database side. It took me about two hours to develop a working prototype that could run the same queries as software that had been under development for more than a month! Since then, various improvements have been made, and I want to present them all. If you are thinking of using GraphQL instead of REST, my talk could be very useful and help you save a lot of time.
High availability PostgreSQL cluster managed by Patroni for 1C. Single entry point provided by Consul DNS on Windows
200 databases, several clusters, several terabytes of data. We share our experience setting up and using a Patroni DBMS cluster on Linux with a 1C server on Windows. We use: the PostgreSQL build for 1C, Patroni, Consul, Consul DNS, Commvault, and Ansible. A Vagrant file and an Ansible playbook with roles are attached.
Open-source maintainers face many challenges as projects grow. How do you implement more requested features, fix more issues, and still find time to review more pull requests? Using WAL-G (a backup tool for PostgreSQL) as an example, I will tell you how we solved these problems by launching an open-source development course at Ural Federal University, what we achieved, and what we will do next.
Walk-through of extending PostgreSQL with a user-defined type. The journey begins from the basics, from creating simple domain types over existing types, and continues to implementing a full-blown datatype from scratch in C.
PostgreSQL's advanced index types, GiST, GIN, and SP-GiST, are covered in enough detail to give an understanding of what each of them is good for. Support functions for each of them are shown for the example 'color' datatype.
In this report, we want to talk about our experience migrating a large system from Oracle to PostgreSQL. The system was built on a PHP + Oracle stack, and its distinguishing feature was that all business logic was implemented in PL/SQL code. The database contained more than 3,000 packages with 4-10 functions (procedures) each. The PHP code contained more than 10,000 forms with embedded anonymous blocks used to receive data, process it, and save the results in Oracle.
To handle this enormous amount of work, we took ANTLR4 (a powerful parser generator for reading, processing, executing, or translating structured text) and a PL/SQL grammar, and created an automatic converter that translates all objects in the schema and our system from Oracle into working PostgreSQL code.
I will speak about our experience in solving the fault-tolerance problem for PostgreSQL, which options we reviewed and how we chose Patroni.
I will cover testing the solution, rolling it out quickly to production, and the issues we faced, as well as how we solved them.
One of the most widespread PostgreSQL distributions in Russia is the build for 1C. In this report I would like to share some thoughts on what is missing in PostgreSQL from 1C's point of view.
In May 2019 «Postgres Professional» launched the PostgreSQL Certification Program. Now you can not only get training in our courses, but also confirm your knowledge by passing an appropriate set of tests.
In this presentation, I will share the results of the first months of the program and answer the most frequently asked questions:
- Where can I get information about the certification program?
- PostgreSQL or Postgres Pro certification?
- Certificates are for PostgreSQL version 10. When will you upgrade to the current version?
- Do I have to attend a training center?
- Can I take the test remotely? What if I am not in Moscow?
- Is it possible to take several tests in one day?
- How should I prepare for the test? Are there any sample questions?
- During the test I got an incorrect question. What should I do?
- How can I find out exactly which questions I answered incorrectly?
- When can I retake the test after an unsuccessful attempt?
- How can I appeal the test results?
A brief story of how a MySQL->PG migration increased company efficiency tenfold:
- Program code was reduced 50-fold, allowing the backend team to shrink from 15 to 3 engineers
- Development of new features is now measured in days, not months
- Infrastructure costs per 1M users were reduced 20-fold
- Database structure and technical documentation were simplified significantly, from 100K highly interdependent tables to just 20 simple tables
- A new level of security, because external SQL commands to the database are completely forbidden
- Quick analytics aggregation on multiple parameters, without external analytics systems
- Last but not least: the main business kept running during the migration
The master class will demonstrate the following: 1) The benefits of declarative programming: rapid creation of complex LUI forms using only SQL. 2) Support for long transactions and automatic locks: an example of a wizard that writes directly to the database. 3) Multilingual fields and columns: rapidly adding multilingual support to existing forms. 4) Controlling the coloring of data in form fields and columns: typical examples. 5) Live search in input fields, using address input as an example. 6) How to configure LUI to enter and display application-specific data. 7) Designing a master-detail form that retrieves data from two different databases. 8) How end users build complex data queries without using SQL.
PostgreSQL is one of the most readily extensible databases in the world. Custom data types, aggregations, functions, and more can be easily and safely written in C.
This hands-on tutorial covers the basics of writing functions, data types, and aggregates in C. It is recommended (though not strictly required) that attendees bring their own laptops and be prepared to actually try the exercises.
Topics covered include: 1. The PostgreSQL type system 2. Creating a simple custom type in C 3. Common errors for non-C programmers 4. Creating a simple aggregate in C
There will be many opportunities for questions and discussion throughout the tutorial.
You will go on a fascinating journey through PostgreSQL settings. We will discuss what makes 1C special and which DBMS settings have to be adjusted specifically for it. We will discuss approaches to testing 1C performance and consider various options for backup schemes and fault tolerance. Along the way, we will compare the speed of 1C on PostgreSQL with default settings against its speed on PostgreSQL tuned for 1C. We will also create a PostgreSQL replica, switch the 1C server to it live, and see what 1C users experience during this operation. A separate block will be devoted to Postgres Pro Enterprise Edition and how using its advantages affects 1C performance.