If you care about Postgres performance, there are a number of hardware acceleration options to help with different use cases. Intel Optane DC persistent memory creates new tier in data hierarchy allowing developers to utilize performance of traditional memory combining with volume and persistency of block storage devices. Unlike traditional DRAM-only in-memory systems, where memory is small, expensive, and volatile, Intel Optane DC persistent memory makes it possible to run larger Postgres databases (terabytes) in memory for higher performance. FPGAs are integrated circuits that can be reprogrammed dynamically to accelerate a specific workload such as SQL execution and data compression. FPGA accelerators extend Postgres with hundreds of SQL reader and writer processes that work in parallel on the FPGA. It’s similar to adding hundreds new cores to boost parallel processing on your server.
Moscow public transport vehicles when moving report their coordinates via GLONASS. Collected data is used for various analyses including timetable development, bottlenecks detection and planning the bus lanes. Until recently we used the PostGIS extension for this purpose but now we are switching to a new PG extension — MobilityDB — designed especially for geodata time series processing. I have compared the table size and the performance of our solution without and with MobilityDB and happy to present the results.
Any DBA needs some kind of tool for historical workload analyse. Assume once at morning your monitoring team will report of sudden performance degradation at 2-3 a.m., and now you need to investigate this issue. What activities was most resource consuming within that hour? There are several tools for solving this problem, and I'll talk about one very easy and convenient tool - pg_profile. It need only a postgres database and a cron-like tool to run, and it will generate a workload profile report for your database as you need it. Ths report will be a good start point for further investigation.
This talk discusses the open source components we use at Adjust to manage a massive number (5+PB) of application log messages on PostgreSQL in a massively multi-parallel way. It provides both a use case for PostgreSQL in a big data (high volume/velocity/variety) environment, and can be used to show the power of PostgreSQL with JSONB, GIN, and more.
This talk covers the capabilities of the components in depth, sufficient to inspire similar solutions.
I would like to present the main projects for the evolution of our database, how we execute the administration, the problems and pitfalls we found, and how we solve them,the number and how are the database clusters from Gitlab.com , and what is our planning for the future, sharding, kubernetes... Our environment is in an exponential growth, with millions of users and thousands of requests per second, and we keep our platform stable and scaling. Join our session and discover our how we are doing it!
We will discuss stored procedures and functions, as well as the code within the database in general. We'll take a look at stored procedures from different angles: as a DBA, a database developer, and a backend developer. We will try to understand the specifics of stored procedures that often go against the common coding guidelines, and discuss best practices for working with stored procedures.
This is mainly a motivational talk for database and backend developers, not an advanced tech talk.
Locking is critical for providing high concurrency for any database — you cannot fully utilize your hardware if locking is throttling its use. This talk explores all aspects of locking in Postgres by showing queries and their locks; covered lock types include row, table, shared, exclusive, and advisory lock types. The high concurrency provided by Multiversion Concurrency Control (MVCC) is also covered.
Slides are at https://momjian.us/main/writings/pgsql/locking.pdf
PostgreSQL 13 Feature Freeze is scheduled for April 2020. Two more commitfests are still accepting new patches. What we can say about PostgreSQL 13. It's possible that rotation rule will work so that new release wouldn't have as many new features as PostgreSQL 12 have. If even it is do, that would be good evolutionary release with a lot of medium feature and infrastructure changes, which prepares postgres for new leap. In this talk I will overview expected novelties in PostgreSQL 13. It would be more or less accurate, since there would be only one commitfest left, which results are possible to forecast.
From the very beginning, PostgreSQL in Avito has been solving very important tasks. All the main architecture components are built around DBMS. For over 10 years, the project has been actively evolving, and the infrastructure and architecture have changed a lot.
The talk will start with an overview of how PostgreSQL infrastructure and architecture have advanced in Avito over time and which challenges have been successfully resolved.
Then we'll discuss PostgreSQL usage scenarios in Avito in 2020: microservice architecture, sharding, hosting multiple databases on a single server instance, DBaaS (Database discovery, access control, failover, backup, archive, resource sharing, etc.), integration, and team evolution.
And finally, I'll share our backlog and wishlist.
The story about powering a 1.5 petabyte analytics application with 2816 cores and 18.7 TB of memory in the Citus cluster at the Microsoft. The Windows team measures the quality of new software builds by scrutinizing 20,000 diagnostic metrics based on data flowing in from 800 million Windows devices. At the same time, the team evaluates feedback from Microsoft engineers who are using pre-release versions of Windows updates. At Microsoft, the Windows diagnostic metrics are displayed on a real-time analytics dashboard called “Release Quality View” (RQV), which helps the internal “ship-room” team assess the quality of the customer experience before each new Windows update is released. Given the importance of Windows for Microsoft’s customers, the RQV analytics dashboard is a critical tool for Windows engineers, program managers, and execs.
In my talk, I will tell how we built a geographically distributed system of personal data storage based on Open Source software and PostgreSQL. The concept of the inCountry business is to provide customers with a ready-to-use infrastructure for personal data storage. Our business customers are ensured that their customer’s personal data is securely stored within their country’s borders. We wrote an API and SDK and built a variety of services. Our system complies with generally accepted security standards (SOC Type 1, Type 2, PCI DSS, etc.). We built our infrastructure with Consul, Nomad, and Vault, used PostgreSQL, ElasticSearch as a storage system, Nginx, Jenkins, Artifactory, other tools to automate management and deployment. We have assembled our development and management teams - DevOps, Security, Monitoring, and DBA. We use both cloud providers and bare-metal servers located in different regions of the world. Development of the system architecture and ensuring the stability of the infrastructure, consistent and secure operation of all its components is the main task facing our teams.
MobilityDB is an open source moving object database system (https://github.com/ULB-CoDE-WIT/MobilityDB). Its core function is to efficiently store and query mobility tracks, such as vehicle GPS trajectories. It implements the Moving Features specification from the Open Geospatial Consortium (OGC). MobiltyDB is engineered up from PostgreSQL and PostGIS, providing spatiotemporal data management via SQL. It thus integrates with the postgreSQL eco-system allowing for complex architectures such as mobility stream processing and cloud deployments.
The presentation will explain the architecture of MobilityDB, its database types, indexes, and operations. We will highlight the PostgreSQL features that enable this extension, and the would like to have features. This presentation will be of special interest to the PostgreSQL community, and to professionals in the transportation domain.
To build a decent query plan, the optimizer has to understand statistical characteristics of underlying data. It is interesting to observe how the structure of the collected information became more complicated over time: what the optimizer relied on back in its early days and what is at his disposal now with the release of the 12th version. We will also talk about how and when statistics are collected, how to manage this process and whether it is necessary to think about it at all.
Odyssey is a new PostgreSQL connection pooler designed for the high-load environment. It can significantly scale processing performance by specifying the number of additional worker threads. This allows to increase SSL/TLS performance and provide more balanced pooling control using by sharing global server connection pools. In this talk I will answer 4 questions: 1. When you should use connection pooler? When Odyssey is the most optimal choice? 2. How to set up a cluster with Odyssey? 3. How to sleep well and be sure that everything works? 4. How does the architecture of Odyssey look like? How to implement or propose feature that you want?
Postgres has always had strong support for relational storage. However, there are many cases where relational storage is either inefficient or overly restrictive. This talk shows the many ways that Postgres has expanded to support non-relational storage, specifically the ability to store and index multiple values, even unrelated ones, in a single database field. Such storage allows for greater efficiency and access simplicity, and can also avoid the negatives of entity-attribute-value (eav) storage. The talk will cover many examples of multiple-value-per-field storage, including arrays, range types, geometry, full text search, xml, json, and records.
Last year I made a talk about unexpected PostgreSQL bottlenecks, which could make sad surprise to user (or DBA). Feedback to my talk was very positive. Additionally I have new material after year. This is why I'm making a sequel including new unexpected situations when your database hangs. This time focus will be on multicore hardware platforms, but not only them.
When migrating data from one DBMS to another, the question arises: choose a third-party tool or to program the migration yourself? Companies, trying to grow competencies within themselves, choose the second option. And they come across the "invention of their own bicycles". However, the market has powerful free data migration tools. One such tool is Pentaho Data Integration, part of the Pentaho Community Edition. The report will discuss the use of this package for data migration between Oracle and PostgreSQL. Particular attention will be paid to the problems with using this tool, and to the tasks of testing for the completeness and integrity of migrated data.
Small video illustration:
Migrating engineering software from Oracle to PostgreSQL
Dmitry will talk about using an automated migrator developed by "Diasoft" company for migrating Russian systems to PostgreSQL.
Taking the LOOTSMAN engineering software (developed by the ASKON company), we will speak about the key aspects of migration, review possible issues and ways to resolve them, as well as discuss performance optimization methods for migrated stored procedures.
pg_repack is one of the most popular instruments for removing bloat of tables and indexes in Postgres. In most cases, it works perfectly. But if you use such a feature of Postgres as deferred constraints, using pg_repack becomes more difficult or even impossible. I will talk about how we encountered the problem and will describe some workarounds - from internal instruments of Postgres to a small patch for pg_repack.
Sometimes it maybe a very tricky just to establish a connection with your remote database. Especially if your working place is in highly secured corporate environment and your database server is in highly secured remote cloud.
We will talk about bypassing SSH, SSL, jump servers, proxy servers, VPNs and various SSO (Kerberos/SSPI/LDAP/Active Directory) authentication systems in order to access your PostgreSQL database using console or rich UI interface and JDBC. We will describe different use cases - from trivial and typical to insane and over-secured.
Greenplum is a horizontal scalable database based on PostgreSQL core. It is used for OLAP workloads and a standard task is to quickly load or unload large amounts of data. The external data resources are usually another distributed systems. In this talk I will show how Greenplum can work with external data. What is the architecture and performance of external and foreign tables and how to handle streaming data. What will be changed in the next version. The examples of Kafka and ClickHouse connectors development.
I'll show an example of solving the problem of searching "similar" texts for one given text in big array using GiST index. The problem itself is not much important, but it is easy to understand. Using this problem as example, I'll show one of many methods of adapting GiST index for custom search problems. Maybe this talk will help you to solve other search problems.
Tracking poor queries is an infinity quest for developers, who works with databases. Often we think that it is guilty the slow and big queries. But what to do if we do not have that kind of queries between backend and database? Which kind of queries should we looking for? What tools should be used for that work? This talk will cover all these questions.
Everyone has probably heard about such a service as AWS RDS. I will talk about my experience by using the AWS RDS PostgreSQL Engine: the positive and negative aspects of the DBA work. This talk will focus on the tools that helped me create a comfortable environment in RDS. https://www.dropbox.com/s/v7udx5x96as5gbd/pgconf2020.pdf
Many businesses which use Database management systems like Oracle, DB2 & MS SQL are unreliable these days. Moreover, the costs incurred in maintaining these systems and its product licenses keeps on increasing. As the competitors are migrating over to the new technologies and tools available in the market, it is necessary for these businesses to migrate to new environment which is efficient, consistent and reliable to stay in the market and the technologies used in the current environment have become obsolete or no longer serve the business purpose. PostgreSQL has emerged as a top open-source RDBMS software. Since there is no licensing cost associated with it most of the companies are planning to migrate the databases which are currently running on other RDBMS like Oracle, DB2, MS SQL server to PostgreSQL. This report summarizes the various methodologies, procedures and techniques involved in successfully migrating the data from Oracle to PostgreSQL & DB2 to PostgreSQL. Migration is not a simple effort there should be proper planning and testing involved in this right from database connectivity to performance analysis. In this paper we are going to cover most of the steps which we need to consider before the migration and after the migration like choosing the correct tools for implementing the migration, time taken to migrate ,data compatibility, code conversion, application connectivity to database, database configuration parameters, performance analysis, replication setups, database monitoring, patching and backup strategies.
To Eat "the Elephant" in chunks: how we made friends with MSSQL, Postgres, wrote our replication, and transferred to Postgres one of the largest MISes in Russia.
It is our experience of the medical information system "RT MIS" transfer from MSSQL to PostgreSQL . When the necessity of transfer to PostgreSQL in our "RT MIS", one of the largest medical information systems, became imminent, we felt really terrified having assessed its amount: there was a huge number of stored procedures, functions, SQL-queries in its application code and services. It all requested transcribing, was exacerbated by demands on the system accessibility. So the variant "we awoke in the morning and PostgreSQL was working everywhere" was definitely impossible. That is why we chose another way: began eating "the elephant (PostgreSQL)" in chunks.
In my report, I am going to share our practical experience of the transfer, the instruments we used, the reason for another replication, the problems we met and their solutions. And finally, what turned out to be better: PostgreSQL or MSSQL.
Everyone has heard something about transaction isolation levels, but oddly enough, almost no one can clearly explain what it is any why it is important. At the same time, for many operations, it is critical to have a clear understanding of isolation levels and how they can affect the result. Indeed, if a customer has been paid twice and the developer has to pay back the losses, it won't seem unimportant. We'll discuss how to avoid such unpleasant situations.
Porting applications from Oracle to Postgres is common work today. Unfortunately it is not without problems. In presentation I'll try to show the basic performance problems related to differences between Oracle and Postgres and PL/SQL and PL/pgSQL.
In Zabbix-server 4.4, a new Zabbix-agent has become available. It is written in Golang, has a plug-in structure, keeps a permanent connection and makes it possible to control metrics in runtime. We will make an overview of the Zabbix-agent 2 and talk about how to develop plugins for it, in particular, the PostgreSQL monitoring plugin developed by Postgres Professional in collaboration with Zabbix.
PostgreSQL built-in full text search gives unique possibilities inaccessible for external search engines, such as virtual or generated document search and search with access restrictions. I will talk about these and other features, full text search configuration, indexes, and highlight the latest advances and future expectations
Cloud storage has some unique characteristics compared to traditional storage mainly because it is virtualized and controlled by software. One example is that AWS EBS shows higher throughput with larger I/O size up to 256 KiB without hurting latency. Hence, a user can get only about 4 MiB/sec with 1,000 IOPS EBS volume if the I/O request size is 4 KiB, whereas a user can get about 250 MiB/sec if the I/O request size is 256 KiB. This is because EBS consumes one I/O in a given IOPS budget for every I/O request regardless of the I/O size (up to 256 KiB). Unfortunately, PostgreSQL cannot exploit the full potential of cloud storage because PostgreSQL has designed without considering the unique characteristics of cloud storage.
In this talk, I will introduce the AppOS extension that improves the throughput of a write-intensive workload by 10x by transparently making PostgreSQL cloud storage-native. AppOS works like a storage driver that efficiently exploits the characteristics of cloud storage, such as I/O size dependency to storage throughput and latency, atomic write support in cloud block storage, and fast, but non-durable local SSDs. To do this, AppOS comprises a Linux-compatible file I/O stack including virtual file system, page cache, block I/O layer, cloud storage driver. On top of the file I/O stack, syscall module supports registering pre- and post-handler for file I/O-related system calls in order to transparently work without modifying PostgreSQL codes.
I will focus on presenting key use cases and performance results of the AppOS extension after explaining the internals. Specifically, I will show the performance results of OLTP and some batch workloads using standard benchmarking tools like pgbench and sysbench. I will also present performance results and implications on multiple clouds including AWS, GCP, and Azure.
An enterprise-grade PostgreSQL requires many complementary technologies to the database core: high availability and automated failover, monitoring and alerting, centralized logging, connection pooling, etc. That is, a stack of components around PostgreSQL. Kubernetes has enabled a new model to deploy software abstracting away the infrastructure. However, containers are not lightweight VMs, and the packing of software paradigms that work on VMs are not valid on containers/Kubernetes. How should be PostgreSQL and its stack be deployed on Kubernetes? Enter StackGres. An open source software that is the result of re-engineering PostgreSQL to become cloud native. Join this talk to learn and see demos of how to generate PostgreSQL minimal containers; how the sidecar pattern is used (abused) to integrate PostgreSQL’s stack components, and how the networking and storage are handled. More info: stackgres.io.
High availability PostgreSQL cluster under the control of the Patroni for 1С. Single entry point is organized by Consul DNS on Windows
200 bases, several clusters, several terabytes of data Share our experience setting up and using patroni cluster DBMS Cluster on Linux, 1C server for windows. We use: PostgreSQL assembly for 1C, Patroni, Consul, Consul dns, Commvault, Ansible Vagrant file and Ansible playbook with roles attached.
Recently, I was working on a project where graphQL was used for sending requests to its .NET Core backend, but this was not a good idea. The point is, a graphQL query is a hierarchical structure with a dynamic set of fields. It’s difficult to perform such requests via a statically-typed programming language and a relational database as suggested by the tools available. So, I came up with the idea of using the plv8 extension and perform graphQL queries right on the database side. It took me about two hours to develop a working prototype that could perform the same queries as the software under development for more than one month! Then various improvements have been made and I want to introduce them all. If you are thinking of using graphQL instead of REST, my speech could be most useful and could help you to save a lot of time.
Open-source maintainers face many challenges as projects grow. How to write more required features, fix more issues and have time to watch more pull requests? On the example of WAL-G(backup tool for PostgreSQL) I will tell you about how we solved these problems by launching a course of Open-source development at Ural Federal University, what we achieved and what will we do next.
Today no one is surprised by cloud infrastructure anymore, but not all its components are easy to deploy in cloud. For example, the database is always very demanding in terms of performance and resources. Scaling and fault tolerance are the most acute problems, that's why we have been observing rapid development of alternative DBMS in the recent years. However, traditional relational DBMS have already accumulated a lot of various features, so they often remain the first choice. Besides, they are constantly evolving and offer a wide variety of scaling tools. I will mainly speak about PostgreSQL, when you should consider scaling, and how to do it right.
We will touch upon the following topics:
- Streaming replication and balancing read/write workloads
- Logical replication and data sharding
- High availability and fault tolerance
This talk should be interesting to DBAs, system administrators, team leads, infrastructure architects, as well as wider audience dealing with PostgreSQL.
Role-based access control (RBAC) is one of the main mechanisms used for access control in many DBMS, including PostgreSQL. This model is a sub-type of traditional discretionary access control with its restrictions. In addition to DAC, many operating systems also use mandatory access control (MAC) based on security labels. This additional security mechanism is obligatory for protecting information that demands higher levels of security. Naturally, we would like to use MAC within DBMS when working in OS with mandatory access control switched on.
In this talk, we'll give an overview of existing MAC implementations in DBMS, as well as share our approach to using security mechanisms provided by SELinux, the sepgsql extension for PostgreSQL, and the standard mechanism of row-level security (RLS), which has been available in PostgreSQL starting from version 9.5.
In our presentation, we will use the "Airlines" demo database provided by the Postgres Professional company to show how to protect sensitive information and personal data, compare different ways of storing security labels, and assess performance of our solution.
How I stopped worrying and migrated more than 150 PL/SQL procedures of over 60K lines of code into Postgres
I will share our experience of migrating a server application for Russian railways from Oracle 11g Standard Edition to vanilla PostgreSQL 11.5.
At the time of migration, the database contained about 200 stored procedures of over 60000 lines of Oracle PL/SQL code (which has been developed since 2006, that is, for more than 12 years), about 250 tables, and 50GB of data.
Starting with a prologue, we'll describe our adventures along the migration process, as well as pleasant and unpleasant surprises we encountered, and finally get to an epilogue and a happy end. The story is told on behalf of an Oracle user exploring Postgres.
I will speak about our experience in solving the fault-tolerance problem for PostgreSQL, which options we reviewed and how we chose Patroni.
I will cover solution testing, quick implementation to the production and issues we faced as well as the way we solved them.
Odd things in query plan analysis - wasted time and "unnecessary" buffers.
Structural hints in a plan. How to help a developer with optimization without writing a single line of code. How to match plan nodes with query text and take advantage of this information.
In the report, we want to talk about the experience of migrating one large system from Oracle to PostgreSQL. The system itself was built on the PHP + Oracle stack; its distinguishing feature was that all business logic was implemented in PL / SQL code. In a DBMS, there are more than 3000 packages with 4-10 functions (procedures) in each. In PHP - more than 10,000 forms with inserts of anonymous blocks used to receive data, process and save results in Oracle.
To solve this extremely voluminous work, we took ANTLR4 (a powerful parser generator for reading, processing, executing, or translating structured text), PL/SQL grammar, and created an automatic converter that allows you to convert all objects in the schema and our system from Oracle to working code for PostgreSQL.
In May 2019 «Postgres Professional» launched the PostgreSQL Certification Program. Now you can not only get training in our courses, but also confirm your knowledge by passing an appropriate set of tests.
In this presentation, I will share the results of the first months of the program and answer the most frequently asked questions:
Where can I get information about the certification program? PostgreSQL or Postgres Pro certification? Certificates for PostgreSQL version 10. When will you upgrade to the current version? Do I have to attend training centers? Can I pass the test remotely? Not in Moscow? Is it possible to take several tests at once in one day? How to prepare for the test? Are there any examples of training questions? During the test I got an incorrect question, what should I do? How to find out exactly which questions the answers were not correct? When can I retake after an unsuccessful attempt? How to appeal the test results?
Walk-through of extending PostgreSQL with a user-defined type. The journey begins from the basics, from creating simple domain types over existing types, and continues to implementing a full-blown datatype from scratch in C.
PostgreSQL's advanced index types, GiST, GIN, and SP-GiST, are covered in enough detail to give an understanding of what each of them is good for. Support functions for each of them are shown for the example 'color' datatype.
A brief story how MySQL → PG migration could increase company efficiency tenfold times:
- Program code has been reduced 50 times, with optimization of backend team (from 15 to 3 engineers)
- Software development of new features has become measuring in days, not in months
- Infrastructure costs per 1M users have been reduced 20 times
- Database structure and technical documentation were simplified significantly, from 100K high-dependent tables to just 20 simple tables
- New security level because of total forbidden on external SQL commands to the database
- Quick analytics aggregation on multiple parameters, without external analytics systems
- The last, but not the least: the main business was keeping alive during migration