Talks and tutorials
I would like to share my experience in optimization queries in PostgreSQL of RMIS (Regional Medical Information System).
I'm going to talk about emerging technologies in the area of general purpose RDBMS indexing. I will describe different approaches suitable for different workloads. We will discuss ideas from academic researches and corresponding industrial response from developers, communities, and companies. There will be the short live-coding session on creating DIY index in PostgreSQL.
This talk will contain 3 parts: 1. Express PITR setup to the Cloud 2. Latest advancements in WAL-G for backups 3. Why you may need or should avoid this new features, depending on your specifications and workload.
Declarative partitioning was a long-awaited feature and has been enhanced since its introduction in PostgreSQL 10. However, for many users, finding optimal partitioning schemes to have the best benefits from partitioning is not an easy task. Therefore, we added in HypoPG a new hypothetical partitioning feature which helps users to design partitioning. In this presentation, I will provide a brief introduction of HypoPG and explain declarative partitioning, and then I'll show the usage of hypothetical partitioning feature and explain how the extension is working.
You just set up your first PostgreSQL cluster, created a database schema, loaded some data, did some fine tuning of configuration. Now you want to make your cluster highly available. Unfortunately, PostgreSQL doesn't offer built-in automatic failover, but luckily for us, there are plenty of external tools for that. As a next logical step you start choosing a tool, and... you already doing it wrong, because first you have to define SLA, RTO, and RPO. In this talk I am going to cover most of the common mistakes people do when setting up a highly available cluster.
My talk will be about different cases of usage and setup of the standby server; examples how to setup standby linked to your archive( to make an opportunity to recreate standby from archive after primary crashing and promoting your old standby); Avito experience of usage of standby server for read-only queries: problems and solutions; monitoring of standby.
One of the most soft after features of Postgres v10 is logical replication. In this presentation we will cover what Logical Replication is, how it compares to Binary (Streaming Replication), how Logical Replication works, configuring Logical Replication, Logical Replication limitations, gotchas, security and management. We will also discuss potential deployed architectures with Logical and Binary Replication and some neat features of the underlying technology.
At the end of this presentation an attendee with a reasonable understanding of how to manage Postgres will be able to configure Logical replication for use.
Virtualization in companies has become an alternative to the conservative "one task-one server" approach, which allows efficient use of hardware resources, centralized management of server infrastructure, saving energy and cooling resources. The report explains how to configure the VMware environment for intensive input / output PostgreSQL and profiling tools virtual infrastructure to monitor performance and resolve identified problems.
We will be presenting MobilityDB, a PostgreSQL extension that extends the type system of PostgreSQL and PostGIS with abstract data types for representing moving object data. These types can represent the evolution on time of values of some element type, called the base type of the temporal type. For instance, temporal integers may be used to represent the evolution on time of the number of employees of a department. In this case, the data type is “temporal integer” and “integer” is the base type. Similarly, a temporal float may be used to represent the evolution on time of the temperature of a room. As another example, a temporal point may be used to represent the evolution on time of the location of a car, as reported by GPS devices. Temporal types are useful because representing values that evolve in time is essential in many applications, for example in mobility applications.
The temporal types in MobilityDB are based on the bool, int, float, and text base types provided by PostgreSQL, and on the geometry and geography base types provided by PostGIS (restricted to 2D or 3D points). MobilityDB follows the ongoing OGC standards on Moving Features (http://www.opengeospatial.org/standards/movingfeatures), and in particular the OGC Moving Features Access, which specifies operations that can be applied to time-varying geometries.
A rich set of functions and operators is available to perform various operations on temporal types. In general there are three classes:
- Lifed functions and operators: the operators on the base types (such as arithmetic operators and for integers and floats, spatial relationships and distance for geometries) are intuitively generalized when the values evolve in time. Spatiotemporal functions in MobilityDB generalize spatial functions provided by PostGIS for both "geometry" and "geography" types, for instance the "ST_Intersection". Basically, MobilityDB takes care of the temporal aspects and delegates the spatial processing to PostGIS.
- Temporal functions and operators: they process the temporal dimension of the value which can be an instant, a range, an array of instant, or an array of ranges. Examples are the atperiods function that restricts a temporal type to a given array of time ranges, and the duration function that extracts the definition time of a value.
- Spatiotemporal functions and operators: all remaining functions fall into this category. Examples are speed(tgeompoint/tgeogpoint), azimuth(tgeompoint/tgeogpoint), maxValue(tfloat/tint), twAvg(tfloat) a time weighted average, etc.
Both GiST and SP-GiST have been extended to support the temporal types. The GiST index implements an R-tree for temporal alphanumeric types and a TB-tree for temporal point types. The SP-GiST index implements a Quad-tree for temporal alphanumeric types and an Oct-tree for temporal point types. The approach used for developing SP-GIST indexes for MobilityDB allowed us to add SP-GIST indexes for 2-dimensional, 3-dimensional and n-dimensional geometries in PostGIS.
Two types of numeric aggregate functions are available. In addition to the traditional functions min, max, count, sum, and avg, there are window (also known as cumulative) versions of them. Given a time interval w, the window aggregate functions compute the value of the function at an instant t by considering the values during the interval [t − w, t]. In contrast to standard aggregation, temporal aggregation may return a result which is of a bigger size than the input. For this reason, the temporal aggregate functions have been extremely optimized in order to perform efficiently.
MobilityDB has a preliminary implementation of the statistic collector functions and the selectivity functions for the temporal types.
In terms of size, the extension is made of 67k lines of C code, 19k lines of SQL code, 67k lines of SQL unit tests. It defines 40 types, 2300 functions, and 1350 operators.
The talk will illustrate the spatiotemporal concepts and the data model of the temporal types. It will briefly describe the components of MobilityDB: indexing, aggregations, functions and operators, and the SQL interface. Query examples and uses cases will be illustrated allover the talk. The current status of MobilityDB and the planned development will also be presented.
The talk shall be given by: - Esteban Zimányi: Professor and Director of the Department of Computer and Decision Engineering of the Universite Libre de Bruxelles. - Mahmoud SAKR: Postdoc researcher in the Universite Libre de Bruxelles.
The dark age for PostgreSQL started at 2GIS after transitioning to the microservice architecture. Every team tried to cook database on their own — by installing instances, juggling versions, trying to code deployments with numerous tools or using manual operations. It was the right time to develop a “silver bullet” — a common set of tools to solve all the problems at once. We created our own cluster solution based on well-known PostgreSQL, repmgr, pgbouncer and Barman. Despite of the complexity of our final solution, we developed a repeatable flexible deployment to accelerate postgresql cluster deployment and management. Also we deployed the our own cluster to consolidate all databases. It helped to eliminate team efforts for database management and focus on their main goals. Failover works, we tried it :-)
PostGIS is spatial extension for PostgreSQL.
This talk will go in depth on using PostGIS for disaster management: which functions can be used and for what.
Postgres is known for it extensibility, which made it the universal database, that means it can meet the requirements of practically any project. Many extensions are well-known and widely used, for example, PostGIS extension - de-facto standard of open source GIS, hstore - an extension for storing and manipulation of arbitrary key/value pairs. I will talk about less known but useful PostgreSQL extensions, which provides a new functionaliy and/or improve the performance of PostgreSQL.
PostgreSQL was designed to be extensible, it provides an API to application developers to extend PostgreSQL functionality and/or improve the performance for specific data and workloads. It is important that there is no need of having expertise of core developers, and these new functionality could be added online without restarting of database server. Application developer can create various database objects, such as functions, data types, operators, indexes, and even new access methods.
I will present my choice of three extensions out of hundreds: ltree - provides tree operations with indexing support on hierarchical data, vops - greatly improves the performance of Postgres for OLAP queries using vector operations, pg_variables - provides session variables for storing scalars and relations, useful for generating reports on read-only replicas.
Brendan Gregg’s USE (Utilization, Saturation, Errors) method for monitoring is quite known. There’s also Tom Wilkie’s RED (Rate, Errors, Durations) method, which is suggested to be better suited to monitor services than USE. I want to talk about how we employ these methodologies when we develop our Postgres monitoring in okmeter.io.
Wide usage of Common Table Expression queries considered as a core paradigm for implementing the Business Logic for high loaded web applications development based on PostgreSQL functions.
Reliable backup and recovery, at enterprise level for the PostgreSQL environment. No more traditional backups. CBT (Change Block Tracking) technology is the next generation incremental backup. Faster than snapshots, CBT back up blocks that change, not all of your data, reducing server and network traffic and eliminating the need for traditional backups. Benefits: • Data protection mode close to Real-Time • Update with ease
As an experienced Oracle DBA, I've discovered some distinctive features of PostgreSQL which I'd like to share with you. We'll review a set of essential tools for DBAs, their capabilities and usefulness in comparison to their Oracle counterparts. Also I'm going to summarize the core differences between Oracle DB and PostgreSQL in terms of administrating.
DBMS inside container is a nightmare for DBA. I will describe how we use containerized Postgres in Juno. What problems we faced with and how did we solve them.
PostgreSQL + PostGIS + TimescaleDB is a ready-to-use symbiosis from a reliable RDBMS, a powerful set of geographical objects and calculations, and work with time-series data. This bundle perfectly solves the problem of storing telemetry, while leaving the whole PostgreSQL ecosystem in your hands.
Meeting corporate standarts for information security, business continuity and software unification: Kerberos Authentification (Windows and Linux) in Active Directory Environment. 1C Enterprise specifics. Using backup and recovery software (HP Data Protector). Integration with corporate monitoring system (Solarwinds Mointor).
So you finally have your database model for your application and you fill it in with current data. How do you keep it up to date? While INSERT may still be transparent, UPDATE and DELETE will overwrite your previous data, so you won't be able to reproduce them. Cloning the whole huge content for each minor update is not an option. For rich and complex data about hundreds of thousands of power generators in Germany and worldwide, I built a model using range data types in recent PostgreSQL which allows me to insert, update and delete data while granting the full access to the whole state of the database at any historical moment. I'll present a very simplified version of the database so the audience will be immediately able to apply it for their cases. I'll also show a few tricks in Python and Psycopg2 that will allow a whole team to prepare, review, and deploy all revisions to this database without merge conflicts. And I'll give a few ideas on how to retrieve this data efficiently.
As any ordinary software developers, we just pursued a goal to develop a system robust for high loads, and even succeeded. The system architecture was fine, but the data volume was keeping increased and revealed the painful issues and errors that nobody had expected. We faced very strange queries seemed to be unbelievable. In my short talk I would like to share sad experience of arised-from-nothing high loads in DBMS and solving the challenge.
At pgconf’17 I talked about our analytics systems based on PostgreSQL. Afterwards we looked at hadoop, s3, presto, vertica, and other frights. Finally we stopped to suffer nonsense and just completed PostgreSQL with ready Greenplum and Clickhouse. As a result, we achieved amazing performance, fast migration, easy maintenance, reliability and horizontal scalability. We enabled to recover the system after fault in two commands, decreased infrastructure costs and expanded functionality due to ANSI SQL, MPP and In-memory. All within the open-source and full SQL paradigm. We called the product GreenHouseSQL, which is our inner whole cycle data platform. In the talk we will show the beauty of solution internals, explain the advantages and flaws, tips and tricks of starting with Greenplum, as well as why do we need Clickhouse, what is left to PostgreSQL, and eventually how does it all work.
This case study walks participants through a case where we decided to embark on a data recovery effort. This talk is applicable to all users, from novices to advanced PostgreSQL database administrators. Beginners will get an understanding of what data recovery is and is not, what expectations to have going into it, and how to work with contracted experts in order to ensure the best possible outcome, while more advanced users and experts will also get a fair bit out of the technical aspects of the case study.
While the talk will emphasize non-technical operational aspects of data recovery, it will also include discussions of the internals of PostgreSQL we had to work with, as well as how we went about approaching difficulties so that we could retrieve the data we hoped to.
PostgreSQL is one of the leading open-source databases. Out of the box, the default PostgreSQL configuration is not tuned for any particular workload. The default configuration is designed in such a way that PostgreSQL can run on any system using minimum resources. Consequently, a default installation of PostgreSQL does not give optimum performance on high performance machine because it is set up to use all available resources. PostgreSQL provides mechanisms that allow you to tune your database according to your workload and machine specification. Outside of PostgreSQL, though, we can tune the Linux kernel to allow the database load to work optimally. In this talk, we will learn how to tune some of the PostgreSQL’s parameters, and we will see the effect of that tuning, but we will focus on demonstrating how to tune Linux for better postgres performance. As there are so many Linux kernel parameters that can be tuned to improve the performance of PostgreSQL, I will also share the results of benchmarks obtained when tuning some of the Linux parameters.
As we all know, PostgreSQL is a classic vertically scalable database for OLTP loads. In parallel with PostgreSQL for many years there is its alternative horizontal-scalable MPP version of PostgreSQL, that is called Greenplum, sharpened for big data and OLAP workload. In my pitch I will show the internal architecture of Greenplum (distributed transactions, data sharding, partitioning with hybrid storage in external systems, column storage engines with compression, and much more), a comparison with the internal structure of PostgreSQL and the application areas of each solution are shown.
In this tutorial we will discuss Binary and Logical replication in a practitioner format. The topics that will be included are native Postgres replication technologies, configuring and managing them. We will also discuss performance and draw backs of various architectures (sync vs async etc...). At the end of this presentation the attendees will be able to configure a basic replication deployment with HOT Standby and well as have an understanding of other technologies such as Point in Time Recovery and cascading replication.
When you are optimizing Postgres it is usually maintenance that goes by the wayside. How do we fix autovacuum? Where did all of this bloat come from? Why am I getting IO spikes? How do I get RDS to behave?! Why are commits so slow on replication? The answer to all of these questions is understanding the relationship between proper Postgres maintenance and performance. Join us for a 3 hour jaunt through the wily world of making Postgres Go!
PostgreSQL provides possibility to create temporary tables. Though a temporary table is accessible only to a single session and is removed at the end of the session, all information about it is stored in the system catalogs of PostgreSQL. This is related to several issues, which make it difficult or impossible to use temporary tables in some cases. There are attempts to solve this feature, including in our company. But they have not yet succeeded, mainly because of the PostgreSQL engine. In the talk I want to tell about simple and small pg_variables extension. It allows you to create table variables along with scalar ones. I will tell how it can replace temporary tables, what advantages and disadvantages it has.
My presentation is about writing extensions in Postgres. I have written pg_threads that implements simplified POSIX thread API inside Postgres database. It adds a new powerful abstraction giving database developers new opportunities for writing parallel code thus taking advantage of multicore CPUs. There is an extra API for transactional and non-transactional IPC between threads. I also have an example application that takes advantage of this new API that scales linearly even across 2 nodes. The presentation is with live working demo using vagrant project with 2 VMs running Ubuntu and 2 Postgres 11 databases.
Technical features of porting T-SQL code to plpgsql on the example of transfering ECM "Priority" to PostgresTalk
This report focuses on the continuation of transferring our ECM “Priority” from MS SQL to Postgres. Technical solutions, issues of rewriting from T-SQL to plpgsql and optimization of the effective code will be covered. Additionally, there will be considered aspects of pgplsql performance testing to find the “bad code” of pgplsql as a candidate for optimization. The main objective of the presentation is to answer the question: "We have it in T-SQL - how to do it in PG?"
Managed database services are gaining in popularity. In this session we look at how best to configure Amazon RDS for PostgreSQL and also look at common user operations of using RDS for PostgreSQL. We will also look beyond common user operations and into some specific optimizations related to upgrade, logical replication, performance, and reducing downtime.
Sometimes problems arise during Postgres operation, and the faster they are identified and resolved, the happier users eventually will be. pgCenter is a set of CLI powerful utilities for troubleshooting in the "here and now" mode. In the talk I will show how to use pgCenter for efficient troubleshooting, the directions in which to search, and how to respond to certain problems, in particular how to:
- check if Postgres is in the normal state;
- identify promptly the faulty clients and stop them;
- reveal too heavy queries;
- and other tips and tricks of pgCenter.
During the report, I would like to share the experience of implementing BlockChain in a real business task based on 1C+PostgreSQL. Where did this task come from? From whom do we protect data with the help of technology? How to get a chain integrity report of tens of millions of records in seconds?
In this session we will deep dive into the exciting features of Amazon RDS for PostgreSQL, including new versions of PostgreSQL releases, new extensions, larger instances. We will also show benchmarks of new RDS instance types, and their value proposition. We will also look at how high availability and read scaling works on RDS PostgreSQL. We will also explore lessons we have learned managing a large fleet of PostgreSQL instances, including important tunables and possible gotchas around pg_upgrade.
Our company has developed a software product Live Universal Interface (LUI) is a tool to quickly create and modify standardized on-screen forms to WEB browsers without compiling code, know just enough SQL. LUI is aimed at B2B, B2G, G2C and B2C segments and intended for use in billing systems, financial management, accounting and control of production, where you must decide first and foremost functional tasks, rather than demonstrate unnecessary graphics elements.
Collective elaboration is provided by storing all elements of program code in the total database, which can be located on the corporate server or in the "cloud" on a third-party site.
While doing development of one of our project we were asked to build HA database using Postgres, geographically distributed.
First our choice was obvious, we started to work with big 3 cloud providers, but soon it was quite understand that everything costs big enough for us. Also there were a bunch of incompatibilities with unsupported extensions as well as londiste replication we were heavily used.
I will talk about why we chose patroni, what types of problem we faced with and patroni's special features can dramatically simplify deploy and everyday usage.
Shared_buffers = 25% – is it too much or not enough? Or it's the right value?
How can we ensure that this – pretty much outdated – recommendation suit well our needs?
It is time to start apply enterprise-level approach to tuning postgresql.conf. Not using various blind auto-tuners or advices from old articles and blog posts, but based on the following two aspects:
- comprehensive database experiments, conducted in automated fashion, repeated multiple times in conditions as close to production as possible, and
- deep understanding of DBMS and OS internals.
Using Nancy CLI (https://gitlab.com/postgres.ai/nancy) we will consider a concrete example: infamous shared_buffers, under various circumstances, in various projects. We will try to figure out, how to optimize this settings for given infrastructure, database, and workload.
This talk will represent a new platform of Distributed Control System for Nuclear Power Plant operation. Participants will learn about control system for very complicated automation objects. In a hard real time node more than 150 special subsystems are operating in order to control various technological processes of nuclear power plant (NPP), such as reactor control system for more than 1000 MW power unit with a turbine weighing more than 2000 tons. More than 100K of data gained from sensors are resulting in up to 500K of parameters representing 5 branches of physical processes: neutron kinetics, hydrodynamics, chemistry and radiochemistry, and physics of strength. Deviations may cause the whole system to become a huge DDoS source made of useful diagnostic information which is always much larger than the network and hardware are capable to manage. This may lead to normal operation failure. The talk will reveal the approaches to solve the issue.
You will learn about hardware and software architecture of such systems, about backup and replication, data redundancy and technological diversity. How to manage high loads, what is QoS, and what will happen in case of normal operation system failure, as for example was at Fukushima. But, hey, there should be a talk about coding! So, no SSD and HDDs, only InMemory, data structures from tens of millions of elements, and forget about processor cache as it does not work. Imagine your newest 4-generation Xeon has lost all the advantages and turned into a "pumpkin", so let's roll up your sleeves and examine timings, synchronicity, and try to make the most of your hardware, discovering the weakest link from processor, operating system and a network.
Software applications working on PostgreSQL is a very typical case in my practice. Some of them manage to work well, some of them do not. In the talk I will focus on errors and problems of the last ones.
Python may not be the fastest programming language on the CPU, but its fast and easy development saves a lot of costs between the keyboard and the chair. In this workshop we'll explore the most popular library for the access to the PostgreSQL server named "psycopg2" and write a short application using some of its useful features.
In this master class, I will take you step-by step through a major upgrade of PostgreSQL. Through our practice we see a lot of different PostgreSQL servers in production environment and often, almost too often teams who once made PostgreSQL their database of choice never update it following the initial installation. The are many different reasons for it, the result, however, is the same - they all miss out on the new useful features of the newer releases and reduced database performance.
The goal of my masterclass is to equip attendees with necessary tools for performing PostgreSQL upgrade. I will take you through each step of the major upgrade and will dive into each executed command. I will also explain the particular order in which I perform an upgrade and explain the consequences of not following this order or missing a particular step. We will perform an upgrade of PostgreSQL 9.0 to 11. My hope is that following this masterclass number of outdated PostgreSQL database will reduce since participants will then go back to their databases and make sure that they are running the most recent version.
Python may not be the fastest programming language on the CPU, but its fast and easy development saves a lot of costs between the keyboard and the chair. Since database clients spend most of their time waiting for a response from the database server, Python's asynchronous functionality available in the recent versions (3.5+) may help to optimize the application's runtime considerably by working on something else while server's response is being prepared. The asynchronous interface between Python and PostgreSQL is called "asyncpg". In the workshop we'll explore this library and write a short application using some of its useful features.
Patroni is getting art of state standard framework for building HA clusters with postgres now.
During session we will build simple 3 node cluster using mentioned stack.
We will discuss patroni's architecture, and most interesting parameters from it's configuration. We will check how actually failover works and how could you initialise cluster.
After session you will be able to built such cluster from scratch in minutes using given ansible playbooks.