The first training course "DBA1. PostgreSQL Administration. Basic Course" was introduced by Postgres Professional in September 2015. The course was based on the then up-to-date version of PostgreSQL 9.4. A set of new courses has been introduced since then. In turn, the PostgreSQL database didn't stand still either.
The report will present the current line of courses for administrators and application developers. We will also show the roadmap for the near future, which includes specialist certification.
Brendan Gregg’s USE (Utilization, Saturation, Errors) method for monitoring is quite known. There’s also Tom Wilkie’s RED (Rate, Errors, Durations) method, which is suggested to be better suited to monitor services than USE. I want to talk about how we employ these methodologies when we develop our Postgres monitoring in okmeter.io.
Sometimes there is a great desire to return the database to the past, for a day or two or more days. The reasons are diverse, but most often one is to see what has changed. Or to see if the application behaved incorrectly after the update. Or it was just a command from the boss. The classic way everyone knows is to keep full backups and sets of WAL-logs to be able to recover to an arbitrary moment. This method is a real headache for DBAs/administrators, and it will not work quickly. Sure, there are some ways to optimize this process, but downtime is inevitable. PostgresPro offers a new way — database snapshots and the ability to return to them.
It's so good when database behaves predictable. When the performance is lacking, you just add CPU cores, terabytes of RAM and millions of IOPS, and everything becomes good again. But it's rather unpleasant, when server have plenty of free resources, while database is still running slow. And it's especially sad if stress testing detects no problems, while real life workload of the same volume makes your database hang.
In this talk I will consider bottlenecks of PostgreSQL, which we met in our practice, and which causes sad behavior described above. I'll also explain what can be done at user level in order to evade these bottlenecks, and what developers are planning to do in order to eliminate those bottlenecks. I'm also planning give some recipes of stress testing, which could have to evade surprises in production.
Sometimes problems arise during Postgres operation, and the faster they are identified and resolved, the happier users eventually will be. pgCenter is a set of CLI powerful utilities for troubleshooting in the "here and now" mode. In the talk I will show how to use pgCenter for efficient troubleshooting, the directions in which to search, and how to respond to certain problems, in particular how to:
- check if Postgres is in the normal state;
- identify promptly the faulty clients and stop them;
- reveal too heavy queries;
- and other tips and tricks of pgCenter.
You just set up your first PostgreSQL cluster, created a database schema, loaded some data, did some fine tuning of configuration. Now you want to make your cluster highly available. Unfortunately, PostgreSQL doesn't offer built-in automatic failover, but luckily for us, there are plenty of external tools for that. As a next logical step you start choosing a tool, and... you already doing it wrong, because first you have to define SLA, RTO, and RPO. In this talk I am going to cover most of the common mistakes people do when setting up a highly available cluster.
A major responsibility of a database engine is to convert a declarative SQL query to an efficient execution plan, employing various methods to scan and join the relations. There is always a development effort to improve this area. What clever execution plans can PostgreSQL generate, what's new in version 11 and what is in development? To name a few things, the joins are optimized by removing unneeded outer and inner joins, and reducing joins from outer and semi to inner. There is work to enable merge joins on inequality and range overlap, and to improve join selectivity estimates with multi-column statistics. When it comes to scanning a single relation, covering indexes allow to use index-only scans more often. Incremental sort and more precise estimation of sorting costs help generate better paths when sorted output is required, e.g. when using GROUP BY and ORDER BY or performing merge joins. This talk aims to give an overview of such optimizations that already exist and that are being developed now.
PostgreSQL 12 Feature Freeze is scheduled for April 2019, which didn't come yet. But general shapes of upcoming release are already visible. In this talk I'll consider patches already committed to PostgreSQL 12 as well as patches, which would be committed very likely. I'll talk with special passion about SQL/JSON, Merge, pluggable table access methods and zheap.
The dark age for PostgreSQL started at 2GIS after transitioning to the microservice architecture. Every team tried to cook database on their own — by installing instances, juggling versions, trying to code deployments with numerous tools or using manual operations. It was the right time to develop a “silver bullet” — a common set of tools to solve all the problems at once. We created our own cluster solution based on well-known PostgreSQL, repmgr, pgbouncer and Barman. Despite of the complexity of our final solution, we developed a repeatable flexible deployment to accelerate postgresql cluster deployment and management. Also we deployed the our own cluster to consolidate all databases. It helped to eliminate team efforts for database management and focus on their main goals. Failover works, we tried it :-)
This talk will represent a new platform of Distributed Control System for Nuclear Power Plant operation. Participants will learn about control system for very complicated automation objects. In a hard real time node more than 150 special subsystems are operating in order to control various technological processes of nuclear power plant (NPP), such as reactor control system for more than 1000 MW power unit with a turbine weighing more than 2000 tons. More than 100K of data gained from sensors are resulting in up to 500K of parameters representing 5 branches of physical processes: neutron kinetics, hydrodynamics, chemistry and radiochemistry, and physics of strength. Deviations may cause the whole system to become a huge DDoS source made of useful diagnostic information which is always much larger than the network and hardware are capable to manage. This may lead to normal operation failure. The talk will reveal the approaches to solve the issue.
You will learn about hardware and software architecture of such systems, about backup and replication, data redundancy and technological diversity. How to manage high loads, what is QoS, and what will happen in case of normal operation system failure, as for example was at Fukushima. But, hey, there should be a talk about coding! So, no SSD and HDDs, only InMemory, data structures from tens of millions of elements, and forget about processor cache as it does not work. Imagine your newest 4-generation Xeon has lost all the advantages and turned into a "pumpkin", so let's roll up your sleeves and examine timings, synchronicity, and try to make the most of your hardware, discovering the weakest link from processor, operating system and a network.
Meeting corporate standarts for information security, business continuity and software unification: Kerberos Authentification (Windows and Linux) in Active Directory Environment. 1C Enterprise specifics. Using backup and recovery software (HP Data Protector). Integration with corporate monitoring system (Solarwinds Mointor).
My talk will be about different cases of usage and setup of the standby server; examples how to setup standby linked to your archive( to make an opportunity to recreate standby from archive after primary crashing and promoting your old standby); Avito experience of usage of standby server for read-only queries: problems and solutions; monitoring of standby.
One of the most soft after features of Postgres v10 is logical replication. In this presentation we will cover what Logical Replication is, how it compares to Binary (Streaming Replication), how Logical Replication works, configuring Logical Replication, Logical Replication limitations, gotchas, security and management. We will also discuss potential deployed architectures with Logical and Binary Replication and some neat features of the underlying technology.
At the end of this presentation an attendee with a reasonable understanding of how to manage Postgres will be able to configure Logical replication for use.
We will be presenting MobilityDB, a PostgreSQL extension that extends the type system of PostgreSQL and PostGIS with abstract data types for representing moving object data. These types can represent the evolution on time of values of some element type, called the base type of the temporal type. For instance, temporal integers may be used to represent the evolution on time of the number of employees of a department. In this case, the data type is “temporal integer” and “integer” is the base type. Similarly, a temporal float may be used to represent the evolution on time of the temperature of a room. As another example, a temporal point may be used to represent the evolution on time of the location of a car, as reported by GPS devices. Temporal types are useful because representing values that evolve in time is essential in many applications, for example in mobility applications.
The temporal types in MobilityDB are based on the bool, int, float, and text base types provided by PostgreSQL, and on the geometry and geography base types provided by PostGIS (restricted to 2D or 3D points). MobilityDB follows the ongoing OGC standards on Moving Features (http://www.opengeospatial.org/standards/movingfeatures), and in particular the OGC Moving Features Access, which specifies operations that can be applied to time-varying geometries.
A rich set of functions and operators is available to perform various operations on temporal types. In general there are three classes:
- Lifed functions and operators: the operators on the base types (such as arithmetic operators and for integers and floats, spatial relationships and distance for geometries) are intuitively generalized when the values evolve in time. Spatiotemporal functions in MobilityDB generalize spatial functions provided by PostGIS for both "geometry" and "geography" types, for instance the "ST_Intersection". Basically, MobilityDB takes care of the temporal aspects and delegates the spatial processing to PostGIS.
- Temporal functions and operators: they process the temporal dimension of the value which can be an instant, a range, an array of instant, or an array of ranges. Examples are the atperiods function that restricts a temporal type to a given array of time ranges, and the duration function that extracts the definition time of a value.
- Spatiotemporal functions and operators: all remaining functions fall into this category. Examples are speed(tgeompoint/tgeogpoint), azimuth(tgeompoint/tgeogpoint), maxValue(tfloat/tint), twAvg(tfloat) a time weighted average, etc.
Both GiST and SP-GiST have been extended to support the temporal types. The GiST index implements an R-tree for temporal alphanumeric types and a TB-tree for temporal point types. The SP-GiST index implements a Quad-tree for temporal alphanumeric types and an Oct-tree for temporal point types. The approach used for developing SP-GIST indexes for MobilityDB allowed us to add SP-GIST indexes for 2-dimensional, 3-dimensional and n-dimensional geometries in PostGIS.
Two types of numeric aggregate functions are available. In addition to the traditional functions min, max, count, sum, and avg, there are window (also known as cumulative) versions of them. Given a time interval w, the window aggregate functions compute the value of the function at an instant t by considering the values during the interval [t − w, t]. In contrast to standard aggregation, temporal aggregation may return a result which is of a bigger size than the input. For this reason, the temporal aggregate functions have been extremely optimized in order to perform efficiently.
MobilityDB has a preliminary implementation of the statistic collector functions and the selectivity functions for the temporal types.
In terms of size, the extension is made of 67k lines of C code, 19k lines of SQL code, 67k lines of SQL unit tests. It defines 40 types, 2300 functions, and 1350 operators.
The talk will illustrate the spatiotemporal concepts and the data model of the temporal types. It will briefly describe the components of MobilityDB: indexing, aggregations, functions and operators, and the SQL interface. Query examples and uses cases will be illustrated allover the talk. The current status of MobilityDB and the planned development will also be presented.
The talk shall be given by: - Esteban Zimányi: Professor and Director of the Department of Computer and Decision Engineering of the Universite Libre de Bruxelles. - Mahmoud SAKR: Postdoc researcher in the Universite Libre de Bruxelles.
Virtualization in companies has become an alternative to the conservative "one task-one server" approach, which allows efficient use of hardware resources, centralized management of server infrastructure, saving energy and cooling resources. The report explains how to configure the VMware environment for intensive input / output PostgreSQL and profiling tools virtual infrastructure to monitor performance and resolve identified problems.
PostgreSQL + PostGIS + TimescaleDB is a ready-to-use symbiosis from a reliable RDBMS, a powerful set of geographical objects and calculations, and work with time-series data. This bundle perfectly solves the problem of storing telemetry, while leaving the whole PostgreSQL ecosystem in your hands.
Technical features of porting T-SQL code to plpgsql and data from MS SQL to PG on the example of transition ECM "Priority" to Postgres
This report focuses on the continuation of transferring our ECM “Priority” from MS SQL to Postgres. Technical solutions, issues of rewriting from T-SQL to plpgsql, optimization of the effective code and moving data will be covered. Additionally, there will be considered aspects of pgplsql performance testing to find the “bad code” of pgplsql as a candidate for optimization. The main objective of the presentation is to answer the question: "We have it in T-SQL - how to transfer it in PG?". The report is intended for junior Postgres developers and is a continuation of the previous report made at the conference in 2017(https://youtu.be/v6_4Szr8t14).
Our company has developed a software product Live Universal Interface (LUI) is a tool to quickly create and modify standardized on-screen forms to WEB browsers without compiling code, know just enough SQL. LUI is aimed at B2B, B2G, G2C and B2C segments and intended for use in billing systems, financial management, accounting and control of production, where you must decide first and foremost functional tasks, rather than demonstrate unnecessary graphics elements.
Collective elaboration is provided by storing all elements of program code in the total database, which can be located on the corporate server or in the "cloud" on a third-party site.
Reliable backup and recovery, at enterprise level for the PostgreSQL environment. No more traditional backups. CBT (Change Block Tracking) technology is the next generation incremental backup. Faster than snapshots, CBT back up blocks that change, not all of your data, reducing server and network traffic and eliminating the need for traditional backups. Benefits: • Data protection mode close to Real-Time • Update with ease
Postgres is known for it extensibility, which made it the universal database, that means it can meet the requirements of practically any project. Many extensions are well-known and widely used, for example, PostGIS extension - de-facto standard of open source GIS, hstore - an extension for storing and manipulation of arbitrary key/value pairs. I will talk about less known but useful PostgreSQL extensions, which provides a new functionaliy and/or improve the performance of PostgreSQL.
PostgreSQL was designed to be extensible, it provides an API to application developers to extend PostgreSQL functionality and/or improve the performance for specific data and workloads. It is important that there is no need of having expertise of core developers, and these new functionality could be added online without restarting of database server. Application developer can create various database objects, such as functions, data types, operators, indexes, and even new access methods.
I will present my choice of two extensions out of hundreds:
vops - greatly improves the performance of Postgres for OLAP queries using vector operations, pg_variables - provides session variables for storing scalars and relations, useful for generating reports on read-only replicas.
IIn the world of programming, the creation of source code for databases "procedural extensions" is something solitary. Most DBMS offers procedural languages and "stored procedures" to create procedural extensions. In Postgres the number of supported procedural languages has already exceeded a dozen.
Traditionally, stored procedures have many applications: it is difficult to resist the attraction to perform a data operation directly in the storage, especially in the Enterprise development. This approach quickly leads to the spreading of the business logic and dramatically increases the cost of support and development of the system as a whole.
The life cycle of stored procedures makes it difficult to use standard Change Management tools and practices. It is necessary to adapt the operations upon the stored procedures to the standards of Change Management, yet staying within the comfortable development practices.
We'll look at the typical tasks of the procedural extensions development and discuss the solutions we are implementing in the Xobot IDE.
As any ordinary software developers, we just pursued a goal to develop a system robust for high loads, and even succeeded. The system architecture was fine, but the data volume was keeping increased and revealed the painful issues and errors that nobody had expected. We faced very strange queries seemed to be unbelievable. In my short talk I would like to share sad experience of arised-from-nothing high loads in DBMS and solving the challenge.
In this session we will deep dive into the exciting features of Amazon RDS for PostgreSQL, including new versions of PostgreSQL releases, new extensions, larger instances. We will also show benchmarks of new RDS instance types, and their value proposition. We will also look at how high availability and read scaling works on RDS PostgreSQL. We will also explore lessons we have learned managing a large fleet of PostgreSQL instances, including important tunables and possible gotchas around pg_upgrade.
Software applications working on PostgreSQL is a very typical case in my practice. Some of them manage to work well, some of them do not. In the talk I will focus on errors and problems of the last ones.
As we all know, PostgreSQL is a classic vertically scalable database for OLTP loads. In parallel with PostgreSQL for many years there is its alternative horizontal-scalable MPP version of PostgreSQL, that is called Greenplum, sharpened for big data and OLAP workload. In my pitch I will show the internal architecture of Greenplum (distributed transactions, data sharding, partitioning with hybrid storage in external systems, column storage engines with compression, and much more), a comparison with the internal structure of PostgreSQL and the application areas of each solution are shown.
I'm going to talk about emerging technologies in the area of general purpose RDBMS indexing. I will describe different approaches suitable for different workloads. We will discuss ideas from academic researches and corresponding industrial response from developers, communities, and companies. There will be the short live-coding session on creating DIY index in PostgreSQL.
At pgconf’17 I talked about our analytics systems based on PostgreSQL. Afterwards we looked at hadoop, s3, presto, vertica, and other frights. Finally we stopped to suffer nonsense and just completed PostgreSQL with ready Greenplum and Clickhouse. As a result, we achieved amazing performance, fast migration, easy maintenance, reliability and horizontal scalability. We enabled to recover the system after fault in two commands, decreased infrastructure costs and expanded functionality due to ANSI SQL, MPP and In-memory. All within the open-source and full SQL paradigm. We called the product GreenHouseSQL, which is our inner whole cycle data platform. In the talk we will show the beauty of solution internals, explain the advantages and flaws, tips and tricks of starting with Greenplum, as well as why do we need Clickhouse, what is left to PostgreSQL, and eventually how does it all work.
While doing development of one of our project we were asked to build HA database using Postgres, geographically distributed.
First our choice was obvious, we started to work with big 3 cloud providers, but soon it was quite understand that everything costs big enough for us. Also there were a bunch of incompatibilities with unsupported extensions as well as londiste replication we were heavily used.
I will talk about why we chose patroni, what types of problem we faced with and patroni's special features can dramatically simplify deploy and everyday usage.
Managed database services are gaining in popularity. In this session we look at how best to configure Amazon RDS for PostgreSQL and also look at common user operations of using RDS for PostgreSQL. We will also look beyond common user operations and into some specific optimizations related to upgrade, logical replication, performance, and reducing downtime.
As is often seen in OLAP and batch processing workloads, the more complex a query (containing many joins, filters, aggregates), the more there is a possibility of row count estimation errors, which leads to planner choosing an inefficient execution plan.
To address that problem, I developed a tool called pg_plan_advsr as a PostgreSQL extension, which corrects the estimation errors by repeatedly feeding back the information collected during query execution to the planner.
The tool has three features:
- Automatic plan tuning by repeatedly feeding execution information to planner
- Preserve all plans generated during plan tuning in a history table
- Create and store optimizer hints to be able to reproduce plans generated during tuning process
I verified the effectiveness of pg_plan_advsr by enabling it when running the join order benchmark (JOB) against PG 10.4 and observed its execution time shortening to 50% of the original. Therefore, it is useful for user who would like to do plan tuning for OLAP and batch processing.
I will talk about the following things in this presentation:
- Principles behind pg_plan_advsr and its architecture
- Detailed information about the measurements done with JOB
- Possible future enhancements
- Using aqo and pg_plan_advsr together (experimental)
To migrate to a PostgreSQL/Postgres Pro we need multi-core servers to be carefully tuned for correct parallelism. What settings make multi-terabyte installations work fast and correctly?
We will share our PostgreSQL/Postgres Pro on BullSequana S and Bullion S servers testing experience.
In my talk I'll tell you about practices of working with Postgres in the Go-services. I’ll describe general advantages and disadvantages of the basic tools that are commonly used when working with Postgres using Go. Of course, we will touch on the nuances that need to be taken into account when your services are running inside the Kubernetes. I will also talk about Avito’s experience in providing a database of product’s developers. This presentation will be of interest to developers who want to avoid problems when working with Postgres, and will be useful to DBA who want to know what difficulties customers face in their database.
As an experienced Oracle DBA, I've discovered some distinctive features of PostgreSQL which I'd like to share with you. We'll review a set of essential tools for DBAs, their capabilities and usefulness in comparison to their Oracle counterparts. Also I'm going to summarize the core differences between Oracle DB and PostgreSQL in terms of administrating.
PostgreSQL provides possibility to create temporary tables. Though a temporary table is accessible only to a single session and is removed at the end of the session, all information about it is stored in the system catalogs of PostgreSQL. This is related to several issues, which make it difficult or impossible to use temporary tables in some cases. There are attempts to solve this feature, including in our company. But they have not yet succeeded, mainly because of the PostgreSQL engine. In the talk I want to tell about simple and small pg_variables extension. It allows you to create table variables along with scalar ones. I will tell how it can replace temporary tables, what advantages and disadvantages it has.
During the report, I would like to share the experience of implementing BlockChain in a real business task based on 1C+PostgreSQL. Where did this task come from? From whom do we protect data with the help of technology? How to get a chain integrity report of tens of millions of records in seconds?
- Principles of searching for troublesome queries in PostgreSQL.
- Evaluation of hypothetical indexes and their impact on query plans.
- The most common errors in 1C-programming.
- Basic methods of code refactoring, taking into account the features of PostgreSQL.
- Storing analytical information from the PostgreSQL log to assess the quality of refactoring
In this talk we will have a look at the details of autovacuum's implementation and see what kind of practical implications they have. The talk will also provide a short overview of patches for autovacuum that are currently being considered by the developer community and that may be included in newer versions of PostgreSQL.
Many DBMS specialists do not like these three letters - ORM because they have repeatedly seen the enormous queries ORM-generated for simplest operations. However practice shows that the origin of the problem is not ORM itself but rather those developers who are not able to use ORM properly. In this report I will tell you the basic principles of how to write code for ORM which generates "good" queries and also show you "bad" code samples and what you get out of them. The main idea is we have to think in SQL-style when writing the code, and so to learn to foresee what kind of query will be generated. But even having mastered that you must always check the output SQL for complex queries. I will show an example when a slight change in ORM-logic increases the volume of output SQL by dozens of times(!). I will tell you about additional tools and tricks. Namely - disabling tracking, INCLUDE construction, alternative syntax for JOIN, how to get more data using a smaller number of queries, how to effectively write queries with grouping, and what do we need mappings for. I will not bypass the cases when it is not possible to effectively solve the problem by means of ORM (for example, queries with recursion). In addition to SELECT requests, there are some Batch-Update/Delete tools that allow you to update and delete data using ORM tools without downloading data to the client side. We'll also talk on how to force the ORM to insert large volumes of data quickly via Multi-Insert and COPY. I will also discuss how ORM supports PostgreSQL-specific data types i.g. arrays, hstore and jsonb. But does it make sense to use ORM at all, since there is so much to learn? Sure it does. There are advantages of using ORM, and we will discuss them as well. All examples are based on Entity Framework technology for .Net Core and .Net Framework in C#. There are some subtle differences in ORM usage in Hibernate/NHibernate, but the basic principles remain the same, so the report will be useful for developers using various technologies.
My presentation is about writing extensions in Postgres. I have written pg_threads that implements simplified POSIX thread API inside Postgres database. It adds a new powerful abstraction giving database developers new opportunities for writing parallel code thus taking advantage of multicore CPUs. There is an extra API for transactional and non-transactional IPC between threads. I also have an example application that takes advantage of this new API that scales linearly even across 2 nodes. The presentation is with live working demo using vagrant project with 2 VMs running Ubuntu and 2 Postgres 11 databases.
Declarative partitioning was a long-awaited feature and has been enhanced since its introduction in PostgreSQL 10. However, for many users, finding optimal partitioning schemes to have the best benefits from partitioning is not an easy task. Therefore, we added in HypoPG a new hypothetical partitioning feature which helps users to design partitioning. In this presentation, I will provide a brief introduction of HypoPG and explain declarative partitioning, and then I'll show the usage of hypothetical partitioning feature and explain how the extension is working.
Shared_buffers = 25% – is it too much or not enough? Or it's the right value?
How can we ensure that this – pretty much outdated – recommendation suit well our needs?
It is time to start apply enterprise-level approach to tuning postgresql.conf. Not using various blind auto-tuners or advices from old articles and blog posts, but based on the following two aspects:
- comprehensive database experiments, conducted in automated fashion, repeated multiple times in conditions as close to production as possible, and
- deep understanding of DBMS and OS internals.
Using Nancy CLI (https://gitlab.com/postgres.ai/nancy) we will consider a concrete example: infamous shared_buffers, under various circumstances, in various projects. We will try to figure out, how to optimize this settings for given infrastructure, database, and workload.
This case study walks participants through a case where we decided to embark on a data recovery effort. This talk is applicable to all users, from novices to advanced PostgreSQL database administrators. Beginners will get an understanding of what data recovery is and is not, what expectations to have going into it, and how to work with contracted experts in order to ensure the best possible outcome, while more advanced users and experts will also get a fair bit out of the technical aspects of the case study.
While the talk will emphasize non-technical operational aspects of data recovery, it will also include discussions of the internals of PostgreSQL we had to work with, as well as how we went about approaching difficulties so that we could retrieve the data we hoped to.
So you finally have your database model for your application and you fill it in with current data. How do you keep it up to date? While INSERT may still be transparent, UPDATE and DELETE will overwrite your previous data, so you won't be able to reproduce them. Cloning the whole huge content for each minor update is not an option. For rich and complex data about hundreds of thousands of power generators in Germany and worldwide, I built a model using range data types in recent PostgreSQL which allows me to insert, update and delete data while granting the full access to the whole state of the database at any historical moment. I'll present a very simplified version of the database so the audience will be immediately able to apply it for their cases. I'll also show a few tricks in Python and Psycopg2 that will allow a whole team to prepare, review, and deploy all revisions to this database without merge conflicts. And I'll give a few ideas on how to retrieve this data efficiently.