February 05 – 07 , 2018
PGConf.Russia 2018
PGConf.Russia 2018
PGConf.Russia is a leading Russian PostgreSQL international conference, annually taking together more than 500 PostgreSQL professionals from Russia and other countries — core and software developers, DBAs and IT-managers. The 3-day program includes training workshops presented by leading PostgreSQL experts, more than 40 talks, panel discussions and a lightning talk session.
Thems
- PostgreSQL at the cutting edge of technology: big data, internet of things, blockchain
- New features in PostgreSQL and around: PostgreSQL ecosystem development
- PostgreSQL in business software applications: system architecture, migration issues and operating experience
- Integration of PostgreSQL to 1C, GIS and other software application systems.
Talks
Talks archive
-
Oleg Bartunov PostgresPro
Nikita Glukhov PostgresProJsonb is a popular data type in PostgreSQL, it provides the web developers an ability to work with ubiquitous json inside the database and use all the power of proven relational database. Fast querying of jsonb data is a challenge for database and PostgreSQL provides several options for indexing jsonb. We present the new way of efficient indexing of jsonb, based on improvement of indexing infrastructure.
It's known, that json is a greedy data type, it may contains many auxiliary data not interesting for searching and that affects the size of index. Partial index will not helps, since it filters the rows before indexing, while we are interested in extracting of parts of jsonb. Functional indexes on specific keys could introduce too big overhead. We present an improvement of indexing infrastructure, which allows to control the index behaviour by passing parameters to operator class at index creation. For example, to index a user-defined subset of jsonb it is possible to pass to operator class the powerful path expression (either jsonpath of upcoming sql/json or jspath from jsquery extension), which can be used to extract the parts of jsonb tree. That makes index more effective and reduces the overhead of its maintaining.
Another use of parameterized operator classes is to allow a user to specify parameters instead of hard coding them, for example, the GiST signature size is currently hard coded inside the implementations of several opclasses (tsvector, hstore, intarray, pg_trgm, ltree), while it is natural to use different signature length for different data to have optimal size of index and its performance.
Full text search on parts of document can be improved by passing labels to the operator class and letting him index only specified parts of document, that allow to avoid currently used recheck of the rows returned by the index.
-
Dmitriy Sarafannikov YandexIt's not a secret for anyone that statistics can not be transferred with a major upgrade. For small and not heavily loaded databases this is not a problem, you can quickly collect new statistics. But we have databases with a volume of about 5TB and a load of about 100k rps, for which it became a big problem: taking off without statistics, the replicas could not even replay WAL. In my report I'll tell you what tricks we went to upgrade these databases with requirements of 100% read only availability, about what mistakes were made, and about how these errors were painfully corrected. The result of these errors was the extension called "pg_dirty_hands", in which we will collect various hacks, which can be last resort to repair data corruption.
-
Eren Basak Citus DataPostgres has a nice feature called Point-in-time Recovery (PITR) that would allow you to go back in time. In this talk, we will discuss what are the use-cases of PITR, how to prepare your database for PITR by setting good base backup and WAL shipping setups, with some examples. We will expand the discussion with how to achieve PITR if you have a distributed and sharded Postgres setup by mentioning challenges such as clock differences and ways to overcome them, such as two-phase commit and pg_create_restore_point.
-
Dmitriy Pavlov ArenadataIn the pitch I will talk about the most important nuances of deployment and operations of the distributed analytical open-source database based on PostgreSQL - Greenplum. I will analyze the typical mistakes in its use, give the best practices and warn about bottlenecks.
Photos
Photo archive