title

text

Maksim Viharev
Maksim Viharev Alytics
: December
45 мин

GreenHouseSQL as a scalable analytics system for postgresql, greenplum and clickhouse

At pgconf’17 I talked about our analytics systems based on PostgreSQL. Afterwards we looked at hadoop, s3, presto, vertica, and other frights. Finally we stopped to suffer nonsense and just completed PostgreSQL with ready Greenplum and Clickhouse. As a result, we achieved amazing performance, fast migration, easy maintenance, reliability and horizontal scalability. We enabled to recover the system after fault in two commands, decreased infrastructure costs and expanded functionality due to ANSI SQL, MPP and In-memory. All within the open-source and full SQL paradigm. We called the product GreenHouseSQL, which is our inner whole cycle data platform. In the talk we will show the beauty of solution internals, explain the advantages and flaws, tips and tricks of starting with Greenplum, as well as why do we need Clickhouse, what is left to PostgreSQL, and eventually how does it all work.

Слайды

Видео

Другие доклады

  • Vadim Podolny
    Vadim Podolny АО "РАСУ"
    45 мин

    High Load Distributed Control Systems for Nuclear Power Plants operation

    This talk will represent a new platform of Distributed Control System for Nuclear Power Plant operation. Participants will learn about control system for very complicated automation objects. In a hard real time node more than 150 special subsystems are operating in order to control various technological processes of nuclear power plant (NPP), such as reactor control system for more than 1000 MW power unit with a turbine weighing more than 2000 tons. More than 100K of data gained from sensors are resulting in up to 500K of parameters representing 5 branches of physical processes: neutron kinetics, hydrodynamics, chemistry and radiochemistry, and physics of strength. Deviations may cause the whole system to become a huge DDoS source made of useful diagnostic information which is always much larger than the network and hardware are capable to manage. This may lead to normal operation failure. The talk will reveal the approaches to solve the issue.

    You will learn about hardware and software architecture of such systems, about backup and replication, data redundancy and technological diversity. How to manage high loads, what is QoS, and what will happen in case of normal operation system failure, as for example was at Fukushima. But, hey, there should be a talk about coding! So, no SSD and HDDs, only InMemory, data structures from tens of millions of elements, and forget about processor cache as it does not work. Imagine your newest 4-generation Xeon has lost all the advantages and turned into a "pumpkin", so let's roll up your sleeves and examine timings, synchronicity, and try to make the most of your hardware, discovering the weakest link from processor, operating system and a network.

  • Aleksander Kuzmenkov
    Aleksander Kuzmenkov Postgres Professional
    45 мин

    Towards more efficient query plans: PostgreSQL 11 and beyond

    A major responsibility of a database engine is to convert a declarative SQL query to an efficient execution plan, employing various methods to scan and join the relations. There is always a development effort to improve this area. What clever execution plans can PostgreSQL generate, what's new in version 11 and what is in development? To name a few things, the joins are optimized by removing unneeded outer and inner joins, and reducing joins from outer and semi to inner. There is work to enable merge joins on inequality and range overlap, and to improve join selectivity estimates with multi-column statistics. When it comes to scanning a single relation, covering indexes allow to use index-only scans more often. Incremental sort and more precise estimation of sorting costs help generate better paths when sorted output is required, e.g. when using GROUP BY and ORDER BY or performing merge joins. This talk aims to give an overview of such optimizations that already exist and that are being developed now.

  • Ivan Panchenko
    Ivan Panchenko Postgres Professional
    90 мин

    NoSQL/PL: Programming in non-SQL-like procedural languages

    Workshop on Server-Side development in procedural languages PL/Perl ,PL/Python, PL / v8 inside PostgreSQL and Postgres Pro. You will not only learn what they are for but also how to use them correctly and what results can be achieved using them.

  • A
    Anna Akentyeva Postgres Professional
    22 мин

    Autovacuum: what can we learn if we read the code instead of the documentation

    In this talk we will have a look at the details of autovacuum's implementation and see what kind of practical implications they have. The talk will also provide a short overview of patches for autovacuum that are currently being considered by the developer community and that may be included in newer versions of PostgreSQL.