title

text

Anton Doroshkevich
Anton Doroshkevich ИнфоСофт
10:15 25 October
45 мин

Unequal battle with "bloating" in the realities of 1C databases

Data is getting bigger, disks are getting faster, the DBMS optimizer is getting smarter, but the problem of " bloating " remains extremely relevant. I want to share my experience and approach to dealing with this effect on a large number of tables and data in them.

Слайды

Дорошкевич Антон - Неравный бой с распуханием в реалиях баз 1С.pptx

Другие доклады

  • Иван Чувашов
    Иван Чувашов ООО Calltouch
    45 мин

    Several production cases during one night

    We maintain large PostgreSQL databases with the total data volume exceeding 180 TB. Every instance in the cluster gets a workload of 15 thousand requests per second. These circumstances certainly impose limitations on the classical approach to updating database structures. The cost of a DBA error gets higher, as even a small error can lead to a sleepless night) In my talk, I will cover our existing limitations for changes in the production environment, non-classical DBMS behavior under heavy workloads, and will discuss PostgreSQL in general.

  • Ekaterina Sokolova
    Ekaterina Sokolova Postgres Professional
    22 мин

    PG_QUERY_STATE: Everything about the inner life of your queries

    When working with PostgreSQL, we regularly encounter queries that take suspiciously long to be executed. What if it's too late to enable statistics or auto explain, and cancellation of the query doesn't look sensible, but you still need to find out what caused the issue or see at what stage execution the query is? pg_query_state is an open source PostgreSQL extension, which can give you access to the query execution plan in real time. It also makes available statistics for each node, including the number of processed rows.

  • Ruslan Usmanov
    Ruslan Usmanov Федеральное казначейство
    45 мин

    A subsystem of Russia's "Electronic Budget" on fully domestic software

    Creation of a completely import-substituted component of the state information system using the example of the PUR KS - the "Electronic Budget" subsystem, the key state integrated information system for managing public finances. The presentation presents the subsystem architecture, performance optimization history and monitoring description. The speaker will talk about the advantages and difficulties of using an import-substituted system for the public sector, as well as about the nuances that departments should take into account when implementing open source solutions and solutions from Russian developers.

  • Oleg Bartunov
    Oleg Bartunov Postgres Professional
    Nikita Glukhov
    Nikita Glukhov Postgres Professional
    45 мин

    Elegant K-nearest neighbor (KNN) searching in PostgreSQL

    Finding the nearest neighbor can be required for various tasks. For example, when you need to find the closest object to a given point on the map. This task looks trivial to non-programmer (a person can easily cope with it if they have a map). In a software developer's reality, this task doesn't have a common solution available to everyone. To get rid of this headache, programmers often create ad hoc solutions also known as "crutches". These workarounds don't look nice and often ruin the mood of a creative programmer who needs to go to a beer pub to cope with the cognitive dissonance :)

    Indeed, while a person has a typical field of view and a map with a certain scale, the programmer has only one given point and a huge number of other points (i.e. billions of stars). This multitude of points gets a lot of incoming requests, including the write requests, not just read ones. You can write a perfect query in SQL, however, the real-world query execution plan will be depressingly long. To find the closest neighbor, you will have to read the entire table, compute all the distances from the given point and return the given number of good enough results. Indexing doesn't help in this case, as you will have to fully scan the search tree and read the entire table in random order. This will take much longer than simple table reading. In reality, tasks, where you need to efficiently find nearest neighbors, aren't limited to spatial search. It can also be used for classification tasks, finding typos, data clustering, and deduplication. All such tasks will benefit from efficient nearest neighbor search in DBMSs that are now a de facto standard for storing the data. What do we mean by "efficient search"? It means that our search is fast, concurrent, scalable, and supports various data types (most likely, non-standard ones). We implemented such KNN search in PostgreSQL 11 years ago. I will cover its implementation, today's state and share some use cases for KNN.