Nikita Glukhov Postgres Professional
Oleg Bartunov Postgres Professional
15:45 26 October
45 min

Elegant K-nearest neighbor (KNN) searching in PostgreSQL

Finding the nearest neighbor is required for a variety of tasks, for example, finding the object closest to a given point on a map. The task looks trivial to a non-programmer: a person with a map copes with it easily. In a software developer's reality, however, it has no common solution available to everyone. To get rid of this headache, programmers often create ad hoc solutions, also known as "crutches". These workarounds don't look nice and often ruin the mood of a creative programmer, who then needs to go to a beer pub to cope with the cognitive dissonance :)

Indeed, while a person has a limited field of view and a map of a certain scale, the programmer has only one given point and a huge number of other points (for example, billions of stars). This multitude of points receives a lot of incoming requests, writes as well as reads. You can write a perfect SQL query, but the real-world query execution will be depressingly slow. To find the closest neighbors, you have to read the entire table, compute the distance from the given point to every row, and return the requested number of best results. Ordinary indexing doesn't help in this case, as you would have to scan the whole search tree and then read the entire table in random order, which takes much longer than simply reading the table sequentially.

In reality, tasks that require an efficient nearest-neighbor search aren't limited to spatial search. The same search is also used for classification, finding typos, data clustering, and deduplication. All such tasks benefit from efficient nearest-neighbor search in DBMSs, which are now the de facto standard for storing data.

What do we mean by "efficient search"? The search must be fast, concurrent, and scalable, and it must support various data types (most likely, non-standard ones). We implemented such KNN search in PostgreSQL 11 years ago. I will cover its implementation and current state, and share some use cases for KNN.
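
To make the cost of the naive approach concrete, here is a minimal Python sketch of the brute-force search described above: visit every point, compute its distance to the query point, and keep the k closest. The function name `knn_brute_force` and the sample data are hypothetical and purely illustrative; this is not how PostgreSQL implements KNN.

```python
import heapq
import math


def knn_brute_force(points, query, k):
    """Return the k points closest to `query` by scanning every point."""
    # Every candidate is visited and its distance is computed, so the cost
    # grows with the size of the whole data set, no matter how small k is.
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))


# Illustrative sample data (hypothetical).
points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (2.0, 2.0), (-1.0, 0.5)]
nearest = knn_brute_force(points, (0.9, 0.9), 2)
print(nearest)
```

The sketch touches all points even when only two results are requested, which is exactly the full-table read that an efficient KNN search is meant to avoid.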

Other talks

  • Alexey Sabanov АО "Аладдин Р.Д."
    Alexander Dodokhov АО "Аладдин Р.Д."
    22 min

    Ensuring data security during migration and compliance with import substitution regulations

    Multiple data leaks can occur during data migration. How can you protect your sensitive data and stay compliant with the requirements of regulatory institutions? This is what you will learn from my presentation.

  • Sergey Rider DBeaver Corp
    Tatiana Krupenya DBeaver Corp
    22 min

    How to speed up data load up to 10,000 times?

    What can be more important in the data load process than speed? Data migration is one of the most requested features in DBeaver, so performance was highly important for us, for PostgreSQL as well as for Greenplum, Redshift, and other Postgres-based databases. We are ready to share our tiny secrets about 10x, 100x, 1000x, and even 10,000x performance improvements for data imports without any magic.

  • Nikita Levchenko ПАО «Ростелеком»
    Yuriy Plotnikov ООО «РТК ИТ»
    45 min

    From Oracle to Postgres using Russian rails. Non-obvious nuances

    We'll discuss our approaches to picking technical solutions for the systems we design, and consider their advantages and limitations. We'll share how our engineering culture changed when import substitution requirements came into effect. We migrated a system with a classical three-tier architecture from Oracle to the Postgres Pro DBMS. In my presentation I'll pay attention to data migration tools, the peculiarities of developing Java/Kotlin apps, and SQL interoperability between the two DBMSs.

  • Alexandra Kuznetsova Postgres Professional
    22 min

    Mamonsu 3.0: New Features

    Mamonsu is an active monitoring agent for PostgreSQL based on Zabbix. Mamonsu 3.0 Major Update includes new metrics and more possibilities for data visualization via Zabbix screens. It also introduces the improved functionality of Mamonsu tools for interacting with the DBMS. In addition to the new functionality overview, I will also share the nuances of upgrading to the 3.0 version.