Элегантный поиск ближайших соседей в PostgreSQL
Finding the nearest neighbor can be required for various tasks. For example, when you need to find the closest object to a given point on the map. This task looks trivial to non-programmer (a person can easily cope with it if they have a map). In a software developer's reality, this task doesn't have a common solution available to everyone. To get rid of this headache, programmers often create ad hoc solutions also known as "crutches". These workarounds don't look nice and often ruin the mood of a creative programmer who needs to go to a beer pub to cope with the cognitive dissonance :)
Indeed, while a person has a typical field of view and a map with a certain scale, the programmer has only one given point and a huge number of other points (i.e. billions of stars). This multitude of points gets a lot of incoming requests, including the write requests, not just read ones. You can write a perfect query in SQL, however, the real-world query execution plan will be depressingly long. To find the closest neighbor, you will have to read the entire table, compute all the distances from the given point and return the given number of good enough results. Indexing doesn't help in this case, as you will have to fully scan the search tree and read the entire table in random order. This will take much longer than simple table reading. In reality, tasks, where you need to efficiently find nearest neighbors, aren't limited to spatial search. It can also be used for classification tasks, finding typos, data clustering, and deduplication. All such tasks will benefit from efficient nearest neighbor search in DBMSs that are now a de facto standard for storing the data. What do we mean by "efficient search"? It means that our search is fast, concurrent, scalable, and supports various data types (most likely, non-standard ones). We implemented such KNN search in PostgreSQL 11 years ago. I will cover its implementation, today's state and share some use cases for KNN.
Слайды
Видео
Видео доступно участникам мероприятия, выполнившим вход в личный кабинет
Другие доклады
-
Andrey Zubkov Postgres Professional
Развитие модуля анализа исторической нагрузки pg_profile/pgpro_pwr
This talk is about postgres extension pg_profile - simple historic database workload profiler. I'll talk about its new features and new statistics available in Postgres 14. There is a branch of pg_profile named pgpro_pwr, designed to run in PostgresPro Enterprise Edition and PostgresPro Standard Edition databases. It is using extended performance statistics of those databases providing some valuable benefits. We'll see what's new in pgpro_pwr extension. Also I'll talk about some problems in postgres performance monitoring and their possible solutions in the future.
-
Sergey Novikov ЕДИНЫЙ ЦУПИС
Внедрение партицирования без простоя
The built-in partitioning mechanism in PostgreSQL has been developing for several years. However, there is no magic button that would transform an ordinary table into a partitioned one. I will explain how to implement partitioning in a production system, how to prepare your table and apps for partitioning, and what pitfalls can make a DBA's life harder. We will also take a deep dive into various techniques of transferring data between partitions, also considering their pros, cons and limitation.
-
Alexey Fadeev Sibedge
Plv8 Framework: разработка на plv8 в IDE, с ES6, отладкой, автотестами и деплоем
Many application developers prefer not to have logic on the DB side (in functions) as there are no suitable software development tools, especially for development teams. In the plv8 case, the situation worsens as this function contains both SQL and JavaScript code, and popular IDEs have no support for such a symbiosis. At this tutorial, I will introduce my development named "Plv8 Framework", which considerably simplifies the creation of code on plv8.
The gist is as follows: the JS code that is executed on the DB side can run on the developer's local computer by using node.js, which works on the same v8 interpreter. The plv8.execute special function is replaced by a function from the pg-native npm library, which calls the outer DBMS. I will demonstrate a set of tools chosen by me that enables the following:
- writing JS code in the IDE you prefer and enjoy syntax highlight;
- code debugging in real time (with breakpoint, watch, etc.);
- writing of auto-tests (unit-tests), with a variety of options: Postgres, SQLite, mocks;
- deploying your code in the DBMS;
- usage of additional npm packages (the issue is that all code of the plv8-based function should be included in the function's body, in one file).You can use this tool regardless of the programming language that you use for the backend. However, it becomes more flexible if you use languages with static typing (like Java, C#, etc.). For the tasks where the backend is an intermediate layer between the frontend and DBMS, logic (or its part) can be placed in plv8/js with dynamic typing, which will simplify the development process.
In addition to the development of new functions on plv8, the framework provides a set of ready-made functions for CRUD operations. These functions are universal, they aren't tied to a certain database structure, and they can work in any project. If you use them, you can do less backend development, in some projects to a very significant extent.
The installation of the plv8 extension is the most complicated part of working with it. However, I have good news: my colleagues prepared Docker files and Docker images for PostgreSQL 13 with pre-installed plv8! So your start with plv8 will be super simple: you need to deploy the Docker container using just one command!
Docker file: PostgreSQL 13 + plv8 v2.13.15
Demo project for you to participate in the tutorial
For the tutorial, you need to install the following:
Node.js (the most desirable is LTS)
IDE for JS (i.e., free Visual Studio Code)
GraphQL Playground -
Bruce Momjian EnterpriseDB
Postgres и вызовы будущего
Postgres has been a vibrant project for decades, and probably will be popular for decades to come. However, as with any complex endeavor, challenges are always lurking. This talk explores the many project, competitive, and technical challenges in the future that could derail its success. By exploring these challenges, we will be better able to avoid them.