September 25, 2023

PGConf.SPB 2023

  • 24 talks
  • offline format

Talks

Talks archive

PGConf.SPB 2023
  • Vladimir Sitnikov

    A talk on how regular™ indexes work in PostgreSQL. It will be extremely useful for those who are just starting to work with databases, and for those who have worked with them but have forgotten the basics. If the talk earns a place on the "must-see for developer onboarding" list, the goal is achieved.
    We will tackle the following subtopics:
    - How an index can speed up search;
    - Should we index WHERE conditions;
    - Should we index ORDER BY clauses;
    - Should we index foreign keys;
    - What to do when a search involves multiple conditions;
    - Which column order should be specified in an index;
    - Cases when an index causes a slowdown, and how to reduce its impact on the app.
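
    The column-order subtopic can be illustrated with a small sketch. It is not taken from the talk: it uses SQLite from the Python standard library as a stand-in for PostgreSQL (the two planners differ, so treat the output as illustrative only), and the `orders` table and index name are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; table and index are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, created_at TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, created_at, total) VALUES (?, ?, ?)",
    [(i % 100, f"2023-09-{i % 28 + 1:02d}", i * 1.5) for i in range(10_000)],
)
# Composite index: customer_id is the leading column, created_at the second.
conn.execute(
    "CREATE INDEX idx_orders_customer_created ON orders (customer_id, created_at)"
)


def plan(sql):
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Filter on the leading column: the index can narrow the search.
by_customer = plan("SELECT * FROM orders WHERE customer_id = 42")
# Filter only on the second column: the index is not usable for a lookup here.
by_date = plan("SELECT * FROM orders WHERE created_at = '2023-09-05'")
# Equality on the leading column plus ORDER BY on the second: rows come out
# of the index already sorted, so no separate sort step is needed.
sorted_for_customer = plan(
    "SELECT * FROM orders WHERE customer_id = 42 ORDER BY created_at"
)

for label, steps in [
    ("WHERE customer_id", by_customer),
    ("WHERE created_at", by_date),
    ("WHERE customer_id ORDER BY created_at", sorted_for_customer),
]:
    print(label, "->", steps)
```

    In PostgreSQL the equivalent check would be `EXPLAIN` on the same queries; the exact plans depend on table statistics and planner settings.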

  • Nikolai Shaplov
    PostgresPro

    Fuzzing means feeding random input data to a program, or to a part of it, and observing what comes out (in practice, the randomness is quite constrained). This is repeated many times on many processors.

    Fuzzing a large monolithic software system is never simple; it calls for unconventional solutions. In this talk, I will explain what we looked for with fuzzing, how we did it, and what results it produced.

    • Investigation of data type parsing functions (input-functions): as a warm-up;
    • Investigation of functions implementing operations between types (op-functions): here it is better to take the structure into account;
    • Network subsystem fuzzing: let's pretend we are the POSIX calls, it's cheaper that way;
    • Recovering disk context: we need Groundhog Day.

    A story about funny bugs and ridiculous hand gestures will be included.
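
    As a rough illustration of the first item, the sketch below fuzzes a toy parser. It is not the speaker's tooling: `parse_point` is a hypothetical stand-in for a type input function with a deliberately planted bug, and the loop is a naive random fuzzer rather than a coverage-guided one.

```python
import random


def parse_point(text):
    """Toy stand-in for a type input function: parses "(x,y)" into floats."""
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("expected parentheses")
    parts = text[1:-1].split(",")
    # Planted bug: the number of parts is never checked, so "(1)" escapes
    # as an IndexError instead of being rejected with a clean ValueError.
    return float(parts[0]), float(parts[1])


def fuzz(target, iterations=5000, seed=0):
    """Feed short random strings to `target` and collect unexpected errors."""
    rng = random.Random(seed)
    # A narrow alphabet: the "randomness" is deliberately constrained so
    # that inputs get past the first validation step often enough.
    alphabet = "(),01.-"
    crashes = []
    for _ in range(iterations):
        length = rng.randint(0, 6)
        text = "".join(rng.choice(alphabet) for _ in range(length))
        try:
            target(text)
        except ValueError:
            pass  # a clean rejection of bad input is the expected behaviour
        except Exception as exc:  # anything else counts as a finding
            crashes.append((text, type(exc).__name__))
    return crashes


crashes = fuzz(parse_point)
print(len(crashes), "unexpected errors; first few:", crashes[:3])
```

    Real fuzzing campaigns typically rely on coverage feedback and run for far longer; the point here is only the loop of generating input, running the target, and separating clean rejections from crashes.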

  • Vladimir Komarov
    SberTech

    There are many different databases, and we need formal criteria to compare them.
    The first idea is to separate SQL from NoSQL.
    NoSQL is a popular class of platforms developed in the 2000s. Rejecting SQL is not a new idea, though: the relational model had predecessors, such as the network and hierarchical models.
    The new "NoSQL" wave consists of the graph, object, and key-value models.
    Time-series, wide-column, and "document-oriented" models are just extensions of the key-value model. Their advantage is that the database server can parse the key or the value.
    SQL offers far more than the key-value interface, so the simplified interface is the price paid for the ability to build a distributed database.
    So the data model is the first axis, and distribution is the second.
    Building a distributed relational database is not trivial: distributed transactions are among the hardest problems in IT, and a single SQL statement can involve every node in one transaction.
    There are promising attempts to create such a database; Cockroach and Yugabyte deserve attention. But these platforms have not become widespread.
    Then someone invented the in-memory cache. As random access memory got cheaper, in-memory technologies came to databases. Each class of platforms considered so far has at least one in-memory member: TimesTen and SolidDB are relational and monolithic; Tarantool, Ignite, etc. are key-value and distributed; VoltDB is relational and distributed.
    So the storage medium becomes the third axis.
    You may recall Teradata, Greenplum, MS PDW, and a few other distributed relational platforms that are commercially very successful. That is true, but these platforms are not designed for transaction processing.
    So the fourth axis is the load type: OLTP vs. OLAP.
    I would like to draw a four-dimensional cube on the blackboard, but I can’t :)
    There are no clear borders between the described classes. Relational databases gain some non-relational facilities, while non-relational platforms implement SQL. Disk-based systems gain in-memory features, while in-memory databases learn to store data on disk. Monolithic platforms get distributed versions.
    The main idea of this presentation is the following: first define the class of platforms that suits your solution, then choose a platform within that class.
    Not all classes are equal. Monolithic platforms are much more robust than distributed ones. The relational model is universal, unlike NoSQL. On-disk storage is cheaper than in-memory.
    That’s why a relational monolithic on-disk platform is almost always the right choice. So, choose PostgreSQL! This platform really covers more than 90% of problems.
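
    The four axes can be written down as a small data structure. This is only a sketch of the classification as the abstract describes it, not material from the talk: the type and function names are hypothetical, and any axis the abstract does not state for a platform is left as `None` rather than guessed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DataModel(Enum):
    RELATIONAL = "relational"
    KEY_VALUE = "key-value"


class Topology(Enum):
    MONOLITHIC = "monolithic"
    DISTRIBUTED = "distributed"


class Storage(Enum):
    ON_DISK = "on-disk"
    IN_MEMORY = "in-memory"


class Workload(Enum):
    OLTP = "OLTP"
    OLAP = "OLAP"


@dataclass(frozen=True)
class PlatformClass:
    model: DataModel
    topology: Topology
    storage: Optional[Storage] = None    # None: not stated in the abstract
    workload: Optional[Workload] = None  # None: not stated in the abstract


R, KV = DataModel.RELATIONAL, DataModel.KEY_VALUE
MONO, DIST = Topology.MONOLITHIC, Topology.DISTRIBUTED
DISK, MEM = Storage.ON_DISK, Storage.IN_MEMORY

# Only the placements that the abstract itself gives.
PLATFORMS = {
    "PostgreSQL": PlatformClass(R, MONO, DISK),
    "TimesTen": PlatformClass(R, MONO, MEM),
    "SolidDB": PlatformClass(R, MONO, MEM),
    "Tarantool": PlatformClass(KV, DIST, MEM),
    "Ignite": PlatformClass(KV, DIST, MEM),
    "VoltDB": PlatformClass(R, DIST, MEM),
    "Teradata": PlatformClass(R, DIST, workload=Workload.OLAP),
    "Greenplum": PlatformClass(R, DIST, workload=Workload.OLAP),
    "MS PDW": PlatformClass(R, DIST, workload=Workload.OLAP),
}


def candidates(model, topology, storage=None, workload=None):
    """Step 1: fix the class. Step 2: list the platforms inside it."""
    wanted = {"model": model, "topology": topology,
              "storage": storage, "workload": workload}
    return sorted(
        name for name, cls in PLATFORMS.items()
        if all(v is None or getattr(cls, axis) == v
               for axis, v in wanted.items())
    )


print(candidates(R, MONO, DISK))
print(candidates(KV, DIST, MEM))
```

    A platform whose axis is `None` does not match a query that constrains that axis, which keeps unknowns from being counted as matches.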
    

  • Aleksandr Kalendaryov
    DdataGile

    In modern data analysis, machine learning models are used as often as databases. IT giants such as Google and Amazon have already combined the two, and Microsoft and Yandex are not far behind. Isn't it time to implement a machine learning model in PostgreSQL? In this talk you will hear about the basics of machine learning, how it is implemented in databases, and an example implementation as a PostgreSQL extension.
