
Владимир Комаров
SberTech
13:00
26 September
45 min
The hitchhikers' guide to the databases
There are many different databases, so we need formal criteria for comparing them.
The first idea is to separate SQL from NoSQL.
NoSQL is a popular class of platforms developed in the 2000s. However, rejecting SQL is not a new idea: the relational database model itself had non-relational predecessors, such as the network and hierarchical models.
The recent «NoSQL» wave consists of the graph, object, and key-value models.
Time-series, wide-column, and «document-oriented» models are just extensions of the key-value model. Their advantage is that the database server itself can parse the key or the value.
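To make the difference concrete, here is a minimal sketch of a plain key-value store and a «document-oriented» extension of it. The class names `KeyValueStore` and `DocumentStore`, the `find` method, and the choice of JSON as the value format are assumptions made for illustration; they do not describe any particular product.

```python
import json


class KeyValueStore:
    """Plain key-value interface: values are opaque bytes to the server."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str):
        # Returns None when the key is absent.
        return self._data.get(key)


class DocumentStore(KeyValueStore):
    """Same storage, but the server understands the value format (JSON here)."""

    def find(self, field: str, expected) -> list:
        # Server-side filtering: the server parses every value and returns
        # only the matching keys, so the client does not fetch everything.
        matches = []
        for key, raw in self._data.items():
            document = json.loads(raw)
            if document.get(field) == expected:
                matches.append(key)
        return sorted(matches)


store = DocumentStore()
store.put("user:1", json.dumps({"name": "Ann", "city": "Moscow"}).encode())
store.put("user:2", json.dumps({"name": "Bob", "city": "Kazan"}).encode())
moscow_users = store.find("city", "Moscow")
```

With the plain interface, the client would have to read every value and filter locally; with the extension, the same query is answered by one call to the server.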
The facilities of SQL are much more extensive than the key-value interface. So the simplified interface is just the price paid for the ability to build a distributed database.
So the data model is the first axis, and distribution is the second.
It’s not trivial to build a distributed relational database. The reason is that distributed transactions are one of the most complex problems in IT, and a single SQL statement can involve all the nodes in one transaction.
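The sketch below illustrates why a statement that touches every node is costly. It uses two-phase commit, one classic protocol for distributed transactions; the abstract does not name a protocol, so this choice is an assumption. The `Node` class and the `run_distributed_update` function are hypothetical, and real systems also have to handle timeouts, coordinator failure, and recovery, which are omitted here.

```python
class Node:
    """A hypothetical database node holding one shard of the data."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates a node that must refuse
        self.committed = {}
        self._pending = None

    def prepare(self, change):
        # Phase 1: the node votes on whether it can apply the change.
        if not self.can_commit:
            return False
        self._pending = dict(change)
        return True

    def commit(self):
        # Phase 2: make the prepared change visible.
        if self._pending is not None:
            self.committed.update(self._pending)
        self._pending = None

    def rollback(self):
        self._pending = None


def run_distributed_update(nodes, change):
    """Apply `change` on all nodes, or on none of them."""
    prepared = []
    for node in nodes:
        if node.prepare(change):
            prepared.append(node)
        else:
            # One refusal aborts the whole transaction.
            for prepared_node in prepared:
                prepared_node.rollback()
            return False
    for node in nodes:
        node.commit()
    return True


healthy = [Node("a"), Node("b"), Node("c")]
ok = run_distributed_update(healthy, {"status": "active"})

degraded = [Node("a"), Node("b", can_commit=False), Node("c")]
failed = run_distributed_update(degraded, {"status": "active"})
```

Even in this toy version, every node is contacted twice, and a single unavailable node blocks the whole statement. A key-value operation on one key would touch only one node.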
There are interesting attempts to create a distributed relational database. You should pay attention to Cockroach or Yugabyte. But these platforms have not yet become widespread.
At some point, the in-memory cache was invented. As random access memory got cheaper, in-memory technologies came to databases. Every class of platforms considered so far has at least one in-memory member. TimesTen and SolidDB are relational and monolithic; Tarantool, Ignite, etc. are key-value and distributed; VoltDB is relational and distributed.
So the storage medium becomes the third axis.
You may recall Teradata, Greenplum, MS PDW, and a few other distributed relational platforms. They are commercially very successful, it is true, but these platforms are not intended for transaction processing.
So the fourth axis is the load type: OLTP vs. OLAP.
I would like to draw a four-dimensional cube on the blackboard, but I can’t :)
There are no clear borders between the described classes. Relational databases gain some non-relational facilities, while non-relational platforms implement SQL. Disk-based systems gain in-memory features, while in-memory databases learn to store data on disk. Monolithic platforms get distributed versions.
The main idea of this presentation is the following: first determine the class of platforms that fits your solution, and then choose a platform within that class.
Not all classes are equal. Monolithic platforms are much more robust than distributed ones. The relational model is universal, in contrast to NoSQL. On-disk storage is cheaper than in-memory storage.
That’s why a relational monolithic on-disk platform is almost always the right choice. So, choose PostgreSQL! This platform really covers more than 90% of problems.
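The two-step selection (class first, platform second) can be sketched as a lookup over the four axes. This is only an illustration: the type names, the `CATALOG` contents, and the `platforms_in` helper are hypothetical. The data model, distribution, and in-memory labels follow the abstract; the workload labels for the OLTP platforms and the on-disk label for Greenplum are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Model(Enum):
    RELATIONAL = "relational"
    KEY_VALUE = "key-value"


class Topology(Enum):
    MONOLITHIC = "monolithic"
    DISTRIBUTED = "distributed"


class Storage(Enum):
    DISK = "on-disk"
    MEMORY = "in-memory"


class Workload(Enum):
    OLTP = "OLTP"
    OLAP = "OLAP"


@dataclass(frozen=True)
class PlatformClass:
    """One point in the four-axis classification."""
    model: Model
    topology: Topology
    storage: Storage
    workload: Workload


# Illustrative catalog; workload values for the first four entries
# and the storage value for Greenplum are assumptions.
CATALOG = {
    "PostgreSQL": PlatformClass(Model.RELATIONAL, Topology.MONOLITHIC, Storage.DISK, Workload.OLTP),
    "TimesTen": PlatformClass(Model.RELATIONAL, Topology.MONOLITHIC, Storage.MEMORY, Workload.OLTP),
    "VoltDB": PlatformClass(Model.RELATIONAL, Topology.DISTRIBUTED, Storage.MEMORY, Workload.OLTP),
    "Tarantool": PlatformClass(Model.KEY_VALUE, Topology.DISTRIBUTED, Storage.MEMORY, Workload.OLTP),
    "Greenplum": PlatformClass(Model.RELATIONAL, Topology.DISTRIBUTED, Storage.DISK, Workload.OLAP),
}


def platforms_in(required: PlatformClass) -> list:
    """Step 2: list the catalog platforms that belong to the chosen class."""
    return sorted(name for name, cls in CATALOG.items() if cls == required)


# Step 1: pick the class. Here it is the default one recommended above.
default_class = PlatformClass(
    Model.RELATIONAL, Topology.MONOLITHIC, Storage.DISK, Workload.OLTP
)
shortlist = platforms_in(default_class)
```

Since the borders between classes are blurry, a real catalog would probably allow a platform to belong to several classes; the exact-match comparison here is a deliberate simplification.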