Consensus, Postgres, Multimaster
Postgres Pro Multimaster is Postgres extension (and a set of core patches) providing high availability (HA) with strong consistency and read scalability. It forms symmetric shared-nothing cluster synchronously replicating the data and automatically performing disaster recovery. During the last year we've put significant efforts ensuring and proving that consistency is preserved in all scenarios. The new version, which will be released as part of Postgres Pro Enterprise 13 uses Paxos algorithm for determining transaction outcome and custom protocol governing the recovery process; we used TLA+ model checker to verify its correctness. I'll tell how these things work and why in some cases multimaster may be an attractive alternative to the traditional streaming replication based HA deployments.
multimaster is now open source, available at https://github.com/postgrespro/mmts
To make the talk less narrow specialized and more appealing to the wide audience, in the first part I will shed some light on how generally modern DBMSs (mostly so-called NewSQL) handle fault tolerance. In particular,
- what is a strongly consistent DBMS and the associated overhead;
- what is distributed consensus, Paxos, Raft;
- how they help here;
I won't do an attempt to explain any algorithms line-by-line; it would be hardly useful given the time frames and there is a lot of literature available anyway. The goal is rather to waymark the field and get you a bit comfortable there.