Борис Бондарев
17:40 01 April
40 min

Implementing Data Vault for Building DWH on PostgreSQL and Greenplum: Problems and Solutions

This talk focuses on the challenges of building a high-load analytical data warehouse as an application solution on PostgreSQL. Using EVRAZ as a case study, we will show the impact of applying the Data Vault methodology on PostgreSQL and Greenplum when developing a unified production performance system.

We will discuss the difficulties we ran into and how we solved them, showing query plans for tasks such as updating reference tables and handling large objects, along with optimization examples. We will also highlight coding nuances, problems with populating the model, and problems with querying it.

This session will be useful for anyone planning to adopt the Data Vault methodology, or already facing challenges with it and with DWH performance on an open-source stack. We will compare technical implementation options for the Business Vault model layer, taking the specifics of PostgreSQL and Greenplum into account.

We will also cover five real problems that arise when operating a DWH, along with their solutions:

  1. Transferring Business Vault object assembly logic from PostgreSQL to Greenplum.
  2. Slower ETL performance when building the current state of Business Vault in PostgreSQL.
  3. Slower Data Lineage construction in PostgreSQL and Greenplum.
  4. Slow satellite queries in Greenplum.
  5. Slow queries with "IN" or "OR" in the Business Vault layer.
