Интеграция данных в мире микросервисов
Since its launch in 2008, Zalando has grown with tremendous speed. The road from startup to multinational corporation has been full of challenges, especially for Zalando's technology team. Distributed across Berlin, Helsinki, Dublin and Dortmund — and nearly 900 professionals strong — Zalando Technology still plans to expand by adding 1,000 more developers through the end of 2016. This rapid growth has showed us that we need to be very flexible about developing processes and organizational structures, so we can scale and experiment. In March 2015, our team adopted Radical Agility: a tech management strategy that emphasizes Autonomy, Purpose, and Mastery, with trust as the glue holding it all together. To make autonomy possible, teams can now choose their own technology stacks for the products they own. Microservices, speaking with each other using RESTful APIs, promise to minimize the costs of integration between autonomous teams. Isolated AWS accounts, run on top of our own open-source Platform as a Service (called STUPS.io), give each autonomous team enough hardware to experiment and introduce new features without breaking our entire system.
One small issue with having microservices isolated in their individual AWS accounts: Our teams keep local data for themselves. In this environment, building an ETL process for data analyses, or integrating data from different services, becomes quite challenging. PostgreSQL's new logical replication features, however, now make it possible to stream all the data changes from the isolated databases to the data integration system so that it can collect this data, represent it in different forms, and prepare it for analysis.
In this talk, I will discuss Zalando's open-source data collection prototype, which uses PostgreSQL's logical replication streaming capabilities to collect data from various PostgreSQL databases and recreate it for different formats and systems (Data Lake, Operational Data Store, KPI calculation systems, automatic process monitoring). The audience will come away with new ideas for how to use Postgres streaming replication in a microservices environment.
Слайды
Видео
Другие доклады
-
Oleg Ivanov Postgres Professional
Применение методов машинного обучения для улучшения планировщика
In the speech we consider the current PostgreSQL planner model, then the possibilities of applying machine learning methods for planner improvement and the obtained results.
-
Jean-Paul Argudo Dalibo
Миграция на PostgreSQL : причины... и последствия
The talk will be articulated around all the traditional arguments to "how chose PostgreSQL over other choices in the database domain"... But also, and that's quite new in the comunity, what are the consequences of this choice. Because the PostgreSQL adoption brings adoption of other things like Linux, but also, Open Source thinking, the fast pace of PostgreSQL will command new methods of validation the company must adapt to... etc.
-
Marco Slot Citus Data
CitusDB: расширение для масштабирования PostgreSQL
CitusDB is an extension for PostgreSQL that can distribute tables across a cluster of PostgreSQL servers. Data is stored in shards that can use append-partitioning for bulk-loading of time series data or hash-partitioning for real-time data ingestion. SELECT queries on distributed tables are transparently parallelised across the cluster, using all available cores. Distributed tables can also be joined in parallel, even if they are not partitioned along the same column. CitusDB is especially suitable for real-time analytics use-cases such as dashboards which require fast analytical queries over live data, and can simultaneously act as a scalable operational database. This talk will describe the internals of CitusDB and give a live demo of a large-scale CitusDB cluster.
-
Pavel Stehule
Хранимые процедуры в PostgreSQL, язык PL/pgSQL
- the architecture
- the design and implementation of PL/pgSQL
- the difference between PL/SQL and PL/pgSQL
- the advantage and issues of PL/pgSQL