Valentine Gogichashvili, Zalando
13:15 04 February

Data Integration in the World of Microservices

Since its launch in 2008, Zalando has grown with tremendous speed. The road from startup to multinational corporation has been full of challenges, especially for Zalando's technology team. Distributed across Berlin, Helsinki, Dublin and Dortmund, and nearly 900 professionals strong, Zalando Technology still plans to expand by adding 1,000 more developers through the end of 2016. This rapid growth has shown us that we need to be very flexible about our development processes and organizational structures so that we can scale and experiment. In March 2015, our team adopted Radical Agility: a tech management strategy that emphasizes Autonomy, Purpose, and Mastery, with trust as the glue holding it all together. To make autonomy possible, teams can now choose their own technology stacks for the products they own. Microservices, speaking with each other using RESTful APIs, promise to minimize the costs of integration between autonomous teams. Isolated AWS accounts, run on top of our own open-source Platform as a Service (called STUPS.io), give each autonomous team enough hardware to experiment and introduce new features without breaking our entire system.

One small issue with having microservices isolated in their individual AWS accounts is that our teams keep their local data to themselves. In this environment, building an ETL process for data analysis, or integrating data from different services, becomes quite challenging. PostgreSQL's new logical replication features, however, now make it possible to stream all data changes from the isolated databases to a data integration system, which can collect this data, represent it in different forms, and prepare it for analysis.
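
For illustration only (this is not Zalando's prototype), here is a minimal sketch of the underlying mechanism: with PostgreSQL 9.4+ logical decoding, a consumer can create a logical replication slot and poll it for row changes over a regular connection. The slot name, database URL and credentials below are invented, and the built-in test_decoding output plugin is used only to show the idea.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Minimal sketch: read logical decoding changes from a PostgreSQL 9.4+ database.
    // The source database needs wal_level = logical and a free replication slot,
    // and the connecting role needs the REPLICATION attribute (or superuser).
    public class LogicalDecodingSketch {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:postgresql://localhost:5432/orders_db"; // hypothetical service database
            try (Connection conn = DriverManager.getConnection(url, "postgres", "secret");
                 Statement st = conn.createStatement()) {

                // Create the slot once; it remembers how far this consumer has read.
                // (Fails if a slot with this name already exists.)
                st.execute("SELECT pg_create_logical_replication_slot('data_lake_slot', 'test_decoding')");

                // Fetch and consume pending changes (INSERT/UPDATE/DELETE) as text rows.
                try (ResultSet rs = st.executeQuery(
                        "SELECT * FROM pg_logical_slot_get_changes('data_lake_slot', NULL, NULL)")) {
                    while (rs.next()) {
                        // In a real pipeline this payload would be transformed and shipped
                        // to the data integration system (Data Lake, ODS, ...).
                        System.out.println(rs.getString("data"));
                    }
                }
            }
        }
    }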

In this talk, I will discuss Zalando's open-source data collection prototype, which uses PostgreSQL's logical replication streaming capabilities to collect data from various PostgreSQL databases and recreate it in different formats for different systems (a Data Lake, an Operational Data Store, KPI calculation systems, automatic process monitoring). The audience will come away with new ideas for how to use Postgres streaming replication in a microservices environment.

Slides

Video

Other talks

  • Alvaro Hernandez, 8Kdata
    45 min

    PostgreSQL & Java: past, present and future

    Java is the most used programming language in the world. Yet how is it supported in PostgreSQL? What are the gotchas and the best practices? Now that Java is evolving significantly, how will PostgreSQL follow?

    Despite Java's age, the language is stronger than ever. It is the de facto programming language of the enterprise world, and since Java 8 it has been making a comeback in the startup and open-source world. PostgreSQL is accessed from Java more than from any other interface, but how well is Java supported in PostgreSQL?

    This talk will analyze how Java support has looked in the past, but more importantly how you can use it and what you can do today. JDBC drivers, best practices, PL/Java and other less frequently used tools will be presented and discussed (see the short JDBC sketch at the end of this abstract).

    We will then look into the future to see what is currently under development, such as Phoebe, a new reactive Java driver for PostgreSQL that targets clusters, pipelined queries and a non-JDBC interface for fully asynchronous operation. We will also look at what needs to be done in areas like server-side Java to make Java a fully capable, first-class language within PostgreSQL.
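
    As background for the JDBC part of the talk, here is a minimal sketch of querying PostgreSQL from Java through the standard JDBC driver; the URL, credentials, table and query are placeholders, not anything from the talk itself.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    // Minimal JDBC example: connect to PostgreSQL and run a parameterized query.
    public class JdbcExample {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:postgresql://localhost:5432/appdb"; // placeholder database
            try (Connection conn = DriverManager.getConnection(url, "app_user", "secret");
                 PreparedStatement ps = conn.prepareStatement(
                         "SELECT id, name FROM customers WHERE created_at > ?")) {
                // Bind the parameter instead of concatenating it into the SQL string.
                ps.setDate(1, java.sql.Date.valueOf(java.time.LocalDate.now().minusDays(7)));
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getLong("id") + " " + rs.getString("name"));
                    }
                }
            }
        }
    }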

  • Tatsuo Ishii

    PostgreSQL clusters using streaming replication and pgpool-II

    The talk covers PostgreSQL clusters built with streaming replication and pgpool-II, a combination that is quite popular in Japan. In addition, the next version of pgpool-II will be released this winter, so the talk will also look at what is new in that release (a rough pgpool.conf sketch for this kind of setup follows below).
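
    For orientation only, here is a heavily abridged pgpool.conf sketch for a streaming-replication cluster (parameter names as in pgpool-II 3.x; the host names, monitoring user and intervals are placeholders, not the speaker's configuration):

    # pgpool-II in streaming replication mode (abridged)
    listen_addresses = '*'
    port = 9999

    backend_hostname0 = 'primary.example.com'    # placeholder primary
    backend_port0 = 5432
    backend_weight0 = 1
    backend_flag0 = 'ALLOW_TO_FAILOVER'

    backend_hostname1 = 'standby.example.com'    # placeholder standby
    backend_port1 = 5432
    backend_weight1 = 1
    backend_flag1 = 'ALLOW_TO_FAILOVER'

    master_slave_mode = on            # route writes to the primary
    master_slave_sub_mode = 'stream'  # backends are kept in sync by streaming replication
    load_balance_mode = on            # spread read-only queries across nodes
    sr_check_period = 10              # seconds between replication lag checks
    sr_check_user = 'pgpool'          # placeholder monitoring user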

  • Kamil Islamov, Stickeroid Ai
    22 min

    Analytic reports data processing optimization

    A method for automatically refreshing preprocessed analytic report results is presented. Preprocessing and caching the reports makes fast responses possible even for big-data reports. The author describes how to refresh the report cache with minimal server load and a tunable refresh rate (one generic way to implement such a cache is sketched below).
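
    The abstract does not say how the caching is implemented; purely as an illustration of one common PostgreSQL approach (not necessarily the author's), the sketch below refreshes a materialized view holding preprocessed report results on a fixed schedule. The view name, connection details and interval are invented.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Sketch: keep a preprocessed report cache fresh by periodically refreshing a
    // materialized view. CONCURRENTLY avoids blocking readers, but requires
    // PostgreSQL 9.4+ and a unique index on the view.
    public class ReportCacheRefresher {
        public static void main(String[] args) {
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(() -> {
                String url = "jdbc:postgresql://localhost:5432/reports"; // placeholder database
                try (Connection conn = DriverManager.getConnection(url, "report_user", "secret");
                     Statement st = conn.createStatement()) {
                    // sales_report_cache is a hypothetical materialized view with the
                    // preprocessed report results.
                    st.execute("REFRESH MATERIALIZED VIEW CONCURRENTLY sales_report_cache");
                } catch (Exception e) {
                    e.printStackTrace(); // in production: log and alert
                }
            }, 0, 10, TimeUnit.MINUTES); // tune the interval to the required refresh rate
        }
    }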