Agile Data Warehousing


Agile data warehousing“Agile data warehousing” is a contradiction in terms. Very few activities are less agile than populating a 10 terabyte data warehouse…  However if you want to deliver information frequently and rapidly to the business, you have to speed up the ETL implementation.

Those of us with small children have all heard the Fat Controller moaning at Thomas and his friends for causing unnecessary inconvenience and delay. Don’t we all wish we can deliver BI to the business without unnecessary inconvenience and delay?

In my previous post I discussed how to model a data warehouse with conformed dimensions when using an agile approach. However if we want to increase user interaction and get useful information to the business much quicker, we have to speed up the loading process too. This can be done by using a well architected ETL framework and through faster, smaller real-time loads.

Before embarking on “agile ETL”, you must already have a robust well-tested ETL framework in place
that contains all the functional components. Before you even analyse a single source-to-target data mapping, you already know exactly what your ETL is going to do. You are going to log the time the process ran, with which parameters it was called, how much of which source data it accessed, etc. You also know you are going to populate dimensions, where you have to generate surrogate keys, maintain versions and denormalise hierarchies. You are going to populate facts with key lookups. You are also going to log the loading and execution statistics. So, there is the whole specification, well, sort of… Now go and code and test it, and once done, you’ll be ready to implement agile ETL.

There are two types of ETL processes you have to develop, namely the “now” real-time processes and the big bulk historical load processes. When you’re doing agile data warehousing, you focus on the former. In most big data warehousing programs, the bulk history loads, with their validations and sign-offs cause the most problems, over-runs and unhappy customers, as most of the undocumented and forgotten-about inconsistencies occur in the archaic data. On an agile schedule you don’t have time for that. You have to speedily create the real time loads, by filling in the mappings into the pre-coded framework.

So what about the history loads? You create a separate team, well experienced in hard core bulk data movement, and they code and implement one big data load after the other, basing the logic on that already implemented during the sprints, but also catering for all the older exceptions. One of the spin-offs from approaching it this way is that the business already knows that your loads work correctly, as they have already accepted and implemented the trickle feeds. Nine times out of ten, they will now be way more wiling to play a role in sorting out old data problems, and even accept blame for the state the data is in. A note of caution though – you have to deploy the bulk loads it in such a way that they don’t block the trickle feeds.

In the next post on agile BI, I will discuss approaches to interact with the business, even in non-agile orientated organisations.


  1. Yolanda Smit

    This is very important. For those standing at the edge of launching Agile BI, the recommendation of setting the ETL Framework up in advance is invaluable.

    In an agile team the roadmapping phase (discovery, strategy, planning, and initial release plan) requires time invested from the product owner and business analysts. This buys enough time for the development team to get their ducks in a row and get the launching pad ready.

    Being agile is not to dive into the deepend and just deliver at the speed of light. It’s all about preparing to enable agility, and then delivering quality regularly and at a steady (but faster) tempo. Think ahead a few steps and pave the way with preparation. It makes the “road less travelled” far more enjoyable!

    1. Martin

      Hi Yolanda,

      Thank you for your very appropriate comment – this statement of yours is so true: “Being agile is not to dive into the deep end and just deliver at the speed of light.” Many people think that agile approaches are all about just “winging it” as you go along. Especially in BI, nothing can be further from the truth! As you’ve mentioned, it’s all in the preparation!

      Best regards, Martin

Leave a Reply

hope howell has twice the fun. Learn More Here anybunny videos