
Accidental Data Warehouse


In many organisations we find what is called the “accidental data warehouse”. Sometimes you may even find more than one such data store floating around. These data warehouses have typically grown organically and without control, as the result of separate and uncoordinated BI development and implementation initiatives.

Characteristics

In the accidental data warehouse we often find non-overlapping subsets of data obtained from different transactional and operational systems. It often supports reporting and analytics for different business units or user communities, but with no coordinated integration between them. It is typically the result of individual analytical or reporting silos created with no interaction or coordination among them, used by distinct sets of users with almost no common understanding of each other’s business functions. This is usually evident as non-overlapping data models with no common structures shared between them.

In such an environment you often find tool-specific information delivery facilities – an OLAP cube here for a particular OLAP tool used by one community, flat file extracts over there for an analytical tool used by another community, and one or more conventional reporting tools operating off independent data marts, some with dimensional models and others with hybrid data models slapped together using Blu-Tack and bubble gum.

You may often find the same data extracted more than once from the same source system, but each time transformed using different business rules. Or maybe not even transformed at all… Standard definitions typically do not exist and you often get similar terms defined in different ways by separate business communities. Just ask how many definitions of “revenue” there are. You will often find users creating applications (such as complex Excel spreadsheets or random little Access databases) on their desktops to further tweak the data before creating reports that are passed on inside or outside the organisation.
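To make the “revenue” question concrete, here is a minimal sketch of how two silos can report different revenue figures from the very same source data. The data, rule names and figures are all invented for illustration; real business rules are of course far more involved.

```python
# Hypothetical illustration: one source system, two silos, two "revenues".
transactions = [
    {"amount": 1000.0, "type": "sale"},
    {"amount": -150.0, "type": "refund"},
    {"amount": 200.0,  "type": "sale", "intercompany": True},
]

def revenue_silo_a(rows):
    """Silo A's rule: gross revenue -- count every sale, ignore refunds."""
    return sum(r["amount"] for r in rows if r["type"] == "sale")

def revenue_silo_b(rows):
    """Silo B's rule: net external revenue -- refunds in, intercompany out."""
    return sum(r["amount"] for r in rows if not r.get("intercompany", False))

print(revenue_silo_a(transactions))  # 1200.0
print(revenue_silo_b(transactions))  # 850.0 -- same data, different answer
```

Neither team is “wrong” by its own rules; the problem is that the rules were never reconciled.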

In short, it results in disparate and inconsistent reports, and total confusion should the same report contents ever end up on the same desk along different routes.

Implications

The biggest danger this introduces is that you get a different answer to the same question depending on whom in the organisation you ask. Again, just ask the respective groups what the organisation’s revenue was for the previous month… Then you start to wonder – which of these answers actually finds its way to the exco, or the shareholders? Is it even the correct answer? And what serious decisions are then made on the strength of that answer?

Other typical implications are a lack of data availability, a lack of common definitions, serious problems with compliance reporting, increased information delivery costs, an excessively long “time to market” for any new ventures and, crucially, a serious lack of confidence in the data throughout the organisation.

This can result in a reduction in business productivity. A recent survey showed that 70% of users’ time is spent manipulating data and only 30% analysing it. It is estimated that 75% of a typical IS function’s time is spent on extracting, preparing and manipulating data, with only 25% spent on reporting, analysing and presenting it to the business.

What to do

The first step with any such serious problem is to realise that you have the problem, acknowledge its presence, and make the conscious decision to do something about it. Of course, that is not so easy in a large and highly federated environment, especially if the powers that be are in total denial.

The next step is to obtain or appoint the appropriate resources and to put proper team structures in place that can work on eliminating these problems, while carrying on with day-to-day development and support activities as well. This requires very strong leadership of the team and very strong business sponsorship. Such an initiative must be driven and mandated from the top. A small unit hidden away somewhere under finance or customer care is not going to have the clout to eradicate the inconsistency and independence demons. It also requires skilled and experienced business analysts, data modellers and data architects to resolve and correct the inconsistencies.

The next step is to scope and prioritise exactly what will be fixed: which silos are going to be integrated with which others, and how. This requires an analysis of the business functions they support and, if any silos are excluded from the exercise, a clear statement of why this is allowed and for how long it will be tolerated.

This is followed by a detailed analysis, focussing on the usage of each silo’s data. Which user groups are supported? Are there any cross-references between these groups? What business value does the data in each of these silos contribute? What is the granularity and frequency of the data utilised? You also need to establish where each of these data items is originally obtained from.
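One lightweight way to record the answers consistently across silos is a simple inventory record per silo. The sketch below is only one possible layout – the field names are invented, not a prescribed template.

```python
from dataclasses import dataclass

# Hypothetical inventory record for the silo analysis described above.
@dataclass
class SiloProfile:
    name: str
    user_groups: list[str]       # which communities does it support?
    overlaps_with: list[str]     # cross-references to other silos
    business_value: str          # what the data contributes to the business
    granularity: str             # e.g. "transaction" or "daily summary"
    refresh_frequency: str       # e.g. "daily", "monthly"
    source_systems: list[str]    # where the data is originally obtained from

finance_mart = SiloProfile(
    name="finance_mart",
    user_groups=["finance", "exco reporting"],
    overlaps_with=["sales_cube"],
    business_value="monthly revenue reporting",
    granularity="daily summary",
    refresh_frequency="daily",
    source_systems=["billing", "general ledger"],
)
```

Comparing such records side by side makes the overlaps – and the candidates for integration – easy to spot.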

The rest of the process is “divide and conquer”. Based on the analysis performed, mini business cases must be drawn up, estimating the potential costs of fixing the problem and the resulting increase in revenue or decrease in operational costs.
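A mini business case can be as simple as a payback estimate. The figures below are invented purely to show the shape of the calculation:

```python
# Hypothetical mini business case for integrating one silo (all figures invented).
one_off_fix_cost = 400_000   # analysis, remodelling, migration and re-testing
monthly_saving = 35_000      # duplicated ETL runs, manual reconciliation effort

payback_months = one_off_fix_cost / monthly_saving
print(f"Payback in {payback_months:.1f} months")  # Payback in 11.4 months
```

Cases with short payback periods naturally float to the top of the priority list.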

This is followed by developing high level subject area data models for each area, ensuring that these models are consistent with respect to each other and that they conform exactly on the various integration touch points. This is a crucial step. The reason that it is done per subject area is to limit the scope of each exercise to a manageable chunk of work with a limited and sufficiently focussed audience. The crux, though, is that these subject area models must be integrated and conformed.
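As a minimal illustration of conforming on the touch points (all names are hypothetical): two subject areas each keep their own facts, but both reference one shared customer structure instead of defining their own variants.

```python
from dataclasses import dataclass

# Hypothetical sketch: two subject areas conform on one shared structure.
@dataclass(frozen=True)
class Customer:           # the conformed dimension, defined exactly once
    customer_id: int
    name: str
    segment: str

@dataclass(frozen=True)
class SalesFact:          # "sales" subject area
    customer_id: int      # references the shared Customer, not a local copy
    amount: float

@dataclass(frozen=True)
class ComplaintFact:      # "customer care" subject area
    customer_id: int      # same key, so the two areas join consistently
    severity: int
```

Because both fact structures carry the same customer key, reports from the two areas can be combined without translation tables or guesswork.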

Part of this process is forcing the business to come up with standard definitions and business rules, which have to be accepted and signed off across the entire business. This is a political minefield to walk through, and this is where the business sponsor must ruthlessly blast the obstacles out of the way to clear the way for the business analysts to perform this task efficiently. Political games and domain protection can seriously stall these efforts and frustrate the people doing the work. Lengthy political battles will also exasperate the executives and senior managers who have to agree on and sign off organisation-wide terms, rules and definitions.
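Once a definition has been signed off, it pays to implement it exactly once and have every downstream consumer use that single implementation. Continuing the earlier, invented revenue example:

```python
# Hypothetical: the one signed-off revenue rule, implemented once and
# reused everywhere, replacing the per-silo variants shown earlier.
def revenue(rows):
    """Net external revenue: refunds included, intercompany sales excluded."""
    return sum(r["amount"] for r in rows if not r.get("intercompany", False))
```

With one shared implementation, a change to the rule is made in one place and every report picks it up consistently.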

From the integrated subject area models, an implementation roadmap can be drawn up – how the various parts of the models will be populated, and which reporting and analysis components can then be implemented to access the data contained in these models. Of course, this has to happen in conjunction with supporting on-going reporting and analytics requirements, so the implementation roadmap is often dovetailed with new projects on the table at the time. The basic approach is something like this: if you touch an area of the data warehouse to address new business requirements, fix it into the integrated subject area data models at the same time, and redo all reporting of the affected business areas to run off the new and improved data structures.
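A small sketch of that “touch it, fix it” approach (names and data invented): the report keeps its existing interface but is repointed at the conformed structures when its area is reworked, so its consumers see no disruption.

```python
# Hypothetical: the report's signature is unchanged, but it now reads
# from the conformed structures instead of the legacy silo it used to query.
def load_conformed_transactions(month: str) -> list[dict]:
    # Stand-in for reading the new, conformed data structures;
    # previously this report read straight off the old finance mart.
    sample = {"2025-01": [{"amount": 1000.0}, {"amount": -150.0}]}
    return sample.get(month, [])

def monthly_revenue_report(month: str) -> float:
    # Same signature as before, so downstream consumers are unaffected.
    return sum(r["amount"] for r in load_conformed_transactions(month))

print(monthly_revenue_report("2025-01"))  # 850.0
```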

In this way the data warehouse is integrated silo by silo, in alignment with current business and on-going development activities, where every effort is cost-justified by its business case.

Summary

An accidental data warehouse is like a hereditary disease – it is not a shame if you inherited it; you just have to adapt your lifestyle and care program to deal with it. Once an organisation has realised and acknowledged that it has an accidental data warehouse, and has decided at a high enough level that it wants to do something constructive about it, there are processes and techniques that can be followed to fix it gradually, without a drastic impact on the business, thereby eventually even eliminating all its resulting complications.
