Data Quality Framework


Last week I attended the Melbourne DAMA event, where Richard Kevan of Oakton presented a framework for representing and reporting data quality metrics, which they developed and implemented for the Australian Postal Services’ IT division. Other organisations use similar frameworks. Such a framework can also be used to represent, report on and analyse measures in other areas of information management.

The objective of the Postal Services project was to devise a framework that produced a single, repeatable measure reflecting the ongoing, changing state of data quality in the National Address file. The work was based on a data quality framework originally published by Professor Graeme Shanks.

To cater for the numerous data quality measures, such as Completeness, Accuracy, Timeliness, Currency, Reliability, Relevancy, Consistency and many more, as well as for the external and internal uses of address data, they classified these measures into three categories:

  • Conformance – these are typically the measures generated by profiling tools; they are objective, and the measurements are taken by testing.
  • Correspondence – these measures evaluate the relationship between the data items and the entities they represent in the real world; they are objective, but they have to be validated through real world measurements such as personal validation, interaction, sampling, stock-taking, etc. (In other words they are complex to validate.)
  • Usability – these measures reflect the usefulness of the data items to the organisation (such as business impact, trust, security, privacy, etc.); they are subjective, and they can only be evaluated by interacting with people (consumers, project staff, business ‘auditors’, etc.)

Richard illustrated the framework they used, in which the global address data quality index is repeatedly broken down into successive levels of sub-indexes. On each level, every sub-measure is multiplied by a relative weight, and the weights at that level add up to 100%, so that the weighted scores roll up into the index one level above. The respective levels correspond to the business areas, the measure classification (given above), and so on in finer and finer detail, down to the actual specifications of how the measures are taken and scored. The weightings per measure per level are determined by business priorities, so they should reflect how important that particular measure is to the business. The result is not only a global data quality index, but a data quality scorecard that can be analysed and reported on at each of its levels.
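To make the roll-up concrete, here is a minimal Python sketch of the weighted hierarchy described above. It is my own illustration, not the implementation Richard presented: the `Node` class, the `rollup` function, the measure names and every weight and score are made-up assumptions, and scores are expressed as fractions between 0 and 1.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One entry in the scorecard: either a leaf measure or a sub-index."""
    name: str
    weight: float                      # relative weight among its siblings
    score: Optional[float] = None      # only set on leaf measures (0.0 to 1.0)
    children: List["Node"] = field(default_factory=list)


def rollup(node: Node) -> float:
    """Return the score of a node by rolling up its weighted children."""
    if not node.children:
        if node.score is None:
            raise ValueError(f"leaf measure '{node.name}' has no score")
        return node.score
    total_weight = sum(child.weight for child in node.children)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights under '{node.name}' must add up to 100%")
    return sum(child.weight * rollup(child) for child in node.children)


# Hypothetical scorecard: all names, weights and scores are illustrative only.
conformance = Node("Conformance", 0.5, children=[
    Node("Completeness", 0.6, score=0.90),
    Node("Consistency", 0.4, score=0.80),
])
correspondence = Node("Correspondence", 0.3, children=[
    Node("Accuracy", 1.0, score=0.70),
])
usability = Node("Usability", 0.2, children=[
    Node("Trust", 1.0, score=0.60),
])
index = Node("Address data quality index", 1.0,
             children=[conformance, correspondence, usability])

print(round(rollup(conformance), 4))  # sub-index for one category
print(round(rollup(index), 4))        # global index
```

Because `rollup` can be called on any node, the same structure yields both the global index and the sub-index at each level of the scorecard.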

An interesting publication by Jay Zaidi in December 2011 presented the idea that data quality should be managed through a holistic, integrated (cross-silo) approach. He proposed a Holistic Data Quality (HDQ) framework, which incorporates consistent quality measures, exception-based reporting and robust analytics. Central to the HDQ framework is a data quality dimensional framework that is based on consistent data quality requirements and data quality metric definitions. The other components of the HDQ framework, such as the Data Profiling Tool and the Issues Management System, all feed the enterprise data quality mart that implements the data quality dimensional framework. What I like about the HDQ framework is that it also covers all the processes that feed the data quality “engine” and which are affected by and related to data quality.

For me as a BI-focussed person, the beauty of this approach is that the measures and weights can be collected and stored in a normal dimensional database, and be reported, analysed and tracked over time through conventional dashboarding and data analysis tools. Business users can track the management of and improvements made to data quality in the same way that they measure and analyse the processes related to other important enterprise resources like customers, products, money and people.
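As a rough illustration of what such a dimensional store could look like, the Python sketch below uses the standard `sqlite3` module with an in-memory database. The schema (`dim_measure`, `fact_dq_score`), the `index_by_date` helper, the snapshot dates and all values are my own assumptions, not the Postal Services or HDQ design. For brevity the weights are flattened to one global weight per measure rather than one weight per level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical star schema: one dimension for measures, one fact table of scores.
conn.executescript("""
CREATE TABLE dim_measure (
    measure_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    category   TEXT NOT NULL,   -- Conformance, Correspondence or Usability
    weight     REAL NOT NULL    -- global weight; all weights add up to 1.0
);
CREATE TABLE fact_dq_score (
    snapshot_date TEXT NOT NULL,
    measure_id    INTEGER NOT NULL REFERENCES dim_measure(measure_id),
    score         REAL NOT NULL  -- fraction between 0.0 and 1.0
);
""")

# Illustrative data only.
conn.executemany(
    "INSERT INTO dim_measure VALUES (?, ?, ?, ?)",
    [
        (1, "Completeness", "Conformance", 0.5),
        (2, "Accuracy", "Correspondence", 0.3),
        (3, "Trust", "Usability", 0.2),
    ],
)
conn.executemany(
    "INSERT INTO fact_dq_score VALUES (?, ?, ?)",
    [
        ("2012-01-31", 1, 0.8), ("2012-01-31", 2, 0.7), ("2012-01-31", 3, 0.6),
        ("2012-02-29", 1, 0.9), ("2012-02-29", 2, 0.7), ("2012-02-29", 3, 0.6),
    ],
)


def index_by_date(connection):
    """Return (snapshot_date, weighted index) rows, oldest snapshot first."""
    return connection.execute("""
        SELECT f.snapshot_date, SUM(m.weight * f.score) AS dq_index
        FROM fact_dq_score AS f
        JOIN dim_measure AS m ON m.measure_id = f.measure_id
        GROUP BY f.snapshot_date
        ORDER BY f.snapshot_date
    """).fetchall()


for snapshot_date, dq_index in index_by_date(conn):
    print(snapshot_date, round(dq_index, 4))
```

Adding `m.category` to the `SELECT` and `GROUP BY` clauses would give the per-category trend, which is the kind of slicing a conventional dashboarding tool would do against the same tables.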

This resonates well with the approach we always advocate: BI and other information management processes should be measured and monitored just like other strategically important business processes. Just as the CFO should have a financial dashboard, the CIO should have, amongst others, a data quality tab on his or her dashboard. The same applies to other information-related processes too, such as data warehouse loading, report and dashboard utilisation, source data consumption, information dissemination, and many more. Weighted, levelled dashboards similar to the one described above can then also be used to represent, report on and analyse the measures of these key areas.
