At a very high level, data quality initiatives aim to ensure that the right data gets used by the right people to make good business decisions. There are various dimensions by which we can measure how “right” the data is. If we consider the value of the data, in terms of who uses it for which decisions, we realise that there may be different degrees of “rightness”, depending on whose viewpoint it is being evaluated from. I propose that we incorporate this “fit for purpose” aspect into the data quality management framework.
Data quality refers to the degree to which data, and any information derived from it, can be a trusted source for all the purposes it may be required for. It is all about having the right set of correct information available at the right time, in the appropriate format and at the right place, for the right people to use in order to make good decisions, to enable the business, serve customers, and in this manner achieve the company goals.
Yet in most organisations, what is important to one business user may not even be worth any effort for another user. The reason for this is that in most businesses, multiple departments look at the same piece of data, yet the information value derived from it is vastly different.
As an example, consider a customer account record that contains the customer’s name, surname, account/product type and account balance. This data is consumed by the marketing, finance and collections departments. For marketing, the customer detail is directly related to the value the department places on customer behaviour, cross-sell opportunities, and so on. For the finance department, the customer itself only plays a minor role in budget reporting. For the debt collection and recovery department, however, the customer account again plays a critical role, especially if a balance is owing on a particular account type.
So the crucial phrase in the definition above is “for the right people to use in order to make good decisions...”
For the marketing and collections departments, having the correct account/product type and up-to-date contact details would be of utmost importance, because incorrect data in this case could lead to revenue loss. For instance, with incorrect data, the marketing department could miss out on cross-sell opportunities. The collections department could fall short on the recovery of outstanding debt, as well as on customer retention. In contrast, the finance department would only require the most up-to-date amounts in the account balance. Should the account balance be outdated, the finance department would most definitely flag the record as poor quality data. Yet this same data would still be suitable for most marketing purposes, as the account type and product have been verified as correct.
Irrespective of which data is incorrect, the whole record should not be deemed to be of poor “data quality”. Instead, quality should be considered in the particular context, in terms of the data’s fitness for the purpose it is used for.
The quality of data can be measured and reported by associating data quality indicators with each data quality dimension, these being accuracy, completeness, consistency, currency, precision, privacy, reasonableness, referential integrity, timeliness, uniqueness and validity. There are well-published algorithms for evaluating and scoring these respective data quality measures.
Based on the subject area where a particular data item is used and the business value it contributes to or its relative importance, one assigns weightings to each of the key data quality measures per data item per subject area. The data quality framework filtered by subject area will assist in highlighting data quality problem areas within that business context. A breakdown of the data quality framework per data item will indicate its overall usefulness and fit for purpose across all subject areas.
On the overall data quality framework, however, you still determine an overall data quality index per data item (i.e. per data attribute). You calculate it by summing the weighted scores for each data quality dimension for each subject area, with the weighting factors applied according to the relative importance of the data item to the respective subject areas. This gives you an overall view of data quality relative to the combination of contexts in which the data is utilised.
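The calculation described above can be sketched in a few lines of Python. This is only one possible reading of the weighting scheme, not a prescribed formula: the dimension scores, the per-subject-area weightings, the importance factors and the function names `area_score` and `overall_index` are all hypothetical, and scores are assumed to lie between 0 and 1 with each set of weightings summing to 1.

```python
# Hypothetical dimension scores (0.0 to 1.0) for one data item,
# for example the account balance attribute.
dimension_scores = {
    "accuracy": 0.9,
    "currency": 0.6,
    "referential_integrity": 0.8,
}

# Hypothetical weightings per subject area; each set sums to 1.
dimension_weights = {
    "finance": {"accuracy": 0.4, "currency": 0.5, "referential_integrity": 0.1},
    "marketing": {"accuracy": 0.2, "currency": 0.1, "referential_integrity": 0.7},
}

# Hypothetical relative importance of this data item to each subject area.
area_importance = {"finance": 0.7, "marketing": 0.3}


def area_score(scores, weights):
    """Weighted sum of dimension scores for a single subject area."""
    return sum(weights[dim] * scores[dim] for dim in weights)


def overall_index(scores, weights_by_area, importance):
    """Importance-weighted sum of the per-subject-area scores."""
    return sum(
        importance[area] * area_score(scores, weights_by_area[area])
        for area in weights_by_area
    )


per_area = {
    area: area_score(dimension_scores, weights)
    for area, weights in dimension_weights.items()
}
index = overall_index(dimension_scores, dimension_weights, area_importance)

print(per_area)
print(round(index, 3))
```

Filtering by subject area corresponds to looking at `per_area`, while `index` is the single figure reported for the data item across all the contexts in which it is used.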
In the scenario described above, under the objective of “fit for purpose”, a satisfactory level of data quality could be achieved for the finance department, for example, even with a much lower referential integrity score, because currency and accuracy carry a much higher weighting.
The inverse would apply to the marketing department when it comes to its referential integrity, currency and accuracy weightings.
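To make the contrast concrete, here is a minimal sketch that scores the same record against two sets of weightings. All the numbers, the 0.8 threshold and the names `fitness` and `FIT_THRESHOLD` are assumptions chosen purely for illustration; they do not come from any standard.

```python
# Hypothetical dimension scores for one customer account record:
# referential integrity is weak, currency and accuracy are strong.
record_scores = {
    "referential_integrity": 0.50,
    "currency": 0.95,
    "accuracy": 0.95,
}

# Hypothetical weightings reflecting what each department cares about.
weights = {
    "finance": {"referential_integrity": 0.10, "currency": 0.45, "accuracy": 0.45},
    "marketing": {"referential_integrity": 0.60, "currency": 0.20, "accuracy": 0.20},
}

# Assumed cut-off above which data counts as fit for purpose.
FIT_THRESHOLD = 0.8


def fitness(scores, dept_weights):
    """Weighted sum of the record's dimension scores for one department."""
    return sum(dept_weights[dim] * scores[dim] for dim in dept_weights)


results = {dept: fitness(record_scores, w) for dept, w in weights.items()}
fit = {dept: score >= FIT_THRESHOLD for dept, score in results.items()}

for dept in results:
    print(dept, round(results[dept], 3), "fit" if fit[dept] else "not fit")
```

With these illustrative numbers the record scores roughly 0.91 for finance and 0.68 for marketing, so the same data passes for one consumer and fails for the other.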
By aligning the data quality dimensions with the information value that the business derives from the data, and by assigning weightings and business rules accordingly, “bad data” can no longer be a general quality statement. Quality is now expressed in terms of being fit for purpose, through measures that are relevant to each of the various consumers of the data. In that sense, the value of data quality is in the eye of the beholder.