Data quality is in the eye of the beholder


The quality of the data we work with has a significant impact on the quality of the insights we can extract for the business. Following on from my recent mini-series on the evolving data-related roles, it was interesting to come across Edwin Walker’s article on Data Science Central titled: ‘How do different personas in an organisation see data quality?’ Walker has also written on the topic of ‘modern data quality’ in another article available on the same site.

One of the key phrases in the second article cited above was Walker’s reference to how ‘modern data quality practices make use of new technology, automation, and machine learning to handle a variety of data sources, ensure real-time processing, and stimulate stakeholder collaboration.’ Stakeholder collaboration is important as one of the biggest challenges in data quality management is to get data and system owners to take on the responsibility for the quality of data that their processes and systems pump into the organisation. After all, most data quality problems originate where data enters the organisation. Very few systems or processes would take good quality data and mess it up.

Another important phrase from Walker in this article was ‘data governance, continuous monitoring, and proactive management are prioritised to ensure accurate, reliable, and fit-for-purpose data for informed decision-making and corporate success.’ Most companies I have dealt with have not yet prioritised data quality management, or even measured data quality, to the extent they should. I will come back to modern data quality management in a future piece. For now, let us turn to how data quality is perceived by the roles I discussed previously.

Data engineers and data quality

Data engineering, which includes data management, software engineering and DevOps, focuses primarily on getting data into a useable form in an accessible location by building and operationalising data pipelines across various data and analytics platforms. Many of the complications a data engineer must deal with stem from data quality issues.

In the personas-focused article, Walker writes that data engineers should have ‘checks preventing issues before data lands (Shift Left) to detecting problems at the warehouse or lake house using anomaly monitoring (Shift Right).’ He goes on to explain how both approaches focus on detecting and preventing data quality issues before decision-makers use the data, or insights derived from the data.
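To make the two approaches concrete, here is a minimal sketch in Python. It is my own illustration, not something taken from Walker's article. The field names (`customer_id`, `amount`), the function names, and the three-standard-deviation threshold are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def validate_record(record):
    """Shift Left: list the rule violations for one record before it is loaded.

    The fields checked here are hypothetical examples.
    """
    errors = []
    if not record.get("customer_id"):
        errors.append("missing customer_id")
    amount = record.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        errors.append("amount must be a non-negative number")
    return errors


def is_anomalous(history, today, threshold=3.0):
    """Shift Right: flag today's value (for example, a daily row count) if it
    lies more than `threshold` standard deviations from the historical mean."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > threshold


# Shift Left: reject or quarantine records that fail the checks.
bad_record_errors = validate_record({"amount": -5})

# Shift Right: compare today's load against recent daily row counts.
daily_row_counts = [100, 102, 98, 101, 99]
spike_detected = is_anomalous(daily_row_counts, 500)
```

In practice the first function would sit in the ingestion pipeline and the second would run on a schedule against the warehouse or lake house. A simple standard-deviation test is only one of many possible anomaly checks.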

Data scientists or data analysts and data quality

Walker begins this section with the familiar ‘garbage in, garbage out’ adage. It continues to bewilder me that decision-makers understand that they must make decisions based on data, yet some are still not willing to invest in the data or put the appropriate mandates and processes in place to ensure the quality of the data.

I am a great advocate of getting the data analysts to incorporate data quality monitoring and reporting into their reports and dashboards. Often, executives and other key stakeholders will only realise the extent of the problem, and react to it, if it is presented to them in a clear, unambiguous, and understandable format. Many measures of data quality can be used to check that the data is fit for purpose. These include aggregate analysis, deterministic rules, statistical measures, data accuracy KPIs, integrity rules, and other custom data validation checks.
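As a rough illustration of two such measures, the Python sketch below computes a completeness ratio and a deterministic rule pass rate that an analyst could surface on a dashboard. The sample rows, the field names, and the age rule are hypothetical and serve only to show the idea.

```python
def completeness(rows, field):
    """Share of rows in which `field` is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)


def rule_pass_rate(rows, rule):
    """Share of rows that satisfy a deterministic rule (a function returning True or False)."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if rule(row)) / len(rows)


def age_is_plausible(row):
    """Hypothetical business rule: age must be present and between 0 and 120."""
    age = row.get("age")
    return age is not None and 0 <= age <= 120


# Made-up sample data for the illustration.
rows = [
    {"email": "a@example.com", "age": 34},
    {"email": "", "age": 51},
    {"email": "c@example.com", "age": -3},
    {"email": "d@example.com", "age": 28},
]

email_completeness = completeness(rows, "email")
age_pass_rate = rule_pass_rate(rows, age_is_plausible)
print(f"Email completeness: {email_completeness:.0%}")
print(f"Age rule pass rate: {age_pass_rate:.0%}")
```

Tracking figures like these over time, next to the business numbers they support, lets stakeholders see the trend in data quality alongside the insights built on it.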

Data stewards and data quality

Walker has an interesting take on data stewards: ‘This role is very confusing in the world of the modern data landscape and is even turned more confusing with decades of broken promises from Data Governance platforms.’

I would venture to say that very few companies manage to get data stewardship right. The challenge is that most data stewards are assigned the role in addition to their ‘day job.’ However, data ownership and data stewardship go together. As such, there must be a clear mandate, ample time allocation, and clear, measurable deliverables.

Data leaders and data quality

Business stakeholders need to drive their business areas to success. They must understand how they could use the data they have to enable new or improved strategies toward positive business outcomes. But they must realise that data comes with a price. To get good quality data, there needs to be an investment in proper systems, with good data validation rules built in, and in good data quality and governance processes. What makes life interesting is that the data governance processes must, for the most part, span multiple business areas to be truly effective in arriving at a single version of the truth.

Walker writes that ‘this clearly shows how business leaders are focused on identifying key business processes, their KPIs and KRIs, and underlying data or metric assets that have a direct linkage to organisational mission-critical priorities. Having baselined values of business KPIs/KRIs before starting data quality initiatives and measuring continuously shall help to have the leaders a top-to-bottom-down view and more importantly which domains, and/or applications need to improve data quality.’

It comes down to the fact that business leaders must take responsibility and accountability for the data quality in their areas. While I have not touched on it here, roles such as the Chief Data Officer, who needs to drive these agendas from the board and the executive level, as well as security and privacy officers, who need to ensure that sensitive data is well protected, must also be accounted for.

In next month’s post, I will take a closer look at what makes modern data quality management different from traditional approaches.
