Previously, I discussed what data waste is and what is likely to cause it within an organisation. In that blog, I compared data waste to a box of unused clothes left to rot in the attic, and looked at the unwanted perils such a box may cause. In this second blog, the focus turns to how best to deal with data waste by looking at the key components of an effective data waste strategy.
Identifying data waste
As highlighted in my first article, Kazuki Ohta writes in a Forbes Tech Council column: ‘If managers and other team members are putting in 10-plus hours each per week into customer data maintenance, then there is likely a costly inefficiency hidden within the data’.
If data stewards, report developers and other knowledge workers are spending excessive hours finding the right version of the truth and resolving inconsistencies caused by data duplication, simply to provide accurate information to decision-makers, it often points to data wastage issues.
So, why does valuable data sometimes go unused? In some instances, the organisation is not mature enough to realise the value of the data. However, the most likely cause of data remaining unused is that staff members cannot find the data or do not even know it exists in the first place. Properly catalogued and documented data is much easier to find and analyse. Yet few organisations have thorough metadata management, dictionaries, and catalogues fully in place.
A related symptom of data waste is that the organisation has the data, but not where it is needed. Data being in the wrong place can indicate a siloed organisation where the necessary data is not shared across organisational boundaries. Location can also cause waste in another way: if data is stored in systems that are difficult to access, for instance within complex cloud infrastructure or behind overly restrictive firewalls, that data might never be used.
A major source of data waste lies in badly designed and inefficiently integrated systems. These can result in data being duplicated, eventually making it nearly impossible to keep the copies synchronised. As an example, some banks have duplicated customer data across their CRM environments, the credit card application, the transmission and savings account system(s), and the loyalty points application.
Another indicator of this type of data waste is a weak connection between data producers and data consumers. If the people responsible for data capture and maintenance do not know how the data will eventually be used, there is a good chance they will collect superfluous data, or collect it in a form that is hard to work with. This results in downstream duplication or unnecessary cost and effort to locate and use the data.
Managing data waste
Reducing data waste is not as simple as deleting or archiving a lot of unused data. The organisation needs a proper data strategy that covers the destruction and archival of data. The data strategy must define which data is to be retained, for how long, and for which reasons. Some data must be retained for legal or regulatory purposes, while other data needs to be retained to train machine learning models or to analyse historical trends. Conversely, some data may not be kept beyond certain deadlines. It is important that the data strategy explicitly addresses the handling of redundant, obsolete, or trivial (ROT) data.
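To make the 'which data, for how long, and for which reasons' idea concrete, here is a minimal Python sketch of how retention rules might be written down and checked. Everything in it is an assumption for illustration: the `RetentionRule` fields, the category names, the day counts, and the choice to let a destruction deadline override a minimum retention period are all hypothetical, not taken from any regulation or product.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RetentionRule:
    """One line of a hypothetical retention policy."""
    category: str
    reason: str                               # why the data is kept (or destroyed)
    keep_at_least_days: int = 0               # minimum retention period
    delete_after_days: Optional[int] = None   # hard deadline, if any


def retention_action(rule: RetentionRule, created: date, today: date) -> str:
    """Return 'must_destroy', 'must_retain' or 'review' for one data set."""
    age_days = (today - created).days
    # Assumption: a destruction deadline takes priority over a minimum period.
    if rule.delete_after_days is not None and age_days > rule.delete_after_days:
        return "must_destroy"
    if age_days < rule.keep_at_least_days:
        return "must_retain"
    # Neither rule forces a decision: a candidate for a ROT review.
    return "review"


# Illustrative figures only.
POLICY = {
    "tax_records": RetentionRule("tax_records", "regulatory", keep_at_least_days=5 * 365),
    "marketing_leads": RetentionRule("marketing_leads", "retention deadline", delete_after_days=2 * 365),
}

print(retention_action(POLICY["tax_records"], date(2023, 1, 1), date(2024, 1, 1)))
```

The point of the 'review' outcome is that data which is neither required nor forbidden still needs a human decision, which is where ROT data tends to hide.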
Furthermore, the data strategy must provide an answer to the question: ‘What if we need it later?’ It must also address the fact that the more data is retained, especially sensitive data, the greater the risk of compromise. At a time when cloud data storage is affordable, offline data storage may sound archaic. However, it has the advantage of being much more difficult to compromise without some physical human interaction.
A key component in implementing such a strategy is a metadata solution. This solution incorporates dictionary and cataloguing functions on the logical and physical levels to form a fully up-to-date inventory of the data resource of the organisation.
On TechTarget, Robert Sheldon states that the inventory needs to record the amount of data the organisation has, where it is located, who owns it, who can access it, and how long it has been there, as well as retention requirements. These can encompass the requirements from compliance and business perspectives as well as any destruction requirements. The catalogue must include data names, taxonomy, labels and groups, as well as physical storage details.
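As a rough illustration of what one entry in such an inventory could hold, the Python sketch below models the attributes listed above as a small record type. The class name `InventoryEntry`, its field names, and the `needs_attention` check are hypothetical choices made for this example; a real catalogue tool will have its own schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class InventoryEntry:
    """One data set in a hypothetical data inventory."""
    name: str
    location: str                 # system or physical storage details
    size_gb: float                # how much data there is
    owner: Optional[str]          # who owns it (None if nobody claims it)
    access_groups: List[str]      # who can access it
    created: date                 # used to work out how long it has been there
    retention_note: str = ""      # compliance, business or destruction requirement
    labels: List[str] = field(default_factory=list)  # taxonomy, labels and groups

    def age_days(self, today: date) -> int:
        return (today - self.created).days


def needs_attention(entries: List[InventoryEntry], today: date, max_age_days: int) -> List[str]:
    """Names of entries with no owner, no retention note, or older than max_age_days."""
    return [
        e.name
        for e in entries
        if e.owner is None or not e.retention_note or e.age_days(today) > max_age_days
    ]
```

Even a simple check like this surfaces the boxes in the attic: data sets nobody owns, nobody has classified, or nobody has looked at in years.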
Another useful tool for identifying and managing duplicate data is a master data management (MDM) solution, which can be used to efficiently identify the correct and most up-to-date version of each data occurrence across multiple systems. Sophisticated MDM solutions allow organisations to define complex business rules for identifying, merging, and consolidating records from multiple disparate systems and representations into a single ‘golden record’.
This single source of truth eliminates the uncertainty between multiple versions and aids in standardising operations, thereby improving data quality too. Looping back to the metadata solution, the single source of truth must be documented and advocated throughout the organisation to be of value.
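The consolidation into a golden record can be sketched in a few lines of Python. This is a deliberately simple illustration, not how any particular MDM product works: it assumes the records have already been matched to the same customer, and it applies a single hypothetical survivorship rule (for each field, take the non-empty value from the most recently updated record). The field names and the `updated` key are assumptions made for the example.

```python
from datetime import date
from typing import Any, Dict, List

Record = Dict[str, Any]


def golden_record(records: List[Record], fields: List[str]) -> Record:
    """Merge matched records: per field, keep the newest non-empty value."""
    # Sort so that the most recently updated record is consulted first.
    newest_first = sorted(records, key=lambda r: r["updated"], reverse=True)
    merged: Record = {}
    for name in fields:
        # First non-empty value wins; None if no source has the field filled in.
        merged[name] = next((r[name] for r in newest_first if r.get(name)), None)
    return merged


# The same customer as seen by three hypothetical systems.
crm = {"source": "crm", "updated": date(2024, 3, 1), "email": "a@example.com", "phone": ""}
cards = {"source": "cards", "updated": date(2023, 11, 5), "email": "old@example.com", "phone": "555-0100"}
loyalty = {"source": "loyalty", "updated": date(2022, 1, 1), "email": "", "phone": "555-0199"}

print(golden_record([cards, loyalty, crm], ["email", "phone"]))
```

Real business rules are usually richer than 'newest wins', for example trusting one system over another for specific fields, but the principle of resolving several versions into one agreed record is the same.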
Avoidance
One of the most effective ways of dealing with ROT data is avoiding it in the first place. Remember, if you do not put the box of unused clothes up in the attic, it will not simply lie there and rot away. Organisations should look for ways to decrease the creation of unnecessary data. For instance, they must avoid creating ‘spreadmarts’ and other similar subset copies of the same data for operational or reporting purposes. Many workflows can be optimised to eliminate these problems. Additionally, proper data integration tools or platforms can be used. Often, the explicit costs of a good data integration platform are significantly less than the hidden costs of data waste.
As always, there is the people angle to consider. Alongside efforts to improve data literacy, data-centricity and fact-based reporting, the data leaders within the organisation need to take responsibility for educating the rest of the business on data waste, its perils, and the associated costs and risks.
Of course, the people who work with the data daily also have a part to play in curbing data wastage. If they understand the concepts and implications, they can incorporate the appropriate plans and waste reduction approaches in their business processes and corresponding system implementations. To bring it back to our analogy, if nobody makes you aware of the potential for rats to nest in the attic, you will not realise the risks of putting the box of unwanted clothes up there. The problem often only hits you somewhere down the line.