In my previous post I discussed the concept of data discovery, or exploratory data analysis, and how organisations are using it more and more today. A related concept is data mining. While data discovery is used to analyse the data resource of the organisation in order to understand what data elements there are and to identify their characteristics, data mining is used to detect trends and patterns within or between those data sets. The two concepts are therefore very closely related. In this post I discuss data mining, what it entails and what it is used for.
The meaning of the term data mining has also changed somewhat over the years. At one stage, many years ago, the term was often used for the data analysis step within advanced analytics. Some people even used it (albeit incorrectly) for applying advanced analytical models to the data. Strictly speaking, data mining refers to descriptive data mining only, which is used to characterise the properties of the data. It does not cover predictive data mining, better described as predictive analytics, which is used to perform inferences on the current or available data to make predictions about the future. With the increasing importance and wider acceptance of visualisation and data discovery, data mining is now used more appropriately to refer to the process of digging through the data to discover details, trends and processes.
Bill Palace from the Anderson Graduate School of Management at UCLA uses this definition in his lecture notes: ‘Generally, data mining (sometimes called knowledge discovery) is the process of analysing data from different perspectives and summarising it into useful information – information that can be used to increase revenue, cuts costs, or both. Data mining software is one of a number of analytical tools for analysing data. It allows users to analyse data from many different dimensions or angles, categorise it, and summarise the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.’
In their 2001 text on Data Mining, Jiawei Han and Micheline Kamber talk about data mining as a key part of knowledge discovery. Data mining is all about extracting useful knowledge from large amounts of data. The mining analogy is quite apt – finding a small set of precious nuggets (the knowledge) from a great deal of raw material (the data).
There are other related terms that are also used in academic and research papers for data mining, like knowledge mining, knowledge extraction, data/pattern analysis, data archaeology and data dredging, but in the general business world the term “data mining” is more popular.
Data mining is a multi-disciplinary field, drawing from areas including database technology, artificial intelligence, machine learning, neural networks, statistics, pattern recognition, knowledge-based systems, knowledge acquisition, information retrieval, high-performance computing, image and signal processing, spatial data analysis and data visualisation.
Data mining forms part of knowledge discovery, which is a more encompassing process. Knowledge discovery consists of the following steps:
- Data discovery – detect, identify and characterise the available data.
- Data cleansing – remove noise and inconsistent data.
- Data integration – combine data from multiple data sources.
- Data selection – retrieve the data relevant to the task.
- Data transformation – consolidate the data into a form appropriate for mining (e.g. by applying summarisation or aggregation).
- Data mining – apply intelligent methods to detect and extract data patterns.
- Pattern evaluation – identify truly interesting patterns that represent knowledge (according to some measure of interestingness).
- Knowledge presentation – present the knowledge found to the business users (e.g. using data visualisations).
Data mining is therefore the essential step in which intelligent methods are applied to uncover interesting data patterns hidden in large datasets. However, in some organisations the term “data mining” is also used to refer to the entire process of knowledge discovery.
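To make the knowledge discovery steps a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the two tiny data sources, the function names and the choice of "items bought together" as the pattern of interest. Real toolsets do far more at each step, and the data discovery step is left out because the sample data is already understood.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw data from two sources: each record is (customer, item).
source_a = [("c1", "bread"), ("c1", "milk"), ("c2", "bread"), ("c2", " Milk ")]
source_b = [("c3", "bread"), ("c3", "milk"), ("c3", None), ("c4", "eggs")]


def cleanse(records):
    """Data cleansing: drop records with a missing item, normalise the text."""
    return [(c, i.strip().lower()) for c, i in records if i]


def integrate(*sources):
    """Data integration: combine the records from multiple sources."""
    return [record for source in sources for record in source]


def select(records, items_of_interest):
    """Data selection: keep only the records relevant to the task."""
    return [r for r in records if r[1] in items_of_interest]


def transform(records):
    """Data transformation: aggregate records into one basket per customer."""
    baskets = {}
    for customer, item in records:
        baskets.setdefault(customer, set()).add(item)
    return baskets


def mine(baskets):
    """Data mining: count how often each pair of items occurs together."""
    pairs = Counter()
    for items in baskets.values():
        pairs.update(combinations(sorted(items), 2))
    return pairs


def evaluate(pairs, min_count):
    """Pattern evaluation: keep pairs found in at least min_count baskets."""
    return {pair: n for pair, n in pairs.items() if n >= min_count}


records = integrate(cleanse(source_a), cleanse(source_b))
records = select(records, {"bread", "milk", "eggs"})
patterns = evaluate(mine(transform(records)), min_count=2)

# Knowledge presentation: here simply printed for the reader.
for pair, count in patterns.items():
    print(pair, count)
```

The point of the sketch is the shape of the process rather than the individual functions: the mining step is only one link in the chain, and it depends on the cleansing, integration, selection and transformation that come before it.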
A very important aspect is that the patterns are usually not known in advance. Thus a data mining application or toolset must be able to search for different types of patterns, often in parallel for greater efficiency. The patterns must also be detected at different granularities – i.e. at different levels of abstraction or detail. A good data mining solution will also indicate a measure of trustworthiness or certainty associated with a discovered pattern, because some patterns may not hold for all of the data in the analysed data set.
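One common way to express such a measure of certainty comes from association rule mining, where a rule such as "customers who buy bread also buy milk" is scored by its support and its confidence. The sketch below assumes a handful of made-up shopping baskets. The function names are my own, and other measures of interestingness exist.

```python
# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread"},
    {"milk", "eggs"},
    {"bread", "milk"},
]


def count(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets)


def support(itemset, baskets):
    """Fraction of all baskets in which the itemset occurs."""
    return count(itemset, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that also
    contain the consequent."""
    return count(antecedent | consequent, baskets) / count(antecedent, baskets)


rule_support = support({"bread", "milk"}, baskets)
rule_confidence = confidence({"bread"}, {"milk"}, baskets)
print(f"bread -> milk: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

In this sample the rule holds in three of the four baskets that contain bread, so it is a useful pattern but not one that holds for all of the data, which is exactly why the measure needs to be reported alongside it.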
So what kind of benefits can organisations achieve from using data mining? And what kind of organisations are actively making use of the concept as it is now defined?
One industry that has started making use of the concept is healthcare. With the growth in Electronic Health Records (EHRs), more and more facilities are gathering huge amounts of digitised patient data. Healthcare providers and researchers can therefore use data mining to expose previously unidentified patterns in immense stores of data, and then use this information to construct predictive models to improve diagnoses and healthcare outcomes.
Another sector that has seen the benefits and value of data mining is retail. By applying data mining tools, retail chains are able to discover on which days most consumers come into the shops to do their weekly shopping, what they spend most of their money on, and which of the products stocked on their shelves have made them the most and the least money. This information has delivered some significant insights to buyers and planners, and has given the sector the ability to use what it has learnt to increase revenues.
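The retail questions above boil down to simple summaries of sales data. As a rough illustration only, the sketch below assumes a small, invented list of till records (date, product, amount in cents) and works out the busiest day of the week and the best and worst earning products. A real retail chain would run this kind of analysis with dedicated tools over far larger volumes.

```python
from collections import Counter
from datetime import date

# Hypothetical till records: (sale date, product, amount in cents).
sales = [
    (date(2024, 3, 1), "bread", 250),  # Friday
    (date(2024, 3, 2), "milk", 120),   # Saturday
    (date(2024, 3, 2), "bread", 250),  # Saturday
    (date(2024, 3, 2), "wine", 900),   # Saturday
    (date(2024, 3, 4), "milk", 120),   # Monday
    (date(2024, 3, 9), "wine", 900),   # Saturday
]

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Which days see the most sales?
sales_per_day = Counter(DAY_NAMES[day.weekday()] for day, _, _ in sales)
busiest_day, busiest_count = sales_per_day.most_common(1)[0]

# Which products earn the most and the least?
revenue = Counter()
for _, product, amount in sales:
    revenue[product] += amount
best_product = max(revenue, key=revenue.get)
worst_product = min(revenue, key=revenue.get)

print(f"Busiest day: {busiest_day} ({busiest_count} sales)")
print(f"Highest revenue: {best_product} ({revenue[best_product] / 100:.2f})")
print(f"Lowest revenue: {worst_product} ({revenue[worst_product] / 100:.2f})")
```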
With the advances made in data mining toolsets and their application, data mining has become a very useful concept. Used properly, it aids in discovering real information, trends and patterns within the organisation’s data, and it can provide information that helps to identify and implement new business strategies. With data mining, as with data discovery, you can now find a whole host of valuable information within your organisation’s data that you never knew existed. In a future post I will discuss the functionalities provided by data mining tools and the types of patterns that can be discovered.