Data Discovery


New concepts and technologies continue to emerge in the business intelligence space, bringing new business challenges and, of course, new opportunities for organisations to understand them and ultimately leverage them for business benefit. One such concept, which first appeared a few years back but has yet to get the full attention it deserves, is data discovery, also called exploratory data analysis.


Experts state that there is no single definition of data discovery, because besides being fairly new to the industry, it also means different things to different people. For example, if you work in data management and data quality, you will focus on discovering key metadata about the core data assets, such as data completeness, data quality, consistency and provenance. However, if you are a data analyst assisting the marketing department, you will use data discovery for trend identification, campaign analysis and possibly self-service data discovery for the chief marketing officer.

Consider the definition provided by Dashboard Insight, which states that data discovery is an analytic approach that helps business users answer “why” and “what if” questions through self-service analytic applications. This clearly indicates what data discovery is used for, but it does not say explicitly that modern-day data discovery aims to answer these questions visually.

Cindi Howson, founder of BI Scorecard, uses this definition of data discovery tools to accentuate the visual aspects explicitly: “Visual data discovery tools speed the time to insight through the use of visualizations, best practices in visual perception, and easy exploration. Such tools support business agility and self-service BI through a variety of innovations that may include in-memory processing and mashing of multiple data sources.”


Two aspects come to the fore from these and other definitions. The graphical representation of data enables quicker and more insightful analyses, and the self-service aspect allows users to analyse and investigate the data much more quickly, and in a much more agile and interactive manner.

Data discovery allows users to better understand the data through easy-to-use interactive visualisations. It allows a user to answer unanticipated questions as they arise, for example by identifying patterns and trends. With visual data discovery tools, data discovery through visualisation has effectively become the modern ad hoc query process.

Again, the two biggest differences between conventional business query tools and visual data discovery tools are the use of graphs and the degree of user autonomy. In a business query tool, a user can certainly add a bar chart to a dense page of numbers, but the chart is an afterthought. With visual data discovery tools, the query and visualisation processes are one and the same. The data is displayed as a bar chart from the outset. Drag a time period onto the page and it pops up as a trend line. Add a product category, and the trend line is automatically converted to a trellis or small multiple chart.

Research has shown that when data is represented graphically, we use fewer cognitive resources to make a decision and we retain that information better. So the use of graphs is about more than just displaying the data in pretty or engaging formats; it’s about speeding up the time to insight.
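The drag-and-drop behaviour described above can be thought of as two grouping steps. The following is a minimal sketch in Python, not the implementation of any particular tool: the function names `trend` and `small_multiples`, the column names and the sample rows are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def trend(rows, period="month", measure="revenue"):
    """Sum the measure per time period: the data behind a single trend line."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[period]] += row[measure]
    return sorted(totals.items())


def small_multiples(rows, split_by, period="month", measure="revenue"):
    """Split rows by a dimension and build one trend per value (one chart panel each)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[split_by]].append(row)
    return {key: trend(group, period, measure) for key, group in sorted(groups.items())}


# Hypothetical sample data.
rows = [
    {"month": "2024-01", "category": "Widgets", "revenue": 100.0},
    {"month": "2024-01", "category": "Gadgets", "revenue": 40.0},
    {"month": "2024-02", "category": "Widgets", "revenue": 120.0},
    {"month": "2024-02", "category": "Gadgets", "revenue": 60.0},
]

overall = trend(rows)                       # "drag a time period onto the page"
panels = small_multiples(rows, "category")  # "add a product category"
```

Here `overall` holds one series for the whole data set, while `panels` holds one series per product category, which is the data a trellis chart would draw as separate small panels.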

The other big distinction between visual data discovery tools and business query tools is the degree of user autonomy. Business query tools generally require a metadata layer that IT has to design and build. This metadata layer provides a layer of abstraction from the physical database schema, which may contain hundreds of tables. With a visual data discovery tool, business users often work on a pre-extracted subset of the data, either in a flat file or a spreadsheet, or directly on the base data, so IT is no longer a bottleneck in that process. If the base tables are analysed through real-time queries, some visual data discovery tools automatically model a metadata layer, making a best guess at what is a metric, what is a dimension, and over which columns the tables should be joined, again with little or no involvement from IT.
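To make the idea of a "best guess" concrete, here is a small Python sketch of how such guessing could work. It is an illustrative heuristic under stated assumptions, not the algorithm of any real product: the rule that numeric, mostly distinct, non-key columns are metrics, the `_id` naming convention, the threshold `max_dimension_ratio` and the sample tables are all hypothetical.

```python
from numbers import Number


def classify_columns(rows, max_dimension_ratio=0.5):
    """Guess a role for each column: 'metric' or 'dimension'.

    Assumed heuristic: a column is a metric if all its values are numeric,
    its name does not look like a key (ends in '_id'), and most of its
    values are distinct. Everything else is treated as a dimension.
    """
    roles = {}
    for col in rows[0].keys():
        values = [r[col] for r in rows if r[col] is not None]
        numeric = bool(values) and all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        )
        distinct_ratio = len(set(values)) / len(values) if values else 0.0
        looks_like_key = col.lower().endswith("_id")
        if numeric and not looks_like_key and distinct_ratio > max_dimension_ratio:
            roles[col] = "metric"
        else:
            roles[col] = "dimension"
    return roles


def guess_join_columns(left_rows, right_rows):
    """Suggest join columns: same name in both tables and at least one shared value."""
    shared_names = set(left_rows[0]) & set(right_rows[0])
    joins = []
    for col in sorted(shared_names):
        left_values = {r[col] for r in left_rows}
        right_values = {r[col] for r in right_rows}
        if left_values & right_values:
            joins.append(col)
    return joins


# Hypothetical base tables.
sales = [
    {"product_id": 1, "region": "North", "revenue": 120.5},
    {"product_id": 2, "region": "South", "revenue": 80.0},
    {"product_id": 1, "region": "South", "revenue": 95.25},
    {"product_id": 3, "region": "North", "revenue": 60.75},
]
products = [
    {"product_id": 1, "category": "Widgets"},
    {"product_id": 2, "category": "Gadgets"},
    {"product_id": 3, "category": "Widgets"},
]

roles = classify_columns(sales)
joins = guess_join_columns(sales, products)
```

With this sample data, `revenue` is classified as a metric, `region` and `product_id` as dimensions, and `product_id` is proposed as the join column. A guess like this can be wrong, which is why such tools usually let the user override it.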


A very useful application of data discovery is in data quality analysis and data profiling. It gives data analysts the ability to examine in detail the data contained in corporate data stores, reveal hidden patterns in it and investigate the relationships between data elements. For example, it quickly shows which outliers exist in the data, and one can then investigate whether they are meaningful or simply random noise. These analyses become crucial to organisations, especially as they need to integrate and differentiate between structured and unstructured data, where this additional insight can be turned into an advantage in the organisation’s business landscape.
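As a rough illustration of this kind of profiling, the Python sketch below computes a completeness score for a column and flags outliers with the common interquartile-range rule. The choice of the IQR rule, the multiplier `k=1.5`, the function names and the sample values are assumptions made for the example; real profiling tools may use different checks.

```python
import statistics


def completeness(rows, column):
    """Share of rows in which the column holds a non-missing value."""
    filled = sum(1 for r in rows if r.get(column) not in (None, ""))
    return filled / len(rows)


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical customer records and order values.
customers = [
    {"name": "A", "email": "a@example.com"},
    {"name": "B", "email": ""},
    {"name": "C", "email": "c@example.com"},
    {"name": "D", "email": "d@example.com"},
]
order_values = [100, 102, 98, 101, 99, 103, 97, 500]

email_completeness = completeness(customers, "email")
suspicious_orders = iqr_outliers(order_values)
```

The flagged values are only candidates: as noted above, the analyst still has to decide whether an outlier is meaningful or simply noise.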

There are many more uses of data discovery, including analysing available data and displaying prototype reports during requirements definition, enabling the business to analyse and work with data “as is” before it is cleansed and integrated, and experimenting with and calculating key enterprise KPIs interactively.

Concluding remarks

As the concept of data discovery matures and is more widely adopted, more and more organisations will want to understand how it can deliver informational insights with the speed and flexibility they need. Through the agility it brings to BI, data discovery has great potential to enhance processes around application development, data quality management, and the calculation and representation of corporate metrics.

Organisations are implementing data discovery not only to improve the speed and visibility of analysing and presenting information, but also to develop and expand their insight into the data representing their business offerings. Yet too few businesses make active and productive use of visual discovery in place of reports, ad hoc (tabular) data queries and statistics.
