Big Data Defined?


Gartner identified ‘Strategic Big Data’ as one of the top strategic technology trends for 2013. Big Data is surely a hot topic at the moment, and it definitely has a role to play both as a value-adding technology for some organisations and as a business enabler in the broader BI industry. However, I find that it is not clearly defined and therefore not fully understood. This often leads to disillusionment: the belief that Big Data cannot add value to any organisation and that it is, in fact, over-hyped.

I have not yet come across a single widely accepted definition of Big Data. (I’ll share my theory about why that is in a subsequent post.) I have, though, found many descriptions and even more characterisations of Big Data. These descriptions help us to understand the broader concept, and they allow us to unpack the terms to determine if, how, and to what extent a business should be investing in Big Data.

Like everyone else, we use our own working description to get the point across, and it has over time evolved into the following:

‘Large amounts of fast-moving, semi-structured and unstructured data that cannot be handled by traditional data management tools, mostly due to its complexity.’

This description is often used by our advanced analytics team in Sentiment Analysis presentations, where in some instances Big Data can contribute some value.

As a generalisation, for me the key terms in this description are semi-structured, unstructured, complexity, fast-moving, and then, if you really have to, large amounts. Personally I think the average organisation deals far more with the complexity of semi-structured data, possibly arriving in real time, than with actual petabytes of unstructured data. The real challenges lie in capturing, storing, analysing and visualising these types of data in order to get value from them. A second point is that we only start getting real value from this data once we can transform it into structured data and integrate it with the other structured information in the organisation. To clarify that last statement: in most cases Big Data doesn’t generate completely new business value by itself, but it often adds complementary value by adding another few dimensions to the 360-degree view of the customer. (We find the exception to this only in the truly big data-oriented organisations like Amazon, LinkedIn, eBay, Facebook, etc., whose business models are totally centred on these new types of data, but very few organisations fall in this class.)

Now if I ever hear the 3Vs as a definition of Big Data again, I’m going to get up and walk out. (OK, I’ll probably be polite and only open my iPad and read my email.) However, you’ll be surprised how many big shots from the big four software vendors and the big five consulting firms still present this as a definition (Forrester and Gartner excluded). The 3Vs do not constitute a definition. They are a list of some of the characteristics of Big Data. Here is the list of measures that I use to characterise Big Data, affectionately called 6VS (note the capital “S”):

  • Volume: Typically measured in petabytes, but it can be smaller; what matters is when the size exceeds the physical limits of vertical scalability of your conventional data management platforms. (Note that what is big to a mid-sized retailer may be small for a Telco.)
  • Variety: The number of different formats that make integration complex, as this includes structured, semi-structured and unstructured data. This is one of the component measures of complexity.
  • Velocity: The rate at which data arrives and changes. This has an impact if there are small decision windows – it becomes a major driver for streamlining business processes.
  • Veracity: A measure of the (un)predictability of inherently imprecise data types. (As an example, how accurate are the sentiment scores that are assigned based on voice tone?)
  • Variability: A measure of the number of options and interpretations possible. (As an example, for how many reasons could an on-line user have terminated a session – or have had it terminated – just before check-out?) When related to the structure of the relationship between data entities, this also reflects complexity.
  • Value: The inherent value in monetary terms that it adds to the organisation, measured in terms of increased sales, reduced churn, improved segmentation, more targeted products, etc. The calculation of value added should also take into account the costs incurred, net present value, and so on.
  • Sparseness: A measure of the low density of valuable content in the Big Data.

It would be convenient if someone came up with a more concise index. It is a bit cumbersome to describe your Twitter feed as a 300MB, 5-format, 2 400 tweets/sec, 73% accurate, 23-option, 1/54 300 sparse, $10 per client per month Big Data source! However, it does give a good indication of what you are up against, and of what value may be obtained from it. (Putting the sparseness before the value reads better, but it jumbles up the acronym.)
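For anyone who wants to capture such a profile during an assessment, here is a minimal Python sketch of how the measures could be recorded and printed. The class name SixVSProfile, its field names and the units chosen are my own illustrative assumptions, not a standard. It also assumes that "1/54 300 sparse" means roughly one record in 54 300 carries valuable content.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class SixVSProfile:
    """Hypothetical record of the 6VS measures for one data source."""

    volume_mb: float               # Volume: size in megabytes
    variety_formats: int           # Variety: number of distinct formats
    velocity_per_sec: float        # Velocity: records arriving per second
    veracity: float                # Veracity: estimated accuracy, 0.0 to 1.0
    variability_options: int       # Variability: number of possible interpretations
    value_per_client_month: float  # Value: monetary value per client per month
    sparseness: Fraction           # Sparseness: assumed share of valuable records

    def describe(self) -> str:
        """Render the profile as a one-line description, Value listed last."""
        return (
            f"{self.volume_mb:g}MB, "
            f"{self.variety_formats}-format, "
            f"{self.velocity_per_sec:g} records/sec, "
            f"{self.veracity:.0%} accurate, "
            f"{self.variability_options}-option, "
            f"{self.sparseness.numerator}/{self.sparseness.denominator} sparse, "
            f"${self.value_per_client_month:g} per client per month"
        )

    def valuable_records_per_sec(self) -> float:
        """Arrival rate scaled by sparseness (assumes the interpretation above)."""
        return self.velocity_per_sec * float(self.sparseness)


# The Twitter feed figures quoted in the text.
twitter_feed = SixVSProfile(
    volume_mb=300,
    variety_formats=5,
    velocity_per_sec=2400,
    veracity=0.73,
    variability_options=23,
    value_per_client_month=10,
    sparseness=Fraction(1, 54300),
)

print(twitter_feed.describe())
print(f"{twitter_feed.valuable_records_per_sec():.3f} valuable records/sec")
```

Under that reading of sparseness, a feed of 2 400 tweets per second yields a valuable tweet only about once every 23 seconds, which is exactly the kind of figure worth estimating before investing.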

I can confidently say that I don’t feel there is one pure definition of Big Data that will give businesses a full view of what it is and what it can do for a company (and I will elaborate on that in a future post). Rather, the concept should be carefully investigated, considering all the relevant descriptions, and by assigning values to the measures above (even if by estimation or based on a pilot project), before fully diving in and implementing it. Careful consideration must be given to the value it can add, and how it can be integrated into the organisation’s BI competencies (which I will also elaborate on in a future post). At the end of the day, Big Data may not feature in the organisation’s strategy map, may not solve its burning issues, may not be able to add any value, or the implementation may be so complex or costly, or the underlying data so sparse, that it doesn’t add enough value to justify its implementation.

As a trend, there is no denying that the concept of Big Data is here to stay. However, the investigation and characterisation are critical: only if such an assessment proves to be positive can the promised benefits of Big Data be attained.

Reference: Gartner Identifies the Top 10 Strategic Technology Trends for 2013.
