Disney World of Data


Big Data at TDWI Orlando

On the way to the TDWI conference, I recently took my children to Walt Disney World in Orlando. It is truly a world of wonders, and they were blown away by the colossal fairytale castles, the life-sized characters and the vastness of the many parks. They were simply astounded by the size and scope of everything. I realised that this mirrors what is happening in most facets of our lives: everything is becoming bigger, faster and more changeable. Like Big Data. I have discussed this topic before, but the reality is that we are now heading even faster towards a ‘Disney World of Data’. Data volumes are growing at an incredible rate, and businesses need to embrace this, as it will not go away any time soon.

Although Gartner currently places Big Data at the peak of its hype curve, speakers at TDWI such as John O’Brien reckon it is already on its way down the curve, and that it should pass through the trough of disillusionment very quickly, thanks to the combination of vendor momentum and business drive. A recent Gigaom article by Derrick Harris also stated that Big Data has turned from a buzzword best left to the large Web companies into a force that drives much of our digital lives.

Speakers at TDWI highlighted the following emerging trends that are affecting the way Big Data is being approached.

Hybrid architectures to support Big Data and data science

So-called hybrid data warehouse ecosystems are emerging, with raw Big Data stored in the Hadoop Distributed File System (HDFS), semi-structured data in specialised NoSQL stores, and structured data manipulated in the traditional enterprise data warehouse (EDW). These environments cater for exploratory data analysis, advanced event processing and other business analytics. They also support real-time and near-real-time loading of the data warehouse, getting information to decision-makers much faster and feeding it back to the operational applications, which function even better when they incorporate business intelligence and analytics results.

Any organisation embarking on Big Data analytics is almost guaranteed to end up with more than one type of data store in its data warehouse environment. My recommendation here remains the same as for conventional data warehouses: keep the architecture as simple and streamlined as possible, and preferably as real-time as possible too.
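To make the three tiers a little more concrete, here is a minimal sketch, assuming a hypothetical router that inspects each incoming record and decides which store it belongs in. The class `HybridStore`, the tier names (`hdfs_raw`, `nosql_document`, `edw_table`) and the fixed `EDW_COLUMNS` schema are illustrative assumptions, not any vendor's API; the "stores" are just in-memory lists.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical warehouse schema: the exact columns the EDW table expects.
EDW_COLUMNS = {"customer_id", "order_id", "amount"}


@dataclass
class HybridStore:
    """Toy stand-in for a hybrid ecosystem: three in-memory 'stores'."""
    hdfs_raw: list = field(default_factory=list)        # raw, unparsed payloads
    nosql_document: list = field(default_factory=list)  # semi-structured documents
    edw_table: list = field(default_factory=list)       # rows matching the EDW schema

    def route(self, record: Any) -> str:
        """Place a record in the tier that suits its structure and return the tier name."""
        if isinstance(record, dict):
            if set(record) == EDW_COLUMNS:
                # Fully structured and conforms to the warehouse schema.
                self.edw_table.append(record)
                return "edw_table"
            # Has structure, but not the fixed relational schema.
            self.nosql_document.append(record)
            return "nosql_document"
        # Anything else (log lines, free text, binary blobs) lands as raw data.
        self.hdfs_raw.append(record)
        return "hdfs_raw"


store = HybridStore()
print(store.route({"customer_id": 1, "order_id": 10, "amount": 9.99}))  # edw_table
print(store.route({"customer_id": 1, "clicks": [3, 5]}))               # nosql_document
print(store.route("2013-02-20 10:01:07 GET /index.html"))              # hdfs_raw
```

Keeping the placement rule in a single function is one way to follow the advice to keep the architecture simple: when a new kind of data arrives, there is one obvious place to decide where it goes.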

Hadoop’s family of tools

Many tools are being released that considerably improve the ease of use of the fairly primitive, batch-oriented MapReduce interface to Hadoop. Some tool vendors are embracing HiveQL, a SQL-like interface to Hadoop data, while others rely on the HCatalog semantic layer to access Hadoop data closer to real time and in a more business-oriented context. More and more tools are appearing that not only execute MapReduce code more interactively but also fit in with next-generation platforms and technologies, so that a wider variety of users can make use of Hadoop.

We will see these developments continue, and until standard interface languages are defined and widely adopted, many different flavours and variants will keep appearing. Until the standards shake out, if they ever do, you will have to select the access and analysis interface to your Hadoop data carefully, so that it suits your type of data as well as your team’s skill levels and style of analysis.
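For readers who have not seen the MapReduce model, the following Python sketch simulates its three phases (map, shuffle, reduce) in a single process for the classic word count. It is not Hadoop code: on a cluster the framework distributes these phases across nodes. The HiveQL in the comment is a rough illustration of how a SQL-like interface hides the same work, assuming a hypothetical table `docs` with a single string column `line`.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Roughly equivalent HiveQL, assuming a hypothetical table docs(line STRING):
#   SELECT word, COUNT(*) FROM docs
#   LATERAL VIEW explode(split(line, ' ')) t AS word
#   GROUP BY word;


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated word in every line."""
    for line in lines:
        for word in line.split():
            yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(lines)))


print(word_count(["big data big", "data lake"]))  # {'big': 2, 'data': 2, 'lake': 1}
```

The gap between the three hand-written functions and the single HiveQL statement is, in miniature, the ease-of-use gap that the newer Hadoop tools are trying to close.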

NoSQL coming soon to an app near you

It was only a matter of time before more applications were developed to manage Big Data in purpose-built data stores. The NoSQL movement includes key-value, graph, document and columnar data stores, each suited to a particular category of Big Data. Developers can now make the best use of these specialised technologies to build great apps that run natively against these platforms. As more analytical tools also support NoSQL data stores, a new generation of analysts will be able to make the most of this more specialised Big Data for their organisations.

The NoSQL data stores that support graph structures interest me in particular. Efficiently storing and analysing graphs of dynamic depth and width has always been problematic. Relational and dimensional structures do not cater well for variable-depth trees, never mind more complex graphs. Now, if you want to analyse problems like “who knows whom” or “what has been recommended or bought by whom”, there are specialised products such as Neo4j, InfoGrid, InfiniteGraph, OrientDB and FlockDB to store, process and efficiently analyse graph structures.
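Variable-depth questions such as “who knows whom” map naturally onto graph traversal. The sketch below uses plain Python, an in-memory adjacency list and breadth-first search to find everyone within a given number of hops of a person. The function `within_hops` and the sample names are hypothetical; a graph database would run an equivalent traversal natively against its stored nodes and edges.

```python
from collections import deque


def within_hops(graph: dict[str, set[str]], start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first search: map each person reachable from start to their hop distance.

    The start node itself is excluded. Depth is just a parameter, so the same code answers
    "who do my friends know" (2 hops) or deeper questions without any schema change.
    """
    distances = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distances[person] == max_hops:
            continue  # do not expand beyond the requested depth
        for friend in graph.get(person, set()):
            if friend not in distances:
                distances[friend] = distances[person] + 1
                queue.append(friend)
    del distances[start]
    return distances


knows = {"ann": {"bob", "cat"}, "bob": {"dan"}, "cat": {"dan", "eve"}, "dan": {"fay"}}
print(sorted(within_hops(knows, "ann", 2).items()))
# [('bob', 1), ('cat', 1), ('dan', 2), ('eve', 2)]
```

In a relational design, each extra hop typically means another self-join (or a recursive query), which is why variable depth is awkward there; in the traversal above, depth is simply an argument.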

Machine learning is everywhere

While it has been around for some time, machine learning is now finding serious business applications. As more unstructured text is brought into an organisation’s systems, businesses can derive immediate benefit by applying machine learning to impose structure and context on this data. Humans simply cannot interpret and classify large volumes of text manually at the same pace.

My take on this one: if you want to get any value out of unstructured text, you had better invest in natural language processing tools and skills, and even more so if you work in a multilingual and multimedia environment.
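As a rough illustration of machine learning imposing structure on free text, here is a minimal multinomial naive Bayes classifier written from scratch in Python. It assigns each message a category label learned from a few labelled examples. The categories, training sentences and the crude `tokenize` function are made-up assumptions; a real system would use a proper NLP toolkit, better tokenisation, far more training data and careful evaluation.

```python
import math
from collections import Counter, defaultdict

PUNCTUATION = ".,!?;:\"'()"


def tokenize(text: str) -> list[str]:
    """Crude tokeniser: split on whitespace, strip surrounding punctuation, lower-case."""
    words = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [w for w in words if w]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayes":
        self.label_counts = Counter(labels)
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            # log P(label) plus the sum of smoothed log P(word | label)
            score = math.log(doc_count / total_docs)
            label_total = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (label_total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = NaiveBayes().fit(
    ["my parcel arrived late and damaged", "delivery was slow again",
     "great product, works perfectly", "love the quality, very happy",
     "refund my order please", "I want my money back"],
    ["delivery", "delivery", "praise", "praise", "refund", "refund"],
)
print(model.predict("the parcel arrived damaged"))  # delivery
print(model.predict("I want a refund"))             # refund
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and smoothing stops a single unseen word-label pair from ruling a label out entirely.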

Mobile devices as the delivery platform for analytical intelligence

Your mobile device is a great interface to information, with the added benefit that it knows what your preferences are. (And your six-year-old knows better than you how to use it.) Our phones and the apps we install are among the richest sources of personal data. Apps such as Siri and Saga are being developed to act as personal assistants that pull all this information together. So the challenge now is not only to deliver the relevant information to the device in a consumable format, but also to pull the personal data from our mobiles back into the organisation’s Big Data pools (if we allow them to).

This is an area where I think geospatial analysis is also going to play a huge role. Your mobile device already “knows” where you go. Organisations can gain valuable information by analysing your movements and then adjusting their offerings or services to your behaviour.
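One basic building block of this kind of analysis is measuring how far a device is from a point of interest. The sketch below uses the haversine formula (great-circle distance on a spherical Earth, with an assumed mean radius of 6,371 km) to list which of a set of hypothetical store locations lie within a given radius of a phone's reported position. The store names, coordinates and the `nearby` function are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; treats the Earth as a sphere


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(position: tuple[float, float],
           places: dict[str, tuple[float, float]],
           radius_km: float) -> list[str]:
    """Names of places within radius_km of position, nearest first."""
    distances = {name: haversine_km(*position, *coords) for name, coords in places.items()}
    return sorted((name for name, d in distances.items() if d <= radius_km), key=distances.get)


stores = {"store_a": (28.4187, -81.5812), "store_b": (28.5383, -81.3792),
          "store_c": (25.7617, -80.1918)}
phone = (28.4177, -81.5812)
print(nearby(phone, stores, 5.0))   # ['store_a']
print(nearby(phone, stores, 30.0))  # ['store_a', 'store_b']
```

A spherical model is usually accurate enough for proximity offers; applications that need survey-grade precision would use an ellipsoidal model instead.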

Concluding remarks

When you first arrive at Disney World it is overwhelming. You don’t know where to go or what to see first. The experience is similar for businesses encountering Big Data for the first time. They don’t know what to do with the copious amounts of data they are now managing, which makes it difficult to extract any value from it. You have to start planning how you are going to deal with Big Data to ensure you are not left wandering aimlessly around the theme park in dumbstruck amazement.



1 comment

  1. Jessie

    Really enjoyed this article. And it highlighted some new technologies to investigate. And tickled my ‘I want to go to Disney World’ nerve…
