Big Data Warehousing & Business Intelligence Summit


I have just returned from attending the 5th Annual Big Data Warehousing & Business Intelligence Summit, organised by Enterprise IQ, held in Sydney on 27 – 29 November 2012. I participated in two panel sessions, first as a panelist on doing business in the Big Data age, and second as moderator of the session on the architectures of next-generation analytical solutions. The summit confirmed that organisations are now more competent in managing and utilising Big Data. The focus has shifted from Big Data being a mere challenge to the application of innovative approaches and efficient utilisation of the new toolsets.

The New Age of BI

John Brand from Forrester Research started the proceedings with a presentation on the New Age of BI. I really liked that he discredited the 3 Vs as mere measures of Big Data. Forrester defines Big Data as the techniques and technologies that make it economical to capture value from data at an extreme scale.

The activities performed on Big Data are way more important than the measurements:

  • Store – Big data may have to be collected in a seamless repository, not necessarily in a single physical database.
  • Process – To cleanse, enrich, calculate, transform, and run algorithms is way more complex than on traditional data.
  • Access – If the data cannot be searched, retrieved, and visually presented, it is hard to make any business sense of it at all.

John also offered these insights, which I thought were important:

  • Big Data is not an incremental solution to the old problem of ever-growing data volumes, but a totally new, disruptive approach, where you have to trade off consistency against availability and performance.
  • When analysing Big Data you may also have to settle for an up-to-date estimated indicator across a very large population, rather than a 100% accurate measurement. That typifies the difference between Big Data and traditional Analytics.
  • Although Big Data contributes to further customer insights, organisations have to follow the ground rule that customer profiles must be maintained in one single system to avoid duplication and inconsistency.
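To make the second point a bit more concrete, here is a minimal Python sketch of my own (not something John presented). It estimates an average from a 5% random sample and attaches a rough 95% margin of error, instead of scanning every record. The synthetic "basket values", the sample size and the function name `estimate_mean` are all assumptions made purely for illustration.

```python
import random
import statistics


def estimate_mean(population, sample_size, seed=0):
    """Estimate the population mean from a random sample.

    Returns (estimate, margin), where margin is a rough 95% margin of
    error based on the normal approximation.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    mean = statistics.fmean(sample)
    # Standard error of the mean, scaled by 1.96 for roughly 95% coverage.
    margin = 1.96 * statistics.stdev(sample) / (sample_size ** 0.5)
    return mean, margin


# Hypothetical data: 200,000 synthetic "basket values" between 0 and 100.
data_rng = random.Random(42)
population = [data_rng.uniform(0, 100) for _ in range(200_000)]

exact = statistics.fmean(population)                  # full scan
approx, margin = estimate_mean(population, 10_000)    # 5% sample

print(f"exact={exact:.2f}  estimate={approx:.2f} +/- {margin:.2f}")
```

The estimate touches only a fraction of the records, which is the kind of trade-off described above: a timely indicator with a known error band rather than a slow, exact figure.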

Solving the Mysteries of the Universe with Big Data

Sverre Jarp, CTO of CERN, who runs the Large Hadron Collider (LHC) project in Switzerland, gave a very interesting presentation on the generation and processing of scientific data at CERN. It is simply astounding how much data is generated by the multitudes of sensors in the LHC, as well as the complexities of their analyses. Much like Google and Yahoo!, CERN have developed their own highly distributed data store, spread over commodity hardware located all over the world, accessed through community-developed analysis tools. At the time they started, Hadoop didn’t even exist…

A very useful comment by Sverre was that you have to force structure onto the unstructured data as early as possible in the process, in order to be able to effectively filter, process and analyse it.

Understanding and working with unstructured text

Bill Inmon gave two presentations based on the premise that the vast majority of useful Big Data is unstructured text. He showed that raw text can be downright dangerous to use blindly, but it is equally dangerous to ignore it – as illustrated by the Gulf oil spill. An interesting application of natural language processing is the automated documentation of old source code.

Raw text requires context, recognition, interpretation, standardisation and other enrichments such as terminology resolution, taxonomy, proximity / cluster analysis, cross-language translation and slang / shorthand / acronym translation to unearth the real data represented by the text.

The types of transformations would differ for simple repetitive text (log files, most email, tweets, POS data, etc.) as opposed to complex non-repetitive text (documents, claims, contracts, warranties, medical records, etc.). According to Bill, the latter has higher business value. I may just argue against that point in a subsequent post.

In his follow-up presentation he continued this theme by arguing that organisations should employ what he calls “textual ETL” as early as possible in the process in order to reduce and make sense of the mass of unstructured text out there. A large part of textual ETL is textual disambiguation, which is the process of structuring the text using natural language parsing, processing, recognition and the other enrichments mentioned above. Unfortunately he didn’t show how the process works in practice…
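Since no worked example was shown, the following is purely my own toy sketch of what one small step of such a process could look like. It is not Bill's method or any product's API. It tokenises raw text, resolves shorthand, and resolves an acronym differently depending on the document's context, producing structured rows. The lookup tables `SHORTHAND` and `ACRONYMS` and the function `disambiguate` are hypothetical; a real implementation would rely on curated taxonomies and proper language parsing.

```python
import re

# Hypothetical lookup tables, for illustration only.
SHORTHAND = {"pls": "please", "thx": "thanks", "asap": "as soon as possible"}

# The same acronym can mean different things in different contexts.
ACRONYMS = {
    "medical": {"ha": "heart attack"},
    "insurance": {"ha": "home assessment"},
}


def disambiguate(text, context):
    """Turn raw text into (position, raw_token, resolved_term) rows."""
    rows = []
    tokens = re.findall(r"[a-z']+", text.lower())
    for position, token in enumerate(tokens):
        # Context-specific meanings take precedence over generic shorthand;
        # unknown tokens pass through unchanged.
        resolved = ACRONYMS.get(context, {}).get(token) or SHORTHAND.get(token, token)
        rows.append((position, token, resolved))
    return rows


for row in disambiguate("Patient had HA, pls review ASAP", context="medical"):
    print(row)
```

Even at this toy scale, the output is a set of structured rows that could be loaded into a table and queried, and the context is applied once at load time rather than during every analysis.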

I do, however, wholeheartedly agree with Bill on the necessity to impose structure and especially context on text before users can derive any value out of it (except through manual inspection). Doing it once in an ETL type process also makes sense, if you’re doing it correctly. That avoids re-transforming it repeatedly during analyses, unless you need to recontextualise it, of course.

Metadata-driven Agile Data Warehousing

Mark Uksusman discussed eBay’s agile data warehousing approach where every step is actively metadata-driven. This not only enables a truly rapid agile implementation, but also leaves them with a well-documented and completely traceable and auditable solution.
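As a rough illustration of the general idea (my own sketch, not eBay's implementation), the Python snippet below drives a load step entirely from mapping metadata and records the lineage of each column as it goes. The column names, the `MAPPINGS` and `TRANSFORMS` tables and the `load_row` function are hypothetical.

```python
# Hypothetical mapping metadata: one entry per target column.
MAPPINGS = [
    {"source": "usr_nm", "target": "user_name", "transform": "strip"},
    {"source": "amt", "target": "amount", "transform": "to_float"},
    {"source": "ctry", "target": "country", "transform": "upper"},
]

# Registry of named transformations that the metadata can refer to.
TRANSFORMS = {
    "strip": str.strip,
    "upper": lambda value: value.strip().upper(),
    "to_float": float,
}


def load_row(raw_row, mappings=MAPPINGS):
    """Apply the mapping metadata to one raw record.

    Returns (clean_row, lineage), where lineage lists how each target
    column was derived.
    """
    clean_row, lineage = {}, []
    for mapping in mappings:
        transform = TRANSFORMS[mapping["transform"]]
        clean_row[mapping["target"]] = transform(raw_row[mapping["source"]])
        # The same metadata that drives the load also documents it.
        lineage.append(
            f'{mapping["source"]} -> {mapping["target"]} via {mapping["transform"]}'
        )
    return clean_row, lineage


row, lineage = load_row({"usr_nm": " alice ", "amt": "19.90", "ctry": "au "})
print(row)
print(lineage)
```

Adding a column then becomes a metadata change rather than a code change, and the lineage list gives the traceability and audit trail more or less for free.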

In a way similar to CERN, but of course on completely different types of data and systems, eBay only uses Hadoop as an initial gathering place of data. They use Hadoop’s distributed processing capabilities to recondition the data, then rapidly move relevant subsets to Teradata for analysis.

A common thread that came up in multiple presentations as well as in the panel discussion on agile data warehousing is that you have to automate as much of the process as possible to reduce the durations of the sprints. With the increased testing that takes place when using an agile approach, automated testing for one makes a lot of sense.

Analytics to achieve business strategies

In a very spirited presentation, Juan Gorricho of Walt Disney World Parks and Resorts presented two case studies of how they have applied data warehousing and analytics with full feedback loops back into their application systems to achieve significant business value. He also discussed many of their mistakes and learnings, as well as their approaches to handling BI and its related politics in a very large and complex organisation.

Whereas many organisations leave HR as one of the last business areas to warehouse and analyse, it is refreshing to see that at WDW Parks and Resorts the analysis of resource data gets top priority, rightly so because of the huge impact that their staff directly have on their customers’ experience.

Concluding remarks

Of course there were many product and service provider vendor sessions as well. Although some were good, some interesting and some maybe too infomercialised for my liking, it would simply take too much blog space to review those sessions too. I would rather review the standout products separately in future blog posts.

My main take-out from this conference is that although many of the solution vendors treat Big Data as one uniform, massive pool of potentially useful data, there are vast differences between the categories of Big Data, and these categories have to be treated very differently. How you acquire, store, process and analyse unstructured text, scientific data, geospatial data and other types of Big Data differs considerably, and at some point each may require specialised toolsets or approaches to handle efficiently.

Steve Hoskins from Rolls Voice deserves special mention for keeping the enthusiasm flowing. I have yet to see another MC at an IT conference rouse up applause for each and every speaker as if they were stepping up to receive an Oscar. Good on you, Steve.

The conference ran as smoothly as clockwork, with a very impressive mix of international and local speakers, and a good balance between users, vendors and industry experts. Extremely informative, stimulating and useful – highly recommended.
