I was fortunate to attend the HISA Big Data in Health and Biomedicine conference in Melbourne. The HISA conference is one of my favourites, because you get to hear about all the interesting and challenging applications of analytics in this intrinsically rewarding field.
Last year the conference was about Data Governance, with Big Data featuring in almost every third talk. This year the theme was Big Data, with aspects of Data Governance featuring in about every fourth or fifth presentation.
Big Data
There are two aspects of Big Data, as it was covered at the conference, that I personally would like to comment on.
The first aspect that bothered me was that many of the speakers still “defined” Big Data in terms of the 3, 4 or 5 Vs. These are characteristics of Big Data, and they are useful for describing certain aspects of it, but they are definitely not a definition (see here for further commentary: Big Data Defined?). Especially in the healthcare field, very few organisations have the type of big data related to the first two Vs, namely Volume and Velocity. Variety and Complexity, on the other hand, abound in healthcare.
In his presentation on Big Data meets Healthcare, Dr Christopher Shute gave a very practical alternative definition of Big Data: “data that you can barely manage”. I have to smile at the evolutionary nature of that definition. It implies you can never really reach that utopia where big data is totally under control, because once it is, by his definition it is no longer big data. But it is really refreshing that he is not stuck on the boring Vs. He used the proliferation of huge, globally integrated bio databases as an example. Instead of the Vs, he used a very practical classification of big data as broad (a small number of records, each with a huge number of observations, i.e. attributes or dimensions), deep (a large number of records, but with a small number of observations each) and rich (both broad and deep).
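Dr Shute's broad/deep/rich scheme is really a statement about the shape of a dataset: how many records it holds versus how many observations each record carries. The minimal Python sketch below makes that concrete. The cut-off values are purely hypothetical and chosen for illustration only; the talk did not give numbers, and in the spirit of “data you can barely manage” the line would move with your own capacity.

```python
from dataclasses import dataclass

# Hypothetical cut-offs, for illustration only (not from the talk).
MANY_RECORDS = 1_000_000    # rows, e.g. patients or episodes of care
MANY_ATTRIBUTES = 10_000    # observations per row, e.g. genomic variants


@dataclass
class DatasetShape:
    records: int      # number of records (rows)
    attributes: int   # number of observations per record (columns)


def classify(shape: DatasetShape) -> str:
    """Label a dataset as broad, deep, rich or neither."""
    many_records = shape.records >= MANY_RECORDS
    many_attributes = shape.attributes >= MANY_ATTRIBUTES
    if many_records and many_attributes:
        return "rich"     # both broad and deep
    if many_attributes:
        return "broad"    # few records, very many observations each
    if many_records:
        return "deep"     # very many records, few observations each
    return "neither"      # manageable on both axes


# A genome study: few patients, hundreds of thousands of variants each.
print(classify(DatasetShape(records=500, attributes=500_000)))    # broad
# A claims history: tens of millions of rows, a few dozen columns.
print(classify(DatasetShape(records=50_000_000, attributes=40)))  # deep
```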
The second aspect I would personally have liked to hear more about was which Big Data technology, if any, was used in each case and how it satisfied the particular requirement. Some presenters mentioned the technology in passing, but no-one gave an assessment of its usefulness (at least not in the sessions that I attended). But sure, there was always coffee time to find out those details… I still stand by the statement that in many cases big data analytics can be performed with the intelligent application of mainstream BI toolsets. It would have been interesting to see how that played out if it had been highlighted or surveyed at this event.
Of course there was some very good material on the application of Big Data as well. For example, Dr Rob Fassett of Oracle, who focused on the application of Big Data in Personal Medicine, made two very important points. The first was that with Big Data we have to be more precise in the healthcare context. In many big data analytics applications, accuracy is sacrificed for expediency, with the accuracy being compensated for by the large volume of observations. However, when it gets to analysing big data to affect personal patient outcomes, this philosophy cannot be applied. It has to be more exact. The second point was that the biggest impact is at the point of care, or directly with the patient. If the results of big data analytics are not surfaced and applied at those points, the impact it can possibly make will be lost.
Geospatial Analysis
Geospatial analysis is one of the applications of big data analytics that, for me, holds a lot of promise. Sharon Kosmina of Rural Health Workforce Australia showed the results of applying an Analytical and Mapping Tool for workforce planning and forecasting. As she reiterated, “in rural and remote, [the] Geography [dimension] matters.” Theirs is a very challenging example of integrating population, socio-economic, health status, environment, provider and patient data across data catchment areas whose boundaries sometimes overlap and sometimes do not.
I speak about privacy below, but one of the interesting challenges they face is not compromising the identity of providers, and especially patients, when working with spatial data in the rural context. In some cases the locations are so remote, and the individual observations so sparse, that individuals can actually be identified.
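A common safeguard against this kind of re-identification is small-cell suppression: counts for an area are withheld when they fall below a minimum. The Python sketch below assumes a hypothetical threshold of 5 and a simple dictionary of counts per area; it is a generic technique, not something the presenters described, and real data custodians set their own rules.

```python
from typing import Dict, Optional

# Hypothetical minimum cell size; each data custodian sets its own rule.
MIN_CELL_SIZE = 5


def suppress_small_cells(
    counts_by_area: Dict[str, int], min_cell: int = MIN_CELL_SIZE
) -> Dict[str, Optional[int]]:
    """Return a copy of the counts with any value below min_cell replaced by None."""
    return {
        area: (count if count >= min_cell else None)
        for area, count in counts_by_area.items()
    }


# Patients per catchment area (made-up numbers for illustration).
counts = {"Regional centre": 412, "Coastal town": 37, "Remote station": 2}
print(suppress_small_cells(counts))
# {'Regional centre': 412, 'Coastal town': 37, 'Remote station': None}
```

On its own this is not enough: if a regional total is also published, a suppressed cell can be recovered by subtraction, which is why complementary cells are often suppressed as well.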
Data integration
Healthcare must be one of the industries with the most intriguing data integration challenges I have ever come across. On one day there were as many as three sessions on how three different organisations were linking patient records, covering the challenges of keeping the datasets anonymous, identifying families and still providing sufficiently accurate data. Of course, the type of linkage depends greatly on what the data is going to be used for. For researching trends across thousands of people and organisations, non-deterministic (probabilistic) linking may be fine, up to a point, but when it comes to a holistic view of a patient’s history across multiple caregivers you need much more accurate identification and linkage.
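To make the distinction concrete, the Python sketch below contrasts a deterministic rule (exact agreement on chosen fields) with a probabilistic score in the style of the Fellegi-Sunter model, where each field adds an agreement or disagreement weight. This is a standard textbook approach, not the method any of the presenters described, and the field names, m/u probabilities and threshold are all hypothetical.

```python
import math
from typing import Dict, Tuple

# Hypothetical (m, u) probabilities per field:
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELD_PROBS: Dict[str, Tuple[float, float]] = {
    "surname": (0.95, 0.01),
    "given_name": (0.90, 0.05),
    "dob": (0.97, 0.003),
    "postcode": (0.85, 0.02),
}
LINK_THRESHOLD = 12.0  # hypothetical cut-off on the total weight


def deterministic_match(a: dict, b: dict, keys=("surname", "given_name", "dob")) -> bool:
    """Link only if every key field agrees exactly."""
    return all(a.get(k) == b.get(k) for k in keys)


def match_weight(a: dict, b: dict) -> float:
    """Sum log2 agreement/disagreement weights across fields (Fellegi-Sunter style)."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if a.get(field) == b.get(field):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


hospital = {"surname": "Nguyen", "given_name": "Jon", "dob": "1961-04-02", "postcode": "6010"}
gp = {"surname": "Nguyen", "given_name": "John", "dob": "1961-04-02", "postcode": "6010"}

print(deterministic_match(hospital, gp))             # False: given names differ
print(match_weight(hospital, gp) >= LINK_THRESHOLD)  # True: other fields outweigh it
```

In practice a second, lower threshold usually marks a band of uncertain pairs for clerical review; where those thresholds sit is exactly the trade-off between research-grade and patient-level accuracy.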
And that is before we even get to linking the single eHealth record (PCEHR) across states. At this early stage of adoption, this integrated record of the patient’s ongoing health interventions, which gives a much deeper picture of the patient’s journey through the healthcare system, already stands at 9 petabytes of data.
I’m sure it gets done, but I heard and saw very little about linking claims, payment and other financial data with clinical data from caregivers and other providers. Fraud may be less important than saving people’s lives, but I’m sure you would get some interesting findings when comparing claims data with actual provider-generated clinical data. This type of linkage is also very useful for monitoring treatment adherence.
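One common way to measure adherence from dispensing claims is the proportion of days covered (PDC): the share of days in a treatment period on which the patient had medication on hand. The Python sketch below is a simplified version under stated assumptions: overlapping supplies are counted once rather than carried forward, and the treatment period is assumed to come from the linked clinical record (for example, the date the prescriber started the therapy). The function name and data layout are illustrative only.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple


def proportion_of_days_covered(
    fills: Iterable[Tuple[date, int]], start: date, end: date
) -> float:
    """Share of days in [start, end] covered by at least one dispensed supply.

    fills: (dispensing date, days of supply) pairs taken from claims data.
    Overlapping supplies are counted once (no carry-forward of early refills).
    """
    covered = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            if start <= day <= end:
                covered.add(day)
    period_days = (end - start).days + 1
    return len(covered) / period_days


# Treatment period from the (hypothetical) linked clinical record: Q1 2023, 90 days.
claims = [(date(2023, 1, 1), 30), (date(2023, 2, 15), 30)]
pdc = proportion_of_days_covered(claims, date(2023, 1, 1), date(2023, 3, 31))
print(round(pdc, 2))  # 0.67
```

A PDC of 0.8 or above is often used as the cut-off for calling a patient adherent; without the clinical side of the linkage you would not know when the period should start.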
Privacy and governance
Many of the breakout sessions and even some of the keynote sessions were devoted to privacy. Privacy of and shared access to healthcare information are probably the most sensitive topics in the entire healthcare field. This was evident from people’s varying responses as to who may see, access or manage their health data – which changed drastically if it could be used to save their own or somebody else’s lives… As long as it wasn’t the government who captured and managed it!
Privacy is also one of the most complex topics in healthcare. As Emma Hossack of Extensia stated: “privacy is multi-functional, and multi-layered”. Simple schemes of opt-in and consent are just not adequate when it comes to data ownership and sharing for the common good. Consent models in general are too cumbersome. She further elaborated on the OECD Principles and the concept of “Privacy by Design” which are both good topics to research if you are concerned about privacy issues.
For those worried about the privacy of their data in the eHealth record (PCEHR), it is all documented in the Personally Controlled Electronic Health Records Act 2012.
Of course, when it comes to Big Data, the whole field of governance, including data quality and privacy, explodes in complexity. Organisations are already battling to govern internal structured data, let alone the complications of governing unstructured and external data. There were some good papers that gave practical advice on how to approach this. For example, Greg Taylor of NTF made the comment that “cleaning data is a source of insight”: the data quality process itself can surface additional insights that can be analysed.
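As a small illustration of that idea, the Python sketch below tallies simple data quality issues by source system. The rules, field names and source names are hypothetical; the point is that the pattern of defects (here, one source accounting for every issue found) is itself a finding worth following up.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple


def profile_quality(records: Iterable[dict], today: date) -> Dict[Tuple[str, str], int]:
    """Count rule violations per (source system, rule) pair."""
    issues: Counter = Counter()
    for rec in records:
        source = rec.get("source", "unknown")
        dob = rec.get("dob")
        if dob is None:
            issues[(source, "missing_dob")] += 1
        elif dob > today:
            issues[(source, "dob_in_future")] += 1
        if not rec.get("postcode"):
            issues[(source, "missing_postcode")] += 1
    return dict(issues)


# Made-up records from two hypothetical clinics.
records = [
    {"source": "clinic_a", "dob": date(1950, 3, 1), "postcode": "6000"},
    {"source": "clinic_b", "dob": date(1972, 8, 9), "postcode": ""},
    {"source": "clinic_b", "dob": None, "postcode": ""},
    {"source": "clinic_b", "dob": date(2091, 1, 1), "postcode": "6725"},
]
for (source, rule), n in sorted(profile_quality(records, date(2024, 1, 1)).items()):
    print(source, rule, n)
```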
Advanced Analytics
In my next life I want to be a healthcare data scientist; more specifically, a healthcare analytical modeller in a research institution’s data science team. For me personally, the types of analytics and the outputs generated by the models created in this area are so interesting! And in so many cases the real-life applications of the outcomes are so profound.
Many of the presentations showed where lives were saved or radically improved thanks to big data analytical outcomes. If we consider that researchers are only starting to scratch the surface with limited sets of genome data, imagine what can still be modelled!
Concluding remarks
One of the keynotes that stood out for me was the spirited presentation by Prof Fiona Stanley of the University of Western Australia, titled “From Data to Wisdom: Total population data for health and well-being.” In a highly energised talk she showed how big data, integration, geospatial and other forms of analysis, together with advanced research collaborations, were used in Western Australia to improve health and well-being state-wide. Truly inspirational!
As the sentiment in this blog no doubt shows, I thoroughly enjoyed the HISA Big Data conference. For two whole days I could soak up the good work and interesting applications in this rewarding and challenging industry.
Like a bad virus, I’ll be back!