Using an Analytics Platform for Hospital Research


Applying complex analytics to a very wide and rich dataset is fundamental to deriving insights when researching new approaches to specialised care, optimising the effectiveness of clinical trials and reducing the lead time from research to clinical practice, all with the objective of improving quality and longevity of life. However, identifying, accessing, managing and analysing such a vast dataset is a complex challenge that must not be underestimated. Fortunately, a modern analytics platform can play a significant role in enabling researchers and clinicians to perform their research analytics much more effectively.

In previous posts I discussed the components of a modern analytics platform and how to use such a platform throughout the analytics lifecycle. In this post I illustrate the value that such a platform can add for research personnel at a hospital.

Researchers and Clinicians

An integrated analytical platform provides researchers, clinicians and other business users with a logically integrated view of data. The data may come from local hospital systems (e.g. the patient administration system, pathology, radiology, etc.) or from remote sources (such as public datasets like census figures and geographic codes, as well as privately subscribed and publicly available research datasets). The data may be physically persisted externally or internally in the platform's own storage, but users are not concerned with those details: they see only an inventory of all the available data and how the data items relate to each other. This integrated view of a very wide and rich data resource enables researchers and clinicians to do the following:

  • Search for related data items across the entire ecosystem, using natural language phrases, keywords, fuzzy matches, recommendations and other complex combinations of conditions.
  • Identify and group related data, such as patient cohorts with similar characteristics, symptoms, diagnoses or treatment plans, including by applying more advanced algorithms such as profiling and segmentation. These groups can then be managed separately or together, for example as treatment and control groups throughout an extensive research study. The flexibility of the platform, together with its metadata and ad hoc data capture facilities, allows researchers to tag these patients and trace them through an experiment.
  • Analyse and present each patient's entire journey through their various interactions with the hospital, including consultations, diagnoses, in-patient and emergency presentations, radiology procedures, pathology tests and results, pharmaceutical prescriptions, etc., with rich sets of related data at each point of interaction. The ability to analyse unstructured text (such as clinicians' notes), imagery (e.g. x-rays and photographs), video and other rich data sources allows clinicians to build up a very detailed representation of the patient journey.
  • Format, present and report information (conventional business intelligence) and discover, analyse and present data graphically (conventional data visualisation), but across a much richer and well-catalogued data resource.
  • Collaborate around data, insights and all forms of data presentation, for example to discuss treatment plans, tests, outcomes and insights with researchers at a remote faculty, potentially in another state or time zone.
  • Add or derive new data, calculate aggregations and apply specialised business rules without altering the original data. For example, researchers can add hierarchies, either temporarily or as a new permanent dataset, bring in lookup data from other research studies, add lookup codes and links (e.g. to temporarily relate ICD codes to DRG codes) and so on. This is very useful for comparative and what-if analyses.
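To make the cohort and tagging ideas more concrete, here is a minimal Python sketch. It assumes a small in-memory list of fictitious patient records with ICD-10-style diagnosis codes; the names `Patient`, `select_cohort` and `assign_groups` are hypothetical and not part of any particular platform's API. What it illustrates is that cohort membership and treatment/control tags are kept in a separate overlay rather than written back into the source records.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: source records cannot be altered in place
class Patient:
    patient_id: str
    age: int
    diagnosis_code: str  # ICD-10-style code, used here for illustration only


# Fictitious source records standing in for data exposed by the platform.
PATIENTS = (
    Patient("P001", 67, "E11"),
    Patient("P002", 45, "E11"),
    Patient("P003", 72, "I10"),
    Patient("P004", 58, "E11"),
    Patient("P005", 81, "E11"),
    Patient("P006", 39, "I10"),
    Patient("P007", 63, "E11"),
)


def select_cohort(patients, diagnosis_code, min_age):
    """Return patients sharing a diagnosis code and at or above a minimum age."""
    return [p for p in patients if p.diagnosis_code == diagnosis_code and p.age >= min_age]


def assign_groups(cohort, seed):
    """Randomly tag cohort members as 'treatment' or 'control'.

    The tags live in a separate dict keyed by patient_id (an overlay),
    so the original records are never modified.
    """
    ids = sorted(p.patient_id for p in cohort)
    random.Random(seed).shuffle(ids)  # seeded so the allocation is reproducible
    half = len(ids) // 2
    return {pid: ("treatment" if i < half else "control") for i, pid in enumerate(ids)}


if __name__ == "__main__":
    cohort = select_cohort(PATIENTS, "E11", min_age=50)
    tags = assign_groups(cohort, seed=2024)
    for pid in sorted(tags):
        print(pid, tags[pid])
```

Because the tags are keyed by patient ID, they can be traced through later analyses or discarded without touching the source data. A real study would use a formally designed randomisation scheme rather than a simple shuffle.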

Using a conventional BI system, all the data would first have to be transferred from the various data sources (which by itself may take months to implement, test and validate), then restructured, contextualised and tagged in the data warehouse before functions such as search, tagging and tracing would be possible. Even then, the searches would have to be coded or configured to explicitly access all the relevant tables and columns.

A second important point is active metadata management: all of the above takes place while being documented, catalogued and actively driven from a tightly integrated, all-encompassing metadata management facility. In other words, the metadata management facility is actively used to define and drive all the functions; it is not a passive, after-the-fact documentation tool. As a result, the organisation always has an up-to-date catalogue of its data inventory, both on-premises and off-premises.

Data Scientists and Analysts

The analytical platform’s data mining, machine learning, advanced analytics and cognitive computing components will enable more advanced users like data analysts and data scientists to do, amongst other things, the following:

  • Use the rich datasets linked to the patient journey to identify likely indicators, statistically predict the potential outcomes of suggested treatments, predict which patients are likely to show certain symptoms, or predict which patients are most likely to be affected by identified events (such as falls, infections, adverse reactions to treatments and so on).
  • Examine large volumes of research publications using textual disambiguation and other advanced text mining approaches. For example, it has been stated that so much research evidence is being published in the field of oncology that a researcher would need up to 160 hours a week to keep up with the literature (1). An analytical platform with access to the vast body of published research material enables researchers to selectively extract the information most applicable to their own research, without placing pressure on IT to acquire, manage and govern such a vast and ever-changing pool of data.
  • Examine and interact with large volumes of research data using advanced statistics-driven search algorithms. An analytical platform with access to external pools of research data gives researchers the capability to intuitively search through those externally hosted datasets and link to the pieces of data that are relevant to their investigations, again without burdening IT to obtain and manage those external datasets.
  • Use cognitive computing facilities such as natural language processing to trawl through large volumes of unstructured text to obtain relevant information and insights. Even with the uptake of electronic patient records and data integration across multiple systems, there is still a vast pool of valuable information contained in unstructured text, such as in clinicians’ notes, laboratory reports and of course published articles.
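As a very simple stand-in for the text mining described above, the sketch below ranks a handful of fictitious abstract titles against a query using TF-IDF weighting. TF-IDF is a basic keyword-relevance technique, much simpler than the textual disambiguation and natural language processing mentioned in the list; the function names and sample texts are hypothetical and only illustrate the idea of narrowing a large literature down to the most relevant items.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf_rank(query, documents):
    """Return indices of documents containing query terms, best match first.

    Score = sum over query terms of (term frequency in document) * (smoothed IDF).
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    query_terms = set(tokenize(query))
    scored = []
    for idx, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if counts[term] == 0:
                continue
            tf = counts[term] / len(tokens)
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
            score += tf * idf
        if score > 0:
            scored.append((score, idx))

    # Highest score first; ties broken by original order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored]


# Fictitious abstract titles used purely for illustration.
ABSTRACTS = [
    "Immunotherapy response in metastatic melanoma patients",
    "Falls prevention programme for elderly inpatients",
    "Combination immunotherapy and chemotherapy in lung cancer: immunotherapy outcomes",
    "Infection control audit in surgical wards",
]

if __name__ == "__main__":
    for idx in tf_idf_rank("immunotherapy outcomes", ABSTRACTS):
        print(ABSTRACTS[idx])
```

For the query "immunotherapy outcomes", the third title ranks first (it contains both query terms, one of them twice), followed by the first title; the other two contain no query terms and are dropped.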

There are two economies of scale at play here. The first is the analytical platform's capability to access external data sources and link them to locally hosted datasets, thereby providing researchers and clinicians with a seamless interface to the entire data pool. In a conventional business intelligence implementation, it would take a large technical team years to on-board all of that data into a data warehouse, and in the process some insights may remain hidden in the data because of the limiting view that pre-designed dimensional data models impose. Prematurely transforming and modelling all data into a dimensional data warehouse (i.e. star schemas) is limiting because some as yet unforeseen requirements may not be met; moreover, many of the predefined transformations turn out to be unnecessary work because those star schemas never get used. The second is the analytical platform's advanced capability to handle unstructured text, which allows it to search through, access and link data from a vast array of unstructured sources, including published journals. In a structured data warehouse, uncovering the same information would require a great deal of design, searching and transformation effort.

IT Staff

On a technical level, the integrated analytics platform empowers IT staff to manage the platform holistically, rather than managing individual data feeds, data stores, reports and so on. The platform provides the following functionality to data managers, IT staff and infrastructure personnel:

  • Transparent access to local and remote data through definitional facilities.
  • Integration of related data sets across a wide variety of conditions and business rules, again through definitional facilities. Data access, data movements and data linkages are all defined on a logical level, and are implemented by the platform without requiring extensive coding as in a conventional BI platform.
  • Data is only moved, copied and/or transformed when absolutely necessary for performance, non-intrusion, security, specific functionality (such as advanced analytics or machine learning) or other well-justified reasons. Because the data federation facilities access data in situ, many of the requirements to move data through extensive ETL processes (which have to be designed, coded, implemented and tested) are eliminated.
  • An analytics platform, with its integrated data federation framework, both enables and encourages agile delivery approaches, which in turn simplify the complex issues and challenges posed by remote, unstructured, unknown and new data, while also eliminating much of the traditional ETL development and implementation effort.
  • Through the metadata management and user interface capabilities, the analytics platform provides rich security and privacy management capabilities, including role- and value-based security controls.
  • Most analytics platforms include their own disaster recovery capabilities, whereas in a conventional BI implementation these have to be set up, configured, managed and checked separately.
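The role- and value-based security controls listed above can be sketched as two filters applied to every query result: roles decide which columns a user may see, and attribute values (here, the wards a user is assigned to) decide which rows. This is a minimal illustration under those assumptions; the policy table, the `User` type and the `secure_view` function are hypothetical, and in the platform described in this post such rules would be defined through its metadata management rather than in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset[str]
    wards: frozenset[str]  # attribute values used for row-level (value-based) filtering


# Hypothetical policy: the columns each role is allowed to see.
ROLE_COLUMNS = {
    "researcher": {"patient_ref", "ward", "diagnosis_code"},
    "clinician": {"patient_ref", "name", "ward", "diagnosis_code"},
}

# Fictitious records; names are placeholders, not real patients.
ROWS = [
    {"patient_ref": "P001", "name": "Patient A", "ward": "Oncology", "diagnosis_code": "C50"},
    {"patient_ref": "P002", "name": "Patient B", "ward": "Cardiology", "diagnosis_code": "I21"},
    {"patient_ref": "P003", "name": "Patient C", "ward": "Oncology", "diagnosis_code": "C34"},
]


def secure_view(rows, user):
    """Apply value-based row filtering, then role-based column filtering."""
    # Union of columns granted by all of the user's roles; unknown roles grant nothing.
    allowed = set()
    for role in user.roles:
        allowed |= ROLE_COLUMNS.get(role, set())

    view = []
    for row in rows:
        if row["ward"] not in user.wards:  # value-based: hide rows outside the user's wards
            continue
        view.append({col: val for col, val in row.items() if col in allowed})  # role-based
    return view


if __name__ == "__main__":
    researcher = User("r.lee", frozenset({"researcher"}), frozenset({"Oncology"}))
    for record in secure_view(ROWS, researcher):
        print(record)
```

For the researcher in the usage example, the view contains only the two oncology records, with the `name` column removed. Denying by default (a role with no policy entry sees no columns) is a deliberate choice in this sketch.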

Concluding Remarks

The combination of the powerful functionality offered by the analytical platform (such as search, federated data access, advanced analytical modelling, and transparent data movement and storage) together with its metadata-driven definitional and cataloguing capabilities enables researchers and clinicians to identify, obtain and analyse data for their research at a scale and level of intensity not previously possible with conventional BI systems.


Susan Walker, CEO Health at entitythree, for insights into clinical research at hospitals.


1) Memorial Sloan-Kettering Cancer Centre – IBM Watson Case Study
