Data silos and how to break them down


Data silos are pervasive in organisations around the world. At many of the companies where I have worked, they have made it difficult to ensure system integration, data governance, and effective reporting. After recently coming across an insightful industry piece on the topic, I decided to use this month’s blog article to discuss some of the approaches that can be used to break down these walls.

As mentioned, a great industry article to read about this topic is Bob Violino’s “Breaking down data silos for digital success” published on CIO. In it, he uses two key phrases – unifying data strategies and knocking down data (and political) walls.

This highlights that breaking down data silos at an organisation is a strategic imperative. Trying to do so by stealth or adopting a bottom-up approach will be impossible. Furthermore, data silos are often the result of the organisation’s political landscape and the make-up of the business. While some of these are the legacy of older organisational cultures, modern businesses looking for integrated and consolidated insights must move beyond those ‘limitations’.

In my experience, healthcare is one of the industries that struggles the most with siloed, point-focused, and unintegrated systems. In his article, Violino uses a children’s hospital in the US as an example of how to overcome these data silos. The case study examines the hospital’s journey to consolidate 120 separate systems into a single, centralised data warehouse with one reporting tool.

The value in data lakehouse architecture

More organisations are embracing data lakehouse architectures, as opposed to conventional dimensional data warehouses. These are used to systematically collect all relevant structured, unstructured, and streaming data, then store, transform, aggregate, and label it as needed, and finally optimise it for reporting.

At the organisation I am currently at, the data lakehouse architecture is enabling us to ingest data from a myriad of systems much faster and with more agility to adapt to changing requirements. So, instead of an instantiated dimensional data warehouse on top of the data lakehouse, we are using dynamic views and the reporting tool’s semantic modelling capabilities.

This results in more efficient and business-friendly reporting environments. Not only do these return results faster, they also use fewer resources than it would take to design and implement a dimensional data warehouse and the multi-layer ETL processes to populate it. Of course, in our reporting models, we are still using dimensional principles, but we are not physically instantiating them.
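To make the idea of a dimensional model that is not physically instantiated concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for a lakehouse query engine, and the table names, columns, and values are hypothetical, not taken from any real system. The view applies dimensional principles (a fact joined to a dimension and then aggregated) without copying or storing any data.

```python
import sqlite3

# Hypothetical raw tables, as they might land from two source systems.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_patients (patient_id INTEGER, full_name TEXT, region TEXT);
CREATE TABLE raw_admissions (admission_id INTEGER, patient_id INTEGER,
                             admitted_on TEXT, cost REAL);
INSERT INTO raw_patients VALUES (1, 'A. Smith', 'North'), (2, 'B. Jones', 'South');
INSERT INTO raw_admissions VALUES
    (10, 1, '2024-01-05', 1200.0),
    (11, 1, '2024-02-11', 300.0),
    (12, 2, '2024-02-20', 450.0);

-- A view shaped like a fact joined to a dimension; nothing is materialised.
CREATE VIEW v_admissions_by_region AS
SELECT p.region    AS region,
       COUNT(*)    AS admissions,
       SUM(a.cost) AS total_cost
FROM raw_admissions AS a
JOIN raw_patients AS p ON p.patient_id = a.patient_id
GROUP BY p.region;
""")

# Reports query the view; it is evaluated against the raw tables at query time.
rows = conn.execute(
    "SELECT region, admissions, total_cost "
    "FROM v_admissions_by_region ORDER BY region"
).fetchall()
print(rows)
```

Because the view is only a stored query, a change in requirements means editing one definition rather than redesigning tables and reloading them through ETL.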

The data steering team

Organisations can make additional strides in breaking down silos by putting in place a dedicated data navigation or steering team. Such a team would help the organisation align data across business areas and establish a data governance function. This will empower decision-makers to ensure the trust, privacy, and security of data while also being able to identify the technology and human resources to use to help build an integrated data architecture.

Such an approach would be especially beneficial to a highly siloed organisation. Having key stakeholders from the various silos participate in a central forum, making key data and prioritisation decisions, will foster a culture of sharing across organisational boundaries. When key data- and governance-related decisions are shared, and when information about data quality and business gains achieved through centralised reporting is shared, it tends to organically break down the silos. This approach favours treating data as a shared resource that must be managed accordingly.

Data silos can result in inconsistencies and operational inefficiencies for a business, and their dissolution can ensure the consistency and accessibility of reliable data across the organisation. A centralised data team structure can establish a unified data ecosystem. Breaking down these silos can foster a culture of innovation, facilitating coordination and collaboration between different business areas. This will result in better decision-making, efficiency in providing analytics, and faster service to stakeholders.

Next month, I will be digging deeper into the strategic aspects and other tips on how to break down data silos.

Understanding the role of the CDO


Given the growing adoption of artificial intelligence (AI), and companies across industry sectors wanting to become truly data-driven, having a Chief Data Officer (CDO) becomes non-negotiable. However, not every organisation truly understands what this role encompasses.


Self-service BI – Examining the right approach to take


Last month, I discussed the challenges and opportunities of self-service BI as an approach to enable non-tech-savvy business users to directly access data and explore it on their own. In this blog, the focus turns to some of the reasons why self-service has failed and understanding the approaches that have worked.

For reference, I use the terms ‘power users’ and ‘citizen developers’ interchangeably in this piece, to represent those business users outside the BI team. These are the people who will be enabled through self-service BI to address their own and others’ informational needs.

Self-service BI failures

There are several possible reasons why self-service BI fails. Some that have been identified are:

  • Unrealistic expectations: Companies that let novice users loose on organisational data face the potential of poor-quality reports and inconsistent reporting. This results in a huge distrust of data in general.
  • Reporting chaos: With no governance structures in place, there will be redundant reports from different users. Because they work in silos and use different filters and terminology, they will deliver conflicting results despite using the same underlying data.
  • Lack of adoption: BI tools, environments, and processes may be easy for specialists. However, we must keep in mind that casual users do not have the same background and skills. The complexity and ‘newness’ of these environments can be quite intimidating for novices.
  • Lack of support: Citizen developers are not trained on BI processes and tools. Without proper support and handholding, any self-service initiative is bound to fail. Organisations must factor in the time and resources essential to deliver this support.
  • Poor data quality: If the power and downstream users do not trust the data, they will stop using it. Even worse, if the business starts distrusting the data there is a high likelihood that siloed pockets of departmental BI initiatives will get started.
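The ‘reporting chaos’ point is easy to reproduce on a small scale. The sketch below uses made-up order data and a hypothetical `governed_revenue` function to show two users getting different ‘revenue’ figures from the same rows because they filter differently, and how one agreed definition removes the ambiguity. The rule that only paid orders count as revenue is an assumption made purely for illustration.

```python
# Made-up order data shared by every report author.
orders = [
    {"id": 1, "amount": 100.0, "status": "paid"},
    {"id": 2, "amount": 250.0, "status": "refunded"},
    {"id": 3, "amount": 80.0, "status": "pending"},
]

# User A sums every order; user B excludes refunds.
# Same underlying data, two different answers to "what is our revenue?".
revenue_a = sum(o["amount"] for o in orders)
revenue_b = sum(o["amount"] for o in orders if o["status"] != "refunded")


def governed_revenue(rows):
    """One agreed definition (assumed here): only paid orders count."""
    return sum(r["amount"] for r in rows if r["status"] == "paid")


print(revenue_a, revenue_b, governed_revenue(orders))
```

Publishing a single definition like this in a shared model means report authors reuse it instead of re-deriving their own filters.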

Making self-service BI work

Through a combination of my own experiences and several industry sources, below are a few ways a business can go about establishing self-service BI successfully:

  • Identify the user population: It would be complete chaos if self-service BI was accessible to the entire company. In a data-mature organisation, only 25% of all users can be labelled as potential ‘power users’. In my experience, it is better to bring these ‘citizen developers’ on board in small groups, each with a specific focus and guidance. It is also useful to involve them in a community of interest.
  • Set a self-service BI strategy: Self-service can mean a lot of different things. The business must therefore be clear about the scale of implementation, the types of users, their technical proficiency, the expectations of deliverables, and the approach to be used. It is also important to not try and boil the ocean! Starting small and building focused business areas one at a time works well.
  • Keep stakeholders informed: The company must keep not only the power users but also their managers and the intended users of their work up to date. There must also be channels set up for feedback throughout the process.
  • Set up comprehensive governance and quality assurance: In my mind, this is one of the most important aspects to get right. A company must put policies and processes in place to ensure what is delivered to the business is complete, accurate, timely, and relevant. It cannot allow inaccurate or inconsistent information to be reported. Once data is distrusted, regaining that trust becomes an almost insurmountable obstacle. To this end, peer review processes are very useful.
  • Use an appropriate tool: Although most of the reporting tools out there claim to support self-service BI, some are more suitable than others. I have found that tools that support curated semantic models are better for ensuring consistent and accurate reporting. Additionally, they reduce some of the technical complexities where users have to identify and implement various types of joins and unions across datasets.
  • Establish a single source of the truth: Even a well-architected data warehouse and reporting environment may be too complex and detailed for citizen developers. A well-curated and quality-assured semantic data model, with pre-joined and de-normalised high-level entities presented in business-language terms works very well. This requires a lot of planning, designing and implementation before letting power users loose on the model.
  • Establish a dictionary and metadata: Not only must the data be surfaced in business terms, but it should also be properly catalogued and documented. The documentation must be readily available and easily accessible by the business users. Likewise, there should also be a catalogue of existing reports and dashboards. These must also be easily accessible to enable users to search the catalogue for similar reports before embarking on a new development.
  • Educate the power users: Users must be educated on the use of the tool as well as the data. Aspects of visualisation theory are also very important. It is also very useful to explain the data model to power users through workshops and hands-on implementation sessions.
  • Refine and adapt: Like any good strategy implementation, regular monitoring, review, feedback, and adjustment are always useful.
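As an illustration of the report catalogue mentioned in the list above, the following sketch shows how a power user might check for similar existing reports before starting a new one. It is a deliberately simple keyword-overlap (Jaccard) comparison over report titles. The function names, the example catalogue, and the 0.3 threshold are all hypothetical, and a real catalogue tool would likely search richer metadata than titles alone.

```python
def tokens(text):
    """Lower-case a title and split it into a set of words."""
    return set(text.lower().split())


def similar_reports(catalogue, request, threshold=0.3):
    """Return (score, title) pairs whose word overlap with the request
    meets the threshold, highest score first."""
    wanted = tokens(request)
    matches = []
    for title in catalogue:
        have = tokens(title)
        union = wanted | have
        if not union:
            continue  # nothing to compare
        score = len(wanted & have) / len(union)  # Jaccard similarity
        if score >= threshold:
            matches.append((round(score, 2), title))
    return sorted(matches, reverse=True)


# Hypothetical catalogue of existing reports.
catalogue = [
    "Monthly sales by region",
    "Customer churn by segment",
    "Weekly sales by product",
]

found = similar_reports(catalogue, "Monthly sales by product")
for score, title in found:
    print(score, title)
```

Here the two sales reports are flagged as close matches while the churn report is not, which is the prompt a citizen developer needs to reuse or extend an existing report rather than build a duplicate.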

While I did not discuss it at length, there must also be a close alignment between self-service BI and the broader data governance function. Self-service BI users often detect data quality and consistency issues. This means there should be good and open communication to bring these issues to the fore and make the organisation aware of how they are being handled.

Understanding the benefits and challenges of self-service BI


The concept of ‘self-service business intelligence (BI)’ started gaining momentum in the early 2000s. More than two decades later, a survey by Yellowfin has found that the majority of respondents (61%) say that less than 20% of their business users have access to self-service BI tools. Perhaps more concerning, 58% of those surveyed said less than 20% of people who do have access to self-service BI use the tool. In this blog, the first of a two-part series on the topic of self-service BI, I take a closer look into this interesting and challenging area in the wider data field and share my views.


Data and Analytics in Healthcare – conference review


I recently had the privilege to attend the Data and Analytics in Healthcare conference hosted by Corinium Intelligence in Australia. Following this, I wanted to use my blog piece this month to discuss some of the key lessons and insights shared during the event. I’ll jump straight in.

Data governance and ethics

Of course, it is easy to get carried away by the hype around generative AI, machine learning (ML), large language models (LLMs), and so on. Even so, it was sobering to see several presentations and panel discussions still focusing on data governance and ethics related to these topics. But given that we are talking about healthcare data, this should not come as too much of a surprise.

A useful framework in this area is the Australian Digital Health Capability Framework and Quality in Data. This looks to align with existing industry-specific frameworks, ensuring that all health and care workers are empowered with digital capabilities. Concern was expressed that as much as 60% of AI and ML tools, especially cloud-based ones, share healthcare data with third parties without consent.

Additionally, delegates heard that data governance and adherence to ethics do slow the adoption of insights. There were examples where research results were not implemented for years due to the number of frameworks, data governance, privacy, and other controls that had to be followed. It was mentioned that legislation is often a step behind ethics, resulting in the need to reword policies down the line.

On the positive side, the sharing of ‘de-identified’ health data for research and outcome improvement was unanimously supported. One of the presenters compared this to sharing an organ for transplant. Why would anyone not want their de-identified data to be shared if it can improve the health outcomes of others facing similar circumstances?

Data management

Another area that was covered, which is close to my heart, was that of data sourcing and data management. It was stressed that to obtain advanced insights from data, it must be the right data, of high quality, and available in a processable format. The amount of unstructured data that is hard to mine and interpret in healthcare is staggering. In short, you need a solid data foundation if you want to use AI and ML effectively. This comes down to having data that is scalable, understandable, accessible, and fit-for-purpose.

One of the biggest challenges in healthcare data remains data linkage – joining the dots between related data in different datasets, originating from different systems often managed by different organisations. An interesting observation made was around the bias in healthcare – where we mostly collect data about sick or ill people. In fact, hardly any data is collected about healthy people within this context. This makes it difficult to identify what target populations for treatments, or comparative control groups’ variables, should look like. Making this more difficult is the fact that a lot of data is collected and stored but not used to its full potential.
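To show why data linkage is hard, here is a minimal sketch of deterministic linkage between two made-up datasets that share no common identifier. The records, field names, and the choice of a normalised name plus date of birth as the link key are assumptions for illustration only. Real linkage in healthcare typically needs probabilistic matching and strict privacy controls.

```python
def link_key(record):
    """Build a crude link key: letters of the name, lower-cased, plus DOB."""
    name = "".join(ch for ch in record["name"].lower() if ch.isalpha())
    return (name, record["dob"])


# Two hypothetical datasets from different systems, with no shared ID.
hospital = [
    {"name": "Mary O'Neil", "dob": "1980-03-14", "ward": "Cardiology"},
    {"name": "John Tan", "dob": "1975-11-02", "ward": "Oncology"},
]
clinic = [
    {"name": "mary oneil", "dob": "1980-03-14", "gp": "Dr Lee"},
    {"name": "Jon Tan", "dob": "1975-11-02", "gp": "Dr Patel"},
]

# Index one dataset by link key, then look up each record from the other.
clinic_index = {link_key(r): r for r in clinic}

linked, unlinked = [], []
for rec in hospital:
    match = clinic_index.get(link_key(rec))
    if match is not None:
        linked.append({**rec, "gp": match["gp"]})
    else:
        unlinked.append(rec["name"])

print(linked)
print(unlinked)
```

Normalising punctuation and case is enough to link the first patient, but a one-letter spelling difference (‘John’ versus ‘Jon’) leaves the second unlinked, which is exactly the kind of gap that makes joining the dots across systems so difficult.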

Process and methodology

There were several good sessions centred on the processes and methodology to follow when adopting analytics, especially AI, ML, and LLMs in healthcare. While these were too detailed to cover here, the general sentiment was that one needs to be more careful and thorough about the design, evaluation, and interpretation of results. Especially in rare cases, do we have enough training data of sufficiently high quality for advanced models? Is the technology mature enough? And do we have proper processes for ongoing monitoring and improvement?

In other industries, people may get annoyed or even switch providers when an incorrect marketing campaign is fired off at them, or an inappropriate product is recommended. In healthcare, these ‘mistakes’ can have more serious implications, even life-threatening ones.

Building capacity and capability

Other sessions had interesting discussions on developments around capability and capacity building. My impression was that healthcare organisations in other countries are also scrambling for resources and funding. Key approaches to help overcome this include partnering, collaboration, and innovation across organisations and teams. The adage of starting small and building on the ROI shown came through as well.

Additionally, culture is key. It was mentioned that literacy and education take as much as 70% of the effort of adopting new technology and insights. The mind shift that must happen at the decision-making levels was also covered. Insights and data have to have a seat at the table.

An interesting study showed that both text analytics and AI only did an okay job of coding and classifying electronic medical records, with not a huge difference between the two. So, while it takes coding and classification experts hours to apply coding and classification to cases and diagnoses, there is a massive risk in replacing that expertise, insight, and interpretation with an automated process. You simply cannot automate the acquisition of health knowledge and interpretation.

The general message that came across was that AI and ML were efficient in reducing the administrative burden of clinicians and allied health staff. But despite some amazing (isolated) research outcomes, it was too risky and unethical to have technology make or influence diagnosis and treatment. However, healthcare is overloaded with administrative processes and many redundant data capture processes that can be automated to free up the clinicians and allied health staff to focus on what they are trained for and do best.

Conclusion

In closing, I didn’t review specific AI, ML or LLM case studies. They were very interesting, relevant, and well presented, with lessons learnt, but it’s just too much detail to cover in this post.

It was another great and relevant event put on by the Corinium team. I walked away with many notes and some key aspects to incorporate in my strategic and operational plans going forward. I learnt about a few new concepts and made a few new connections too. I hope you find the brief insights shared above of value.

Of course, a nice venue and having proper barista-made coffee and wholesome food, together with networking drinks afterwards, rounded it off to make it an enjoyable experience. All in all it was a great and insightful day!

The importance of data lineage


Do you know where your food comes from? Did the farmer use pesticides? Did the transport company spray preservative chemicals over your food? Did they keep it appropriately refrigerated? Would you eat food from sources you don’t trust? The same applies to data. Do you know what the lifecycle of your data entails? Was it manually entered? What validations were applied? Through how many transactional systems did it go, and was it transformed along the way? Would you make decisions based on data you don’t trust? This is where data lineage comes in.


Data quality a priority for 2024


Despite the hype surrounding generative Artificial Intelligence (GenAI), my industry reading suggests that many analysts are predicting that data quality (one of my favourite topics) will remain a key priority for this year – especially when it comes to data management and governance.


Crystal-ball gazing for 2024


We have approached that time of year when it’s always interesting to delve deeper into what the analysts see in their crystal balls when it comes to the trends and technologies to keep an eye on for the new year. I have reviewed several industry articles as resources around this, and let’s just say that if all these predictions come true, we will be in for quite a ride in 2024!

Below are just some of the ones related to BI and analytics that I found quite interesting and wanted to share with you.

Generative AI:

This is a term that everyone is very familiar with by now. In a recent Forbes piece, Bernard Marr writes that Generative AI is going to make a huge impact by taking care of most of our menial work. This includes ‘obtaining information, scheduling, managing compliance, organising ideas, structuring projects.’ Of course, he acknowledges that challenges remain around ethics and regulation that must still be solved.

In a Gartner review, Ava McCartney reckons that ‘by 2026, generative AI will significantly alter 70% of the design and development effort for new web applications and mobile apps.’ While certainly plausible, I’d like to see what the figure is for BI and analytics. In the data sourcing and data engineering space, we are still doing a lot of manual labour that could be automated.

Imagine you can just say: “Get me the data from the CRM and the billing systems and integrate them on Customer ID!” and voila, there you have got data from 60 tables integrated and ready for analysis and to develop models on. “Now tell me which customers are about to churn and recommend a campaign that will entice them to stay.” Ah, we can dream. Gartner places Generative AI, together with Platform Engineering, AI-Augmented Development, Industry Cloud Platforms, Intelligent Applications, and Sustainable Technology under a banner called ‘Rise of the builders’. McCartney believes these technologies will boost the creativity of the communities involved in this type of work.

Developer experience (DevX)

In a sister Gartner paper, Lori Perry writes that ‘the suite of technologies under this theme focuses on attracting and retaining top engineering talent by supporting interactions between developers and the tools, platforms, processes, and people they work with.’ I am all for technologies that will make our data engineers’ work more pleasant. But while powerful, I wouldn’t call the user experience of data pipeline technologies enjoyable and highly productive yet. Perry cites the Value Stream Management Platform (VSMP) as an example of DevX technology that seeks to optimise end-to-end product delivery and improve business outcomes.

She also explores technologies like AI-augmented software engineering that can help software engineers create, deliver, and maintain applications. Furthermore, API-centric SaaS services could potentially be used as the primary method to access these technologies. There is also GitOps, which is a closed-loop control system for cloud-native applications, and other internal developer portals that enable self-service discovery that will increasingly come into the spotlight.

I’m looking forward to seeing these technologies in action to increase productivity and reduce human error in data management.

Responsible AI

In a review of the Gartner Data & Analytics Summit held in Sydney at the end of July, responsible AI emerged as a trend to watch. I like this positive spin on AI and Machine Learning as it covers many aspects of making positive business decisions and ethical choices when adopting AI. These include adding to business and societal value, reducing risk, and increasing trust, transparency, and accountability. Unfortunately, there are way too many case studies where AI and ML models have come up with ethically unsavoury or unusable insights.

Gartner predicts the concentration of pre-trained AI models among 1% of AI vendors by 2025 will make responsible AI a societal concern. The firm further recommends that ‘organisations adopt a risk-proportional approach to deliver AI value and take caution when applying solutions and models. Seek assurances from vendors to ensure they are managing their risk and compliance obligations, protecting organisations from potential financial loss, legal action and reputational damage.’

Data-centric AI

Another interesting topic in the same review is data-centric AI. This is more data-focussed than AI which is mostly based on models, algorithms, and code. Gartner refers to data managed specifically for AI solutions. These include data synthesis and data labelling which are employed to solve data-related challenges, such as accessibility, volume, privacy, security, complexity, and scope. In my mind, this is not necessarily new technology, but rather a realisation by AI practitioners that there are aspects related to data governance that are equally important in the AI and ML fields.

What will be interesting is to see how the technology and practices are being adapted to function efficiently and effectively in more fast-moving and fast-changing environments. These environments are reliant on working with large volumes of data, and even data that was not sourced from within the organisation. There are some useful data governance and cataloguing platforms out there, but the challenge has always been to make them work productively at scale. I think it will be crucial for data governance systems to apply AI and ML themselves to function more effectively.

Of course, many technologies and trends also focus on privacy and security. While important, they have not been the focus of this post as I wanted to explore data-specific trends.

The uniqueness of modern data quality management


Last month I addressed how data quality is perceived by different specialists inside the organisation. This month, I turn the spotlight onto what makes modern data quality management different from traditional approaches. Edwin Walker’s Data Science Central article ‘Difference between modern and traditional data quality’ provides an excellent starting point.


Data quality is in the eye of the beholder


The quality of the data we work with has a significant impact on the quality of the insights we can extrapolate for the business. Following on from my recent mini-series on the evolving data-related roles, it was interesting to come across Edwin Walker’s article on Data Science Central titled: ‘How do different personas in an organisation see data quality?’ Walker has also written on the topic of ‘modern data quality’ in another article available on the same site.

