Unpacking the top data trends for 2023

As the year rushes to an end, it is interesting to see what analysts expect to be the top trends for next year. With that in mind, I recently took a deeper look at two independent trend-focused articles I came across, ‘The Top 5 Data Science And Analytics Trends In 2023’ by Bernard Marr, published on Forbes, and ‘Top 10 Future Data Analytics Trends in 2023’ by Sonia Mathias, published by Data Science Central on TechTarget’s site, to see which trends both authors expect to rise in 2023.

Cloud and Data-as-a-Service (DaaS)

Marr grouped these two topics together, while Mathias kept them apart. Then again, her list also had five more topics than his.

Regardless, it is becoming increasingly apparent to small and medium-sized businesses that having their data in the cloud is the way to go. And I couldn’t agree more. It makes so much sense when you do not have to manage on-premises systems or keep system and DBA experts on the payroll to manage them. Cloud spend is also an operational expense rather than a capital expense, which can be more tax efficient.

Add to that the rich set of functionality you can now get from cloud platforms, such as self-service data analytics, subscription-based access to third-party data, advanced analytics services, and so forth. To quote Marr, this “allows businesses to work with data without needing to set up and maintain expensive and specialised data science operations.”

Advanced forms of analytics

I have grouped a wide but related set of technologies together here. These featured strongly in both papers and include artificial intelligence (AI), machine learning (ML), natural language processing (NLP), and data analytics automation.

As is currently the trend, the use of these technologies will steadily increase as organisations realise the kind of insights they can achieve, thereby gaining good business value. I have purposely used conservative phrasing here as I do not think we will see massive ‘hype-driven jumps’ in the adoption curves of these technologies, but rather steady growth.

The one area that excites me is the ‘productionisation’ of analytics, including analytics automation. Here we will see an increase in the adoption of analytics to operationally manage business processes. This is a great move towards making analytics more useful in day-to-day business operations, and a good step away from the ‘slightly academic’, ad hoc, and independent advanced analytics projects that were often done in isolation, where nobody knew how to integrate the outcomes into the business.

Another area that I get excited about is Natural Language Processing (NLP). It is almost 2023 and we are still hammering away at our keyboards. There must be a better, faster, and more natural way. Granted, it does become a bit eerie and intrusive when you talk about something and the next minute, without even having searched for it, the adverts start popping up on your phone!

Data governance

I must say, this is one that surprised me. Do not get me wrong, I am a big advocate of data governance. It needs to be stronger and more entrenched in business and data-related processes, especially if businesses are going to make more informed and more far-reaching decisions based on insights derived from their data. I just did not expect it in the top trends for 2023.

I anticipate that, with the ongoing increase in data privacy breaches, there will be a stronger emphasis on data privacy, with most countries adopting strict laws and regulations. This is hardly surprising given that some parties are trying to make a large amount of money by selling data that was never put up for sale to begin with. It will be interesting to see how these trends track through the new year. One thing is sure: we will not be bored!

Soft skills key differentiator of good data analytics professionals

The technical skills required to be a data analytics professional form the cornerstone of training approaches. If these individuals are not proficient in things like data wrangling, analytical approaches, modelling techniques, data exploration, or data visualisation, then they will likely not qualify. But that is just one part of what the job requires. I believe the difference between a good and an average data analytics professional can be found in the soft skills they possess.

Understanding the essential KPIs for data and analytics

To be successful, any organisation must be able to measure what it manages. This is especially true when it comes to data, easily one of the most significant assets any digitally driven business has access to. I recently came across a Gartner report that examines the five data and analytics (D&A) KPIs that every executive should track. I found the information very insightful and wanted to take the opportunity to add some of my own views.

Can an organisation be data-centric without a CDO?

Regular readers of my blog will no doubt know that due to my immersion in the data-information-insights spectrum, I’ve been very interested in data centricity and the role of the Chief Data Officer (CDO) for some time. On reading a recent industry piece about the role of the CDO, a question that came to my mind is this: can an organisation be truly data-centric without having an official CDO role in place?

The importance of analytics in the data governance process

When it comes to data, one of the areas that really interests me is the connection between data analytics and data governance. There is obviously a lot of material available on how data governance can help organisations achieve better data analytics outcomes. However, a paragraph focusing on analytics-enabled data governance in the Precisely.com article ‘The Intersection of Data Analytics and Data Governance’ piqued my curiosity.

Finding balance between data science and data engineering

Previously, I wrote about the two-tiered data lakehouse with an analytical sandbox and a curated data warehouse in the second layer – the one for productised BI and the other for data science work. I was therefore intrigued when coming across a Forbes article titled ‘Three Keys to a Harmonious Relationship between Data Science and Data Engineering’ that shows exactly the kind of balance we are trying to get right in my current engagement.

Finding value in the data lakehouse

As you may have gathered from my previous post, I have become very interested in cloud-based data lakehouses. It was therefore with a keen interest that I read the article ‘Five Effective Ways to Build a Robust Data Lake Architecture’ on Enterprise Talk.

While initially I was hoping that the author was going to review and compare five different variations of data lakehouse architecture, the piece focuses on five steps needed to set up a data lake architecture, and the insights shared are very useful. The five steps identified are:

  • Determine the business data goals
  • Select the right data repository to gather and store information
  • Develop a data governance strategy
  • Integrate AI and Automation
  • Integrate DataOps

Of course, these steps are all crucial to success. I cannot emphasise enough the importance of aligning with the business strategy and goals, along with a well-implemented and enforced data governance strategy. However, my need for more information saw me clicking through to another article on Enterprise Talk, titled ‘How Enterprises Can Leverage Data Lakehouse Architecture to Get the Most Value from Their Data’.

Defining a data lakehouse

Previously, I have been involved on the periphery of a data vault-based data lake. This works well for a smaller number of disparate data sources. But in my current engagement, we must integrate and standardise data across many different data sources, within minimal time frames using limited resources. In effect, we cannot cater for the extensive data modelling required for a data vault.

This would also explain my interest in data lakehouses. The article defines a data lakehouse as ‘a dual-layered architecture, with a warehouse layer placed over a data lake, enforcing schema, which ensures data integrity and control while also allowing for faster BI and reporting. Data lakehouse architecture also eliminates the need for multiple data copies and drastically decreases data drift issues.’

Speed above all

In my attempt to simplify things, I see the data lake part as a continuously updated, timestamped source-true staging area. This may be schema-defined for structured data and schema-less for unstructured data. What I like about such an approach is that very little processing is done to the data on the way in. It is quick and efficient, and source-true for auditability and traceability.
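
To make the idea concrete, here is a minimal Python sketch of such a staging area. It is an illustration under my own assumptions rather than how any particular platform works: the names LandedRecord, StagingArea, land, and as_of are hypothetical, and an in-memory list stands in for real lake storage. The point is only that records are appended untouched, with a load timestamp, and never updated.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, List, Optional


@dataclass(frozen=True)
class LandedRecord:
    source: str          # hypothetical source system name
    loaded_at: datetime  # load timestamp added on the way in
    payload: Any         # the source record, stored exactly as received


class StagingArea:
    """Append-only landing zone: records are added, never changed or removed."""

    def __init__(self) -> None:
        self._records: List[LandedRecord] = []

    def land(self, source: str, payload: Any,
             loaded_at: Optional[datetime] = None) -> LandedRecord:
        # No cleansing or transformation here; only the timestamp is added.
        stamp = loaded_at or datetime.now(timezone.utc)
        record = LandedRecord(source, stamp, payload)
        self._records.append(record)
        return record

    def as_of(self, source: str, moment: datetime) -> List[LandedRecord]:
        # Everything landed from one source up to and including a point in time.
        return [r for r in self._records
                if r.source == source and r.loaded_at <= moment]
```

Because a changed source record lands as a new entry instead of overwriting the old one, a call such as as_of("pos", some_date) can show what the lake held on that date, which is where the auditability and traceability come from.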

In fact, in some implementations the data is not even physically copied to the data lake. Instead, through virtualisation, it can be viewed as if it were in the data lake. This makes perfect sense for large datasets that already contain attributes indicating date and time, and which are not accessed or updated much on the source system. Examples include system logs, activity logs, audit trails, mobile call records, point-of-sale transactions, and so on.

It’s not inside…

For me, the ideal solution would have two constructs sitting ‘on top’ of the data lake as opposed to only the data warehouse as defined above.

The first construct would be a minimalistic structured data warehouse. However, I would add the criterion that it only contains the measures, dimensions, and timespan of data used for regular BI, dashboarding, and reporting. In other words, very highly curated, controlled, and trusted data is used to produce the information needed in the daily, weekly, and monthly running of the business. While I am at it, we might as well throw regulatory reporting in here too.

The second construct would then be an analytical sandbox that is used for analytical models and ad hoc queries that need data integrated or refined from the data lake. Models and reports developed in the sandbox can always be productionised into the data warehouse component if they become a long-term fixture on the information/intelligence landscape.
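
As a rough illustration of how the two constructs differ, the Python sketch below builds both from the same lake rows. Everything in it is an assumption made for the example: the row fields, the LAKE_ROWS sample data, and the curated_view and sandbox_view functions are hypothetical, and plain lists of dictionaries stand in for real tables.

```python
from datetime import date

# Hypothetical lake rows; the field names are illustrative only.
LAKE_ROWS = [
    {"sale_date": date(2021, 6, 1), "region": "North", "revenue": 100.0, "raw_log": "a"},
    {"sale_date": date(2023, 3, 5), "region": "North", "revenue": 250.0, "raw_log": "b"},
    {"sale_date": date(2023, 3, 6), "region": "South", "revenue": 80.0, "raw_log": "c"},
]


def curated_view(rows, columns, date_field, since):
    """Warehouse construct: only the agreed columns, only the reporting timespan."""
    return [{c: r[c] for c in columns} for r in rows if r[date_field] >= since]


def sandbox_view(rows):
    """Sandbox construct: all lake rows, copied so experiments cannot alter the lake."""
    return [dict(r) for r in rows]


warehouse = curated_view(LAKE_ROWS, ["sale_date", "region", "revenue"],
                         "sale_date", date(2023, 1, 1))
sandbox = sandbox_view(LAKE_ROWS)
```

In this toy version, productionising a sandbox model would amount to adding its columns to the curated list, so that the warehouse stays small while the sandbox stays free-form.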

As the author of the second article writes, ‘a data lakehouse initiative, when done correctly, can free up data and let an organisation use it the way it wants and at the speed it wants’. And that is ultimately what we are all looking for.

Rethinking the opportunities bubbling below the surface of data lakes

Long-time readers of my blog can likely recall my scepticism around data lakes when they first emerged. However, a lot of water has flowed into the space since then, motivating me to start investigating the area in depth.

My early criticisms against Hadoop-based data lakes were that they were too batch-oriented to cater for business analytics and business intelligence requirements. The technology did not cater for cataloguing its contents. Instead, it required users to manage the catalogue or dictionary to the side if they were to keep tabs on what was being pumped into the data lake. This left a major burden of data management on data engineers and data custodians. They were tasked to manage the organisation’s data resource properly and to make efficient access and utilisation of the data possible.

Invariably, this required a lot of work and discipline to avoid the data lake becoming an unmanageable data swamp. Fortunately, much development has happened to overcome this challenge, as is evident in an article published in Virtualization and Cloud Review.

Moving beyond past obstacles

The convergence of data lakes and data warehouses, as deployed on cloud technologies, addresses my early concerns around still having to develop a full-scale data warehouse downstream from the data lake to make sense of the data dumped into the lake.

Modern cloud architectures are more tightly integrated and the architectural distinction between the data lake and the data warehouse is fading away. As a recent TDWI article puts it: “The new generation of DWs are, in fact, DLs that are designed, first and foremost, to govern the cleansed, consolidated, and sanctioned data used to build and train machine learning models.”

The data structures used in these technologies have also become more efficient and well-managed. In some cases, the data from the data lake does not even have to be physically moved to the data warehouse. Techniques like virtualisation and data mapping mean logical structures in the data warehouse can directly point to the appropriate physical data sets contained in the lake. This eliminates a lot of data movement and redundant data copying. In turn, this results in significant savings in storage space and processing while increasing access to data and readiness.
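
The mapping idea can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not the mechanism of any specific product: LogicalTable, its reader callable, and the column names are all hypothetical. The logical table holds only a pointer to the physical data and a column mapping, and rows are translated when they are read.

```python
from typing import Callable, Dict, Iterable, Iterator


class LogicalTable:
    """A warehouse-side view that points at lake data instead of copying it."""

    def __init__(self, reader: Callable[[], Iterable[dict]],
                 column_map: Dict[str, str]) -> None:
        self._reader = reader          # how to reach the physical data set
        self._column_map = column_map  # logical column name -> physical column name

    def rows(self) -> Iterator[dict]:
        # Rows are mapped lazily at read time; nothing is stored in this object.
        for physical in self._reader():
            yield {logical: physical[source]
                   for logical, source in self._column_map.items()}


# Hypothetical physical data set sitting in the lake.
call_records = [{"msisdn": "0820000001", "dur_s": 42}]

calls = LogicalTable(lambda: call_records,
                     {"phone_number": "msisdn", "duration_seconds": "dur_s"})
```

Since the logical table only reads through to the lake, a record added to call_records afterwards shows up on the next call to calls.rows() without any load job, which is the data movement saving described above.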

The move towards integration

These developments are resulting in data architects and solution designers embracing new, integrated approaches. As Tomas Hazel puts it in his article: “It is important to find a solution that allows you to turn up the heat in the data lake with a platform that is cost-effective, elastically scalable, fast, and easily accessible. A winning solution allows business analysts to query all the data in the data lake using the BI tools they know and love, without any data movement, transformation, or governance risk.”

Having started to study some of the technologies now available in this space, I am excited by the potential of these platforms that now allow and enable good data management practices, such as the data governance aspects Hazel pointed out. My initial investigations also revealed to me that with less physical data movement, and more configurable and less coding-based data mapping and data transformation, this environment will be much more scalable, cost-effective, and easily manageable.

Data optimisation

Krishna Subramanian on RTinsights explains approaches for getting file-based data into a managed cloud-based data lake. While her article is more focused on the taming of the data in the data lake, she does touch on several approaches.

These are equally applicable to incorporating and managing structured source system data as well as unstructured data into the integrated cloud-based data lake or data warehouse environment. This includes optimising the data through proper metadata application and tagging and indexing to enable more efficient searching. With this comes appropriate relevance filtering on the flood of external data and being able to use appropriate taxonomies.
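
A small Python sketch shows why tagging and indexing make searching more efficient. It is my own illustration, not something taken from her article: the TagIndex class, its tag and find methods, and the file paths are hypothetical. Tags are stored in an inverted index, so a search looks up each tag directly instead of scanning every file.

```python
from collections import defaultdict


class TagIndex:
    """Inverted index from metadata tag to the lake paths that carry it."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def tag(self, path, *tags):
        # Tags are lower-cased so that searches are case-insensitive.
        for t in tags:
            self._by_tag[t.lower()].add(path)

    def find(self, *tags):
        # Return the paths that carry every requested tag, in sorted order.
        if not tags:
            return []
        matches = [self._by_tag.get(t.lower(), set()) for t in tags]
        return sorted(set.intersection(*matches))


index = TagIndex()
index.tag("lake/finance/invoices_2022.csv", "Finance", "2022")
index.tag("lake/finance/invoices_2023.csv", "Finance", "2023")
index.tag("lake/hr/headcount_2023.csv", "HR", "2023")
```

Here index.find("finance", "2023") narrows the lake down to the single matching file, and a taxonomy would simply be an agreed vocabulary for the tags that are allowed.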

She does not just focus on the technology side of the data lake. Her considerations end by addressing the organisational culture. She refers to 2021 research by New Vantage Partners: “Leading IT organisations continue to identify culture – people, process, organization, change management – as the biggest impediment to becoming data-driven organisations.”

She continues that a data-driven culture needs to span not just the analysts and the lines of business, but IT infrastructure teams too. From my side, I would add that this data-driven culture must include representatives from each line of business – the data stewards and business subject matter experts. I will no doubt explore this in more depth in the future.

The advantages of data fabrics

In my blog post last month, I started looking at the concept of data fabrics to understand what it is all about. This month, I continue the discussion, focussing on the advantages of data fabrics. The points I have outlined below are based on a very good article written by Lori Witzel, Director of Research for Analytics and Data Management at TIBCO, and published on ITProPortal. I have added my own views, based on my experience working in the data space.

To recap, the concept of data fabric was created to address the need for more data-driven insights while coping with the reality of the distributed nature of modern data architectures. Complicating this is that most organisations are dealing with data sources located on-premises and across hybrid and multi-cloud environments. For example, a company might be running both a CRM application and a modern data warehouse platform across two different cloud providers. A correctly implemented data fabric framework enables us to work across all these data environments.

However, it is not only about the technical connectivity and the integration of data flows, data access, and data storage. There is also an element of augmented data management and data governance that is required in such a cross-platform orchestration.

The challenge is that in the modern, disparate, and often siloed organisation, a lot of de-duplication, verification, integration, and other data resolution is required to get a complete and single source of the truth. Add to that the vast amount of ‘old’ data. While this might no longer be required operationally, the company may need this data archived for trend analysis, historic reporting, or even as mandated by legislation.

The referenced article lists the four key advantages of using data fabrics as insights, innovation, information governance, and insured trustworthiness.

Insights

According to the author, data fabrics allow an organisation to treat its data like any other business component, something advocates have been crying out for for years. Data fabrics not only allow the organisation to take advantage of more advanced insights, such as those derived through analytics and machine learning, but also enable the data custodians to automate and accelerate data management. I would love to explore how this will work in a future post, so keep an eye out for that as well.

Innovation

The second point almost happens naturally. An organisation that leverages a data fabric approach can put their entire ‘data estate’ (as the article refers to it) to work. With so much more actionable insight being generated as a result, it is easier to transform the organisation through data-driven insights and attempt and adopt new levels of innovation. Of course, the organisation’s culture and agility must be able to support such an approach.

Information governance

This advantage is music to my ears. As our data environments become more complex, so does data governance. This is coupled with, as the article states, an increasingly complex array of regulatory and compliance needs, not to mention more stringent requirements for privacy and security. The unified view provided through data fabric frameworks can simplify and streamline this complexity. I am eager to explore this in more detail in subsequent posts too: how exactly does a data governance forum utilise the facilities and features of the data fabric framework to improve data governance over a faster-flowing and more tightly integrated architecture?

Insurance

The name of this advantage was slightly strange for me as this point is about the trust in data – which of course is critical. I guess the author wanted to stay with the four ‘I’s theme. So, the data fabric provides improved trustworthiness in the data, and what can then be done with it. This flows directly from a more unified approach to data management. It will be interesting to see how this is physically implemented. As we all know, the downstream data quality can only be built on the basis laid in the source systems. Looking ahead, I am interested in examining each of these four concepts and how they will be technically implemented within an organisation.

Unlock new business insights through data fabrics

Seeing as I’m currently working for a large federated organisation with significantly different and siloed business streams that are managed through a plethora of different systems, ranging from 30-year-old mainframes to modern in-cloud platforms, the topic of data fabrics is very interesting to me. Even more so given that I come from a database, data governance, integration, business intelligence, and insights background.
