So you want to do data science?


So your organisation wants to do data science

Before your organisation blindly jumps in and starts “doing” data science, there are a number of issues you should first figure out and get clarity and agreement on.

I have seen some weird stuff in the industry media, but this, posted in a recent article, must be the best to date: “’Data scientist’ has become one of the hottest new job titles. Indeed, it’s even been called ‘the sexiest job of the 21st century.’ That’s why more companies are planning to hire one or more.” Now if that were ever the reason my organisation wanted to appoint anyone, I would get my CV up to date, and start running… Fast!

Fortunately the rest of the article was a lot more sensible. So, in this post I have taken the article’s “data scientist” questions as a departure point and adapted them into a discussion of whether and how an organisation should embrace data science.


There needs to be a compelling business driver to make the organisation decide to embrace data science. It is definitely not enough to believe that a data scientist can help the organisation make sense out of big data (whatever that may be), or to say that you need to develop and implement advanced analytical models. No, there needs to be a business case, with a potential positive ROI, where the outcomes delivered by said data science directly contribute to the bottom line – either in terms of increased revenue, reduced costs or streamlined operations. You must be able to articulate this rationale as a clear business case.

In the hospital world, for example, reducing unplanned readmissions is a good application of data science. You need to do some data preparation to transform administrative data into a form that represents patient spells and episodes accurately and richly enough. You need to understand the business meaning and implication of many of the variables involved. And when you have finally designed and evaluated an analytical model, you need to strategise how you can operationalise the outcomes to improve business as well as patient outcomes. If you get all that right, it has a very compelling business case – reducing patients’ returns to hospital, improving their health outcomes and in the process saving the hospital (and the health funds) some money too. In fact, a business case doesn’t get much better than that.
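To make the data preparation step a little more concrete, here is a minimal sketch in Python of how administrative admission records might be turned into a readmission flag. Everything in it is an assumption made for illustration: the record layout, the sample patients, the function names and the 30-day window are all hypothetical. It also does not distinguish planned from unplanned returns, which in practice needs additional fields and business knowledge.

```python
from datetime import date

# Hypothetical administrative records: (patient_id, admitted, discharged).
ADMISSIONS = [
    ("P1", date(2023, 1, 3), date(2023, 1, 8)),
    ("P1", date(2023, 1, 20), date(2023, 1, 25)),
    ("P2", date(2023, 2, 1), date(2023, 2, 4)),
    ("P1", date(2023, 6, 1), date(2023, 6, 2)),
]

READMISSION_WINDOW_DAYS = 30  # assumed threshold, chosen only for illustration


def flag_readmissions(admissions, window_days=READMISSION_WINDOW_DAYS):
    """Return (patient_id, admitted, discharged, is_readmission) tuples.

    A stay counts as a readmission when the same patient was discharged
    from an earlier stay no more than `window_days` before this admission.
    """
    flagged = []
    last_discharge = {}  # patient_id -> discharge date of the previous stay
    # Sort by patient and admission date so each stay follows its predecessor.
    for patient_id, admitted, discharged in sorted(admissions):
        previous = last_discharge.get(patient_id)
        is_readmission = (
            previous is not None and (admitted - previous).days <= window_days
        )
        flagged.append((patient_id, admitted, discharged, is_readmission))
        last_discharge[patient_id] = discharged
    return flagged


def readmission_rate(flagged):
    """Share of stays that are flagged as readmissions."""
    return sum(1 for row in flagged if row[3]) / len(flagged)


if __name__ == "__main__":
    stays = flag_readmissions(ADMISSIONS)
    for stay in stays:
        print(stay)
    print("Readmission rate:", readmission_rate(stays))
```

A flag like this is only the starting point. The modelling work is in predicting which stays are likely to end up flagged, and the business value is in acting on that prediction before the patient returns.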

However, the business case will vary tremendously from one organisation to the next. In insurance it is typically about increasing customer lifetime value, while in telecommunications and retail it is about revenue generation and increasing share of wallet. In many organisations there are opportunities for operational improvement and increases in efficiency, especially around logistics, where sensor and machine data can be analysed. In organisations focussed on marketing, data science can be used to improve promotion targeting, for example through better segmentation and more targeted campaigns. Any area where data and analytics can be used to increase revenue, reduce costs or improve performance can potentially be included in the business case for “doing” data science.

Function definition

The organisation needs to clearly articulate what the data science function is responsible for, what business areas it covers, what roles it consists of, and where it fits into the organisation. In many organisations that alone is a lot to get formulated and articulated, never mind get agreement on.

The business areas that the data science function intends to address must obviously align closely to the business case. It doesn’t help if we have a compelling business case for one area of the business, but we spread the data science team so thin across the other business areas that it doesn’t achieve anything significant in any of those areas. This scope will most likely change and expand over time, but you need to start with a solid peg in the ground.

The organisation needs to get clarity on what roles the data science function covers. The conventional definition is that a data scientist should have a background in advanced analytics (e.g., statistics, mathematics, predictive analytics and other general computer science concepts) with sufficient data/information management skills (e.g., data warehousing, data integration, ETL, SQL, IT infrastructure such as cloud-based systems, and even Hadoop and MapReduce) to enable him / her to get the data they need in the format they need it in. Some experience with data mining and visualization tools is also useful. He or she will spend considerable time discovering insights and then communicating those insights by telling a story with the data. Being a skilled communicator can also help navigate the likely cultural changes that will take place. However, even more important is an in-depth understanding of the business. So some experience as a BI business analyst is also very useful, or some time spent working in various business functions. But it doesn’t end there. The “bigger picture” understanding of business strategy is just as important to a data scientist as specific skills or the areas of expertise listed above. A data scientist should be able to shape a strategic vision for harnessing the power of data to affect or transform the business’ performance.

However, all that rolled into one person is quite a tall order. Although many academic and other training institutions are training up and releasing data scientists into the market, my conviction is that it will still be quite a while before this new breed of data scientist will be able to add significant business value. In fact, very few individuals today possess deep expertise in all or most of the required skills. One clear reason for that is that business and strategy experience doesn’t come quickly – it usually takes quite a few years working in the organisation to pick up that level of business experience. That is why many organisations are investing in a data science team instead. By building a strong team at the outset of a data science program, the organisation can lay the foundation to evolve its capabilities and eventually develop a business-focussed cross-functional analytic team. In other words, even in the long run the most effective data science capability may in fact still be a team. Of course, the entire capability can be outsourced too.

The organisation also has to figure out who will take ownership of and be responsible for the data science function. As it will often operate across organisational boundaries, its exact position and reporting lines are not all that important, but you need to make sure it’s not placed in an environment that stifles its creativity and constricts its working across functional domains. I have no doubt that a skilled data science team will be able to improve performance at every level of the organisation and add value across all functional streams.

Organisational readiness

Organisational readiness relates to quite a few aspects, but the two that stand out for me are informational maturity and budget.

Some of the indicators which illustrate whether an organisation’s information maturity is at a sufficiently high level are:

  • Data is viewed as an organisational asset that is managed with the same discipline and rigour as finances, human resources and products.
  • Up to the executive level, key decision-makers require insights beyond reporting on what has already happened; they want to better understand why it happened and what is likely to happen next.
  • The executive level takes ownership of the organisation’s data resource and its analytics capability. Either the CIO focusses on information, insights and innovation; or if it’s a CTO-type CIO, then a CDO that takes care of data and information is appointed at the board level.
  • Some of the business drivers that have been identified and prioritised focus on information- and insight-driven outcomes, for example as mentioned for the business cases above.
  • The organisation’s BI is “in order” and it timeously delivers good quality information to decision-makers on all levels across most business functions.

As the business evolves more towards evidence-based decision making, data science may just be the catalyst the organisation needs to take it to the next level.

Turning to budget, data science doesn’t come cheap. Apart from the infrastructure and toolsets required – more on that below – as mentioned above, you need to budget for the equivalent of one senior and two junior FTEs to run such a program. Of course, the business case (mentioned above) must enable you to justify running such a team.


Two aspects of the data science team’s “working environment” are important, namely infrastructure and data.

It depends a bit on the state and capability of the systems, tools and infrastructure already used for BI – because sometimes that environment can be used for data science as well. However, in most organisations I’ve come across, the data science team uses a separate “analytical sandbox” environment, where they can explore, process and build analytical models to their hearts’ content, without having any impact on any other users and processes. The hard-core data scientists also require additional toolsets, such as SAS Enterprise Miner, SPSS, R, Python, some statistical packages, etc., that are hardly required by other users, bar maybe the actuaries.

The data science team typically also require a wider, deeper and richer dataset than what is typically available in the enterprise data warehouse. A good place to start the analytical sandbox environment is by creating a copy of the enterprise data warehouse, but it typically has to be enriched and extended with additional datasets. Much of the value comes from comparing, cross-referencing, and exploring diverse data sets. For analytical modelling many variables also have to be transformed into categorical or interval sets.
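As a small illustration of that last point, the sketch below shows one way a continuous variable could be turned into interval bands and then into indicator values for modelling. It uses only the Python standard library. The age bands, their labels and the function names are hypothetical; in practice the cut-offs would come from the business meaning of the variable.

```python
from bisect import bisect_right

# Hypothetical band edges and labels, chosen only for illustration.
AGE_EDGES = [18, 40, 65]  # boundaries between bands
AGE_LABELS = ["under 18", "18-39", "40-64", "65 and over"]


def to_band(value, edges=AGE_EDGES, labels=AGE_LABELS):
    """Map a numeric value to the label of the interval it falls in.

    Intervals are closed on the left: a value equal to an edge belongs
    to the band that starts at that edge.
    """
    return labels[bisect_right(edges, value)]


def one_hot(label, labels=AGE_LABELS):
    """Encode a categorical label as a list of 0/1 indicator values."""
    return [1 if label == candidate else 0 for candidate in labels]


if __name__ == "__main__":
    for age in (17, 18, 52, 65):
        band = to_band(age)
        print(age, band, one_hot(band))
```

Keeping the edges and labels as data rather than hard-coding them in the function makes it easier to revisit the bands once the team understands the variable better.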

Because data scientists need to explore and experiment, an environment that affords time for creative discovery is critical to long-term success.

Concluding remarks

Thinking about, debating and reaching a conclusion on these aspects will help you get clarity about what data science can do for your organisation, and will get you thinking about how too. It will also give you a good indication of what to budget for, in terms of resources, licences, running costs and infrastructure.

If you cannot find the right multi-skilled resource, and you can afford it, a data science team servicing the organisation is a sound way to get data science off the ground and running sustainably.
