Data Journalism


At the ITWeb BI Summit recently held in South Africa, where I presented a paper on Business Analytics, an audience member asked whether Data Science was related to Data Journalism in any way. This blog post explores Data Journalism and its relationship to Data Science and Data Visualisation.

Wikipedia describes Data-Driven Journalism as a journalistic process based on analysing and filtering large data sets for the purpose of creating a new story. Coincidentally, I covered storytelling in a previous blog post on Data Science, where I argued that storytelling, coupled with visualisation, is the best way to make the outputs of advanced analytics and other data science initiatives understandable, consumable and actionable for the decision-makers in the business (www.martinsights.com/?p=206). Adding journalistic writing to the mix makes it more interesting. We all know what a journalist can do with a few statistics when he or she needs to publish a sensational piece! But we need to distinguish Data Journalism from mainstream sensationalist journalism. Accurate data journalism should produce stories that are grounded in the underlying data and that are verifiable, trustworthy and relevant. The Wikipedia entry adds that data journalism deals with open data that is freely available online and analysed with open source tools. However, I don’t think the qualification about open source tools necessarily applies.


Information architect and multimedia journalist Mirko Lorenz describes data-driven journalism as a workflow consisting of the following elements: digging deep into data by scraping, cleansing and structuring it; filtering it by mining for specific information; visualising it; and making a story. That sounds a lot like Data Science to me. Other authors, such as Paul Bradshaw, have described the process in more detail. Although the sequence may differ, most authors have variations of the following steps in their workflows:

  • Finding: This includes getting access to data supplied to the organisation, data found through a search process, publicly published data, data obtained by scraping websites or applications, and data collected through observation, surveys or crowdsourcing. Finding data is one of the two key steps that distinguish data journalism from other data science activities. Privacy and data ownership obviously need to be considered too.
  • Cleansing: For the stories to be trustworthy, you have to be able to trust the quality of the underlying data. That means it has to be cleansed to remove errors, reduce duplication, replace missing information, correct formatting, standardise names and terms, and translate codes, jargon and multilingual data.
  • Transforming: Convert the data to a useful format, sometimes to a form consistent with other available data.
  • Contextualising: Especially with public data, the context needs to be established to gauge the data’s truthfulness and relevance. You must determine who gathered it, for what purpose, when and how, and what its contents mean. How the data relates to the population under study also needs to be determined.
  • Combining: As with conventional data, public data adds more value when it is integrated with existing meaningful data. This also adds more and better context. In some cases, multiple sources also help to validate correctness and relevance. Combining different datasets around common pivotal entities (such as customers or products) may also reveal previously undiscovered relationships.
  • Visualising: Displaying the data visually through a variety of graph types in order to analyse and explore it, and to discover the trends and other stories hidden within.
  • Compiling: This defines the act of data journalism – where a question is answered through the data analysis, or where a dataset is analysed to question what it can reveal.
  • Communicating: This is the other key step in data journalism: getting the story across in the appropriate format. This is best done by combining data visualisation with journalistic writing, where the latter adds the interpretation to the former. Other formats of communication range from infographics, maps and charts through to presentations and video podcasts.
  • Distributing/Publishing: Providing access to the story through channels such as email, social media and the web, across a wide variety of platforms including tablets and mobile phones. Data journalism publishing sites are becoming marketplaces not only for the stories and insights but, through them, for the underlying data as well.
  • Tracking: Measuring the reach and penetration of the story into its target audience.
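The Cleansing and Combining steps can be illustrated with a short Python sketch. It is a minimal example under stated assumptions rather than a description of any particular newsroom's tooling: the sample records, the `NAME_CODES` table and the helpers `standardise_name`, `parse_count`, `cleanse` and `combine` are all hypothetical. The sketch standardises names and translates codes, removes duplicates that only become visible after standardisation, flags missing values instead of guessing them, and then joins the result to a second dataset on a shared entity.

```python
from typing import Optional

# Hypothetical code table: translates abbreviations/jargon into standard names.
NAME_CODES = {"N": "North", "S": "South"}


def standardise_name(raw: str) -> str:
    """Trim whitespace, expand known codes and normalise capitalisation."""
    name = raw.strip()
    return NAME_CODES.get(name.upper(), name.title())


def parse_count(raw: str) -> Optional[int]:
    """Parse numbers written with spaces or commas as separators; None if missing."""
    digits = raw.replace(" ", "").replace(",", "")
    return int(digits) if digits.isdecimal() else None


def cleanse(records):
    """Standardise names, parse values, drop duplicates and flag missing values."""
    seen = set()
    cleaned, missing = [], []
    for rec in records:
        name = standardise_name(rec["region"])
        value = parse_count(rec["count"])
        if value is None:
            missing.append(name)  # flag for follow-up rather than invent a value
            continue
        key = (name, value)
        if key in seen:
            continue  # a duplicate that only shows up once names are standardised
        seen.add(key)
        cleaned.append({"region": name, "count": value})
    return cleaned, missing


def combine(left, right, key="region"):
    """Inner-join two lists of dicts on a shared key (a common pivotal entity)."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


# Hypothetical, purely illustrative input data.
survey = [
    {"region": " north ", "count": "1 200"},
    {"region": "North", "count": "1,200"},
    {"region": "S", "count": "950"},
    {"region": "East", "count": ""},
]
budgets = [{"region": "North", "budget": 500}, {"region": "South", "budget": 300}]

if __name__ == "__main__":
    cleaned, missing = cleanse(survey)
    print(cleaned)   # [{'region': 'North', 'count': 1200}, {'region': 'South', 'count': 950}]
    print(missing)   # ['East']
    print(combine(cleaned, budgets))
```

Flagging missing values, rather than silently filling them in, keeps the gaps visible for the Contextualising step; a real pipeline might instead replace them from another source, as the Cleansing step allows.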


So what we are seeing is that data journalism is giving rise to a new business model, which we could call Insight-as-a-Service to align it with all the other -aaS-isms out there. Organisations are taking public data, analysing it, and publishing insights in meaningful ways. How meaningful those insights are will determine the value of the service.

Data journalism is a natural growth area for organisations in the media business, which have to access and analyse public data for their journalistic work anyway. News now spreads so fast via informal channels like Twitter that organisations in the information distribution business have to look for new sources of revenue. So the focus shifts entirely from “drawing attention” to “creating trust”. Insight-as-a-Service will only be paid for if the underlying data, the analytics, the derived insights and their publication are relevant, current and, most importantly, trustworthy. While it has become easier to publish stories faster and more widely, it takes deeper skills and more effort to analyse public data and derive useful, trustworthy narratives from it.
