Big Data Privacy


In my previous blog post I discussed Big Data Security, which is about the protection of data in storage and in transit. In this post I focus on privacy, which refers to protecting the rights of the individuals or organisations represented by the data. A privacy breach involving sensitive data can cost an organisation dearly, both in reputational damage and in legal consequences.

The threat to privacy is very real, for all of us. Paul Ohm, associate professor at the University of Colorado law school, wrote the following regarding Big Data in Harvard Business Review: “These databases will grow to connect every individual to at least one closely guarded secret. This might be a secret about a medical condition, family history, or personal preference … It is a secret that, if revealed, would cause more than embarrassment or shame; it would lead to serious, concrete, devastating harm.”

Likewise, Jon Leibowitz, the chairman of the US Federal Trade Commission (FTC), recently coined the word “cyberazzi” to describe data companies that trawl the internet for information on consumers. The cyberazzi stake out your web browsers and mobile phones to quietly harvest data on what you like, where you go and what kind of questions you ask.

Your personal digital footprint, an ineradicable record of every electronic interaction, just keeps growing. Your email traffic, internet search history, geotagged images on your smartphone and social media sites, retail purchases, loyalty program transactions, invoice payments, toll road payments and medical records all add to the unique tread that makes up the footprint.


Anonymisation is meant to protect the identity of individuals by decoupling identifying attributes, such as names and phone numbers, from the location or transactional data that is useful for research or marketing. In principle, this lets organisations collect and analyse large volumes of data without exposing the individuals behind it.
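As a minimal sketch of that decoupling, assume a toy list of records with hypothetical field names (name, phone, cell, hour). The direct identifiers are dropped and replaced by a random pseudonym, so one person's records can still be analysed together. This illustrates the idea only; it is not a description of any particular product or standard.

```python
import secrets

# Hypothetical field names; a real dataset will have its own schema.
DIRECT_IDENTIFIERS = {"name", "phone"}

def anonymise(records):
    """Drop direct identifiers and tag each person with a random pseudonym."""
    pseudonyms = {}  # maps (name, phone) -> pseudonym; discarded after use
    result = []
    for record in records:
        key = (record["name"], record["phone"])
        if key not in pseudonyms:
            pseudonyms[key] = secrets.token_hex(8)
        # Keep every field except the direct identifiers.
        cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        cleaned["pseudonym"] = pseudonyms[key]
        result.append(cleaned)
    return result

# Invented sample data for illustration.
records = [
    {"name": "Alice", "phone": "555-0101", "cell": "A1", "hour": 8},
    {"name": "Alice", "phone": "555-0101", "cell": "B4", "hour": 18},
    {"name": "Bob", "phone": "555-0102", "cell": "A1", "hour": 9},
]
anonymised = anonymise(records)
```

The location and time fields survive untouched, which is exactly what keeps the data useful for analysis.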

However, frightening as it may seem, a recent publication in Scientific Reports shows that people’s day-to-day movements are often so predictable that even anonymised location data can be linked back to identified individuals with relative ease when it is correlated with outside information. Our movement patterns are so repetitive and distinctive that as few as four data points, each a location with a date and time, can be enough to identify an individual.
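The linking step can be sketched with a toy example. This is not the method used in the Scientific Reports study, just an illustration under invented data: each anonymised trace is a set of hypothetical (cell, day, hour) observations, and an outsider who knows a few such points about one person checks which traces contain all of them.

```python
def matching_pseudonyms(traces, known_points):
    """Return the pseudonyms whose trace contains every known (place, time) point."""
    known = set(known_points)
    return [p for p, points in traces.items() if known <= points]

# Toy anonymised traces: pseudonym -> set of (cell, day, hour) observations.
# All values are invented for illustration.
traces = {
    "u1": {("A1", "Mon", 8), ("B4", "Mon", 18), ("C2", "Tue", 12), ("A1", "Tue", 8)},
    "u2": {("A1", "Mon", 8), ("B4", "Mon", 18), ("D7", "Tue", 12), ("A1", "Tue", 8)},
    "u3": {("A1", "Mon", 8), ("E5", "Mon", 18), ("C2", "Tue", 12), ("F3", "Tue", 8)},
}

# Outside information about one person, e.g. gathered from geotagged posts.
one_point = [("A1", "Mon", 8)]
four_points = [("A1", "Mon", 8), ("B4", "Mon", 18), ("C2", "Tue", 12), ("A1", "Tue", 8)]

candidates_one = matching_pseudonyms(traces, one_point)    # all three traces match
candidates_four = matching_pseudonyms(traces, four_points)  # only "u1" matches
```

With a single known point every trace is still a candidate; with four, only one remains, and the pseudonym no longer hides whose trace it is.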

The CIA’s CTO has also stated that mobility and privacy do not go hand-in-hand. We are constantly tracked through our mobile and other paired devices, which makes it ridiculously easy to identify us. Big Data capabilities make it even easier for an organisation to take a large mass of supposedly anonymised location data and reconstruct the details that are supposed to be hidden.


The reidentification of individuals may be useful in limited cases. For example, anti-terrorism and law enforcement agencies can pinpoint individuals who are a likely threat to society or involved in criminal activities. In general, this would be a positive use of Big Data.

But misidentification can also take place, with potentially serious consequences for the individuals concerned. The accuracy of the source data and the validity of the analytical techniques need to be meticulously verified to minimise the possibility of misidentification.

Organisational responsibility

Although Big Data technology makes it possible to follow our every move anytime and anywhere, it does not mean that organisations can do whatever they want with that data.

In his blog called Big Data Startup, Mark van Rijmenam suggests four ethical principles that organisations should adopt in order to protect consumers’ privacy:

  1. Radical transparency – inform customers about the data collected about them and what exactly it is used for;
  2. Simplicity by design – keep it simple and understandable for consumers;
  3. Secure the data – only keep the data really required to do business with, and keep it as secure as possible; and
  4. Privacy by design – make privacy part of the DNA of the organisation.

Even though it does not list any specific actions, the last point, if genuinely adopted, helps ensure that the privacy of consumers is taken seriously and is therefore more likely to be sufficiently protected.

Many applications, whether they’re on-premise or in the cloud, are not transparent about what they do with the data they collect. That causes a lot of fear about privacy.

Organisations have to realise that they cannot survive without consumers, but nowadays consumers can survive without any particular organisation. Organisations have to ensure consumer privacy, or consumers will simply take their business elsewhere. Using social media, individuals can even influence large groups of consumers to boycott the organisation altogether.

Data governance

Kord Davis, former analyst at Capgemini, advocates more discussion on the rules that govern Big Data usage. The discussion needs to go beyond privacy, into identity, ownership and reputation. “We’re going to have to learn how to have those conversations in environments where we haven’t typically had to have them.” However, he realises it is not easy. “Being able to come up with a broad-based, global set of guidelines for handling Big Data in an ethical fashion is going to be difficult.” An important point is that he doesn’t see a need for the conversation to be exclusively about Big Data: it applies to all data.

Organisations also have to be careful about where their data is stored and processed. You need to consider what privacy and security governance applies to cloud-based storage and cloud-based analytics, especially when the data may be governed by laws different from those of the country where it was captured.

Customer interactions

It is one thing to use Big Data to determine market trends and market sentiment, and to make strategic decisions using that information. It is a totally different story when individual customers are contacted in some form or another, based on the outcomes of Big Data analytics. Such interactions are easily perceived as privacy-invasive.

Furthermore, are individual business decisions based on the outcomes gleaned from analysing Big Data valid? Examples include hiring an employee or approving an insurance policy.

Concluding remarks

People have been speaking and writing about data privacy for years, and still there is no easy solution to this problem. In fact, with Big Data and social media, the problem is a lot more complex. Location-based services don’t work without location. Social media sites do not work without linked commentary. We are not going to stop all the data collection, so we need workable guidelines for protecting privacy.

However, new technologies are always developed and adopted by trial and error and unfortunately we will probably see a few privacy-related Big Data catastrophes in the near future. Organisations that want to survive in this world will have to follow ethical guidelines for handling privacy. Those developing data-centric products also have to start thinking responsibly.

As Rebecca Herold put it so well: “Big Data and associated analytics can be used to improve business and customer experiences and bring about innovation and medical breakthroughs. However, organizations must make sure they don’t cross over that line of customization and business improvement into creepiness or, worse, privacy invasion.” In some cases, it is a very fine line indeed.

Big Data privacy is a vast topic, far too much for a single blog post, really. As the use (and abuse) of Big Data develops, the privacy issues will change with it, and the related protection guidelines, governance and regulations will be adapted accordingly. Organisations collecting and analysing Big Data have to make sure they play by these rules, and keep up to date as the playing field changes. Not doing so may be devastating…

For those really concerned about privacy, this Privacy Survival Guide may be useful.

1 comment

  1. Martin

    Here is an interesting blog post about the Australian government introducing laws requiring that individuals be notified if their privacy has been breached:

    However, privacy laws should be even more powerful to ensure such breaches do not occur at all!
