Big Data Security


Because Big Data is really about volumes of new information moving into, within and through the organisation, two key factors that must be considered are security and privacy. Security is about protecting the data in storage and in transit, whereas privacy is about protecting the rights of the “subjects” represented by the data. If the security risk of Big Data goes unrecognised or unacknowledged, data privacy can be breached, which can leave the organisation very vulnerable or even cause severe damage. In this post we focus on security.

A number of questions arise over the governance and security of these new data sets, such as: who owns, or has rights to, these new data sets? Who decides what access controls should be applied to the new data sets, and in which legal jurisdiction? And how does that apply to aggregations applied to that data set? Or when it is integrated with other data sets? When data from one country is hosted, through the cloud, by a provider in another country, the legislation around data ownership and security becomes a lot more complex.

While most organisations focus primarily on applying advanced analytics as the key part of their Big Data strategy – as that is where the perceived value lies – having a Big Data security and privacy strategy is equally important. If such a protection strategy is not put into place, Big Data can easily fail to offer its intended benefits. In fact, security breaches of sensitive Big Data can cost organisations dearly, both in reputation and in legal proceedings. If Big Data is considered a valuable business asset, it needs the right security protection in order to provide benefits to the organisation.

On the other hand, threats abound. Globally, cybercrime is a multi-billion dollar business with some of the smartest brains employed to crack security systems. Big Data, due to its state of flux and the complexity of managing it properly, presents a rich target of opportunity for cybercriminals, especially Big Data hosted in the cloud.

Managing security in the Big Data world is complex – very complex. There are already more than one hundred variations of Big Data management systems focusing on different types of Big Data (key-value pairs, geospatial data, documents and other unstructured data).

There are many different query models, data storage models, styles of parallelisation and resource management. Many of the Big Data solutions have discarded some of the core features of relational databases in order to improve performance and scalability. In addition, many of these environments store multiple copies of the data across multiple nodes in order to provide fail-safe operations should any single node fail. These aspects make securing these systems all the more difficult.

Big Data databases do not use a centralised DBMS of the kind that masks everything that happens inside the database from other applications. As Adrian Lane puts it so well in a white paper on Big Data Security: “Big data exposes its skeleton to the applications that use it.” The data in a Big Data cluster is basically stored in files. Each application can maintain its own schema for its data, but that data is stored across hundreds or thousands of nodes. The loose confederation of nodes creates many performance advantages, but it also poses unique security challenges. All the data in the cluster is subject to the same threats as normal files. Validating which client applications should have access to which data nodes is difficult. The elastic nature of Big Data means new nodes are automatically meshed into the cluster, where data and processing loads are liberally shared to handle the application requests. Most of the research resources have gone into processing larger volumes of more complex data faster, with very little spent on adding security features to Big Data platforms. But in fact, security should be just as scalable, high-performance and self-organising as the data storage clusters are.

In his white paper, Adrian Lane provides seven tips to improve Big Data security:

  1. Engage your architects: The architects of the enterprise IT systems know how the organisation’s data is used and how its applications work. These architects can assist with a Big Data security program because they know what you have and understand what technologies can be deployed in the cluster.
  2. Discover what data is at risk: Write some queries to search for sensitive data in the cluster. This will give you a much better idea of what you have and if you have a security, compliance or legal risk. Knowing what data you have stored is a good first step in understanding what security protections need to be put in place.
  3. Protect data: Big Data clusters store data in any number of systems simultaneously. To protect the data from unwanted inspection, protection has to be implemented using transparent file- or OS-layer encryption that scales with the cluster as nodes are added or removed. Data written to disk is automatically encrypted so it can be read only by the application with the decryption keys. Of course, the security of encryption is only as good as the key management system.
  4. Authenticate nodes: Big Data clusters were designed to implicitly trust new nodes as they get added. But this means it’s very easy to add a rogue node to grab a subset of the data or even query other nodes for data. In virtual and cloud environments, it’s easy to snapshot a live node and deploy a copy onto the cluster. The cluster must be configured to validate new nodes.
  5. Protect communications: An attacker who gains access to a network running a Big Data cluster can monitor the network traffic to determine the queries or the resultant data. A rogue IT administrator can likewise monitor network traffic and sniff sensitive information. TLS, the successor to the now-deprecated SSL, is the standard protocol for encrypting communication so that network traffic between nodes remains private.
  6. Protect the “management plane”: Many Big Data installations run on public cloud resources — either as infrastructure-as-a-service or as platform-as-a-service. The management plane is fundamental to cloud providers for all resource management tasks, including provisioning of users, allocation of process and disk resources, network security and archival facilities. Because the management plane is so powerful in cloud environments, great care must be taken to ensure that the credentials/certificates used to establish administration identity are kept secure, and that each administrative role is reduced to a subset of overall capabilities.
  7. Engage operations staff: The operations staff who manage the cluster on a day-to-day basis can have a great impact on security. Many environments offer simple tools for patching, configuring and validating clusters before they launch. Gaps in security patches and bad system configuration provide easy pathways for the bad guys into the company. Automated systems management makes it easier and more likely that an organisation will have consistent baseline security across the entire Big Data cluster.
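
To make tip 2 a little more concrete, here is a minimal sketch in Python of what "writing some queries to search for sensitive data" could look like. It is my own illustration, not something taken from the white paper. It assumes the records have already been read out of the cluster as plain strings, and the two patterns (e-mail addresses and payment card numbers) as well as the names scan_records and luhn_valid are hypothetical choices for the example. A real scan would need patterns tuned to the organisation's own data and would have to run as a distributed job over the cluster.

```python
import re
from collections import Counter

# Hypothetical patterns for illustration only; tune these to your own data.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_records(records):
    """Count likely e-mail addresses and card numbers in an iterable of strings."""
    findings = Counter()
    for record in records:
        findings["email"] += len(EMAIL.findall(record))
        # The Luhn check filters out long digit runs that are not card numbers.
        findings["card_number"] += sum(
            1 for m in CARD.finditer(record) if luhn_valid(m.group())
        )
    return dict(findings)


if __name__ == "__main__":
    sample = [
        "contact: alice@example.com",
        "paid with 4111 1111 1111 1111",  # well-known test card number
        "order id 1234567890123",         # long digit run, fails the Luhn check
    ]
    print(scan_records(sample))
```

Even a rough count like this gives an indication of which data sets hold sensitive material, and therefore where the encryption and access controls from the other tips should be applied first.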

Concluding remarks

My personal take-away from this is that Big Data Security is still in its infancy. And granted – that is to be expected in such a fast-developing technological field, especially where the emphasis has been primarily on volume, performance and insight. However, that does not mean that security can be neglected. In fact, because its implementation is still more primitive and manual, greater care and effort have to be spent to ensure that a Big Data security and privacy strategy is not only in place, but also implemented to a level where it can be trusted to protect the Big Data and its contents sufficiently. The tips listed above will definitely assist in the process.

Big Data can add some form of benefit to most businesses. However, like all data, its processing, storage and utilisation must be well controlled. At the end of the day it must not pose a security threat to the business, nor a privacy threat to the people or organisations whose details are contained in the Big Data. Big Data governance, security and privacy protection should go hand-in-hand when investing in the concept, as this will go a long way towards protecting the benefits that it can offer.
