In preparation for Enterprise IQ’s upcoming 8th Annual MDM & Data Governance Summit, I have been researching the touch points between Big Data and Master Data Management (MDM). To set the context, at the recent Gartner MDM Summit, research vice presidents Andrew White and John Radcliffe stated that the increased popularity of social media, cloud computing and Big Data will have a significant impact on organisations’ MDM programs.
MDM
Gartner defines master data management (MDM) as a technology-enabled discipline in which business and IT work together to ensure the uniformity, accuracy, stewardship, semantic consistency and accountability of shared master data sets. MDM, by definition, focuses on the highest value entities within an organisation.
On a technical level, MDM solutions can be implemented through many configurations and architectures, including application hubs and even components of EDW architectures. I have always been a supporter of the application hub concept, in which all master data is moved to the master data application hub over time and all master data management is performed through the hub, which then provides the master data to tightly integrated applications.
Organisational function and placement are also critical for master data ownership, stewardship, management and governance. For example, one of our clients manages its MDM implementation through the BI competency centre (BI-CC), even though MDM runs on totally different technology, because the client believes the BI-CC has the best understanding of the master data and of the rules and approaches to integrate it. Rightfully so in this case, as the BI-CC has been through that process a few times to achieve master data conformance in the EDW. Other organisations manage it through IT or the CIO’s office, and some even drive it from a business function that claims ownership of that data.
The Big Data Challenge
With Big Data becoming popular, the MDM program now has to cover both the internal data that the organisation has been managing for years (like customer, product and supplier data) and the Big Data that flows into the organisation from external sources (like social media, third party data, internet data) and from internal data sources (such as unstructured content in documents and email).
The new Big Data sources may contain new insights, but those insights are often hard to identify and place quickly and cost-efficiently. For example, many organisations are now interested in customer sentiment. Sentiment can be analysed on a global level: what are “people” out there saying about us? But when the analysis moves from the aggregate level (general market sentiment) to specific analyses (which customers have commented on product X), master data is required to guide the Big Data analytics. For example, if you detect a customer complaining about a product or service on Twitter or Facebook, you want to pin that complaint to the particular product or service and the particular customer. However, products, services and customers are identified very differently inside and outside the organisation. One of the biggest MDM challenges is to establish coherence between data collected from social media and the existing product and customer data repositories.
An organisation has no control over the external creation and dissemination of social network data, but it can use MDM tools and techniques to govern the linkage of internal customer master data to external social network profiles in an effort to learn more about those customers. Similar approaches can be used for providers and products too.
MDM and Big Data integration
There are well-defined integration points between MDM and Big Data, which include importing and analysing unstructured data to create or identify master data entities, loading additional profile information into the MDM system, sharing master data records with the Big Data platform for analyses, and reusing the MDM matching capabilities in the Big Data platform, for example, to match customers.
In his blog post “Aligning Big Data with MDM”, David Loshin discussed identity resolution as the biggest integration point between MDM and Big Data. Identity resolution uses a master entity index that contains the right set of identifying attributes to match pairs of records representing the same real-world entity (for example the same customer, even if identified differently in different systems), while differentiating between pairs of records that do not represent the same entity.
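To make the idea of matching on identifying attributes concrete, here is a minimal sketch in Python. It is not David Loshin's method or any MDM product's algorithm: the attribute names, weights, threshold and sample records are all assumptions made up for illustration, and the string comparison simply uses `difflib.SequenceMatcher` from the standard library.

```python
from difflib import SequenceMatcher

# Hypothetical identifying attributes and weights; a real master index would tune these.
WEIGHTS = {"name": 0.5, "email": 0.3, "postcode": 0.2}
MATCH_THRESHOLD = 0.8  # assumed cut-off between "same entity" and "different entity"


def normalise(value):
    """Lower-case and collapse whitespace so trivial formatting differences do not matter."""
    return " ".join(str(value or "").lower().split())


def similarity(a, b):
    """String similarity in [0, 1]; 0 when either value is missing."""
    a, b = normalise(a), normalise(b)
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def match_score(rec_a, rec_b):
    """Weighted similarity over the identifying attributes."""
    return sum(w * similarity(rec_a.get(attr), rec_b.get(attr))
               for attr, w in WEIGHTS.items())


def resolve(record, master_index):
    """Return the id of the best-matching master entity, or None below the threshold."""
    best_id, best_score = None, 0.0
    for entity_id, master in master_index.items():
        score = match_score(record, master)
        if score > best_score:
            best_id, best_score = entity_id, score
    return best_id if best_score >= MATCH_THRESHOLD else None


# Hypothetical master entity index and incoming records.
master_index = {
    "C001": {"name": "Jane Smith", "email": "jane.smith@example.com", "postcode": "2196"},
    "C002": {"name": "John Smythe", "email": "j.smythe@example.com", "postcode": "8001"},
}
crm_record = {"name": "jane  SMITH", "email": "Jane.Smith@example.com", "postcode": "2196"}
stranger = {"name": "Peter Jones", "email": "pj@example.org", "postcode": "0001"}

print(resolve(crm_record, master_index))
print(resolve(stranger, master_index))
```

The first record differs from the master entry only in case and spacing, so it resolves to the existing customer; the second clears no threshold and would be treated as a new or unknown entity. Real matching engines add blocking, phonetic comparison and survivorship rules on top of this basic scoring idea.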
David also listed alternative ways in which the MDM environment can support Big Data analytics:
- Optimisation of data access: If the master index contains a link that maps a recognised identity to the original sources, it can be used to optimise federated data queries. Once the specific entity has been identified, the mapping information in the MDM environment can be used to formulate the target queries to access the data from the original sources.
- Real-time entity identification: The resolution methods that use the master indexes can be embedded in stream processing applications to extract entity information and match it to existing entities as the data is processed “on the fly”.
- Cross-domain relationships: As the number of entities grows, the number of relationships may grow in proportion to the square of the number of entities. The MDM solution may need a more scalable framework for representing and managing the profiles and relationships.
- Improved data quality: Integrating data quality assurance within the MDM environment can improve data trustworthiness. The relative quality rating of each data source can be provided with the data.
- Improved privacy: Dynamically masking sensitive data prior to delivery allows data to be used without violating data privacy policies (see my post on Big Data Privacy).
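The first item in the list above, using the master index to optimise federated queries, can be sketched roughly as follows. This is only an illustration under stated assumptions: the source system names, keys, table names and query templates are hypothetical, and a real MDM hub would expose its cross-reference links through its own interfaces.

```python
# Hypothetical master index: each master entity links to its keys in the source systems.
MASTER_LINKS = {
    "C001": {"crm": "CRM-88412", "billing": "B-10023", "web_logs": "u_5f3a"},
    "C002": {"crm": "CRM-90017"},
}

# Hypothetical per-source query templates with a named parameter.
QUERY_TEMPLATES = {
    "crm": "SELECT * FROM contacts WHERE contact_id = :key",
    "billing": "SELECT * FROM invoices WHERE account_no = :key",
    "web_logs": "SELECT * FROM clickstream WHERE user_token = :key",
}


def plan_federated_queries(entity_id):
    """Return (source, query, params) only for sources that actually hold the entity."""
    links = MASTER_LINKS.get(entity_id, {})
    return [
        (source, QUERY_TEMPLATES[source], {"key": key})
        for source, key in sorted(links.items())
    ]


for source, query, params in plan_federated_queries("C002"):
    print(source, query, params)
```

Because the index already knows which sources hold the entity and under which key, the planner issues one targeted lookup per relevant source instead of searching every system.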
So MDM can be used to improve the accuracy and performance of Big Data analytics, but are there Big Data capabilities that can enhance MDM functionality and performance? This is an application of Big Data technology that I find really interesting, especially utilising the more specialised NoSQL data stores. A columnar data store can significantly speed up entity searching and matching, especially in-memory (but the latter is another discussion). A graph-oriented data store is really useful to store complex relationships and other social network data. A key-value store is useful to store the wide variety of semi-structured attributes you may amass about interests, sites visited, preferences and so forth. Bring on polyglot persistence to MDM!
In his blog David Loshin also elaborates on how identity matching can be scaled across a Hadoop cluster. While the reasoning makes sense, and it may be useful for an Amazon or an eBay, I still have to be convinced that it will be required on such a large scale by the average retailer, insurer or even telecommunications organisation. But it may just be that, for such a highly parallelised search and identification process, the end justifies the means.
So there is a symbiotic relationship between MDM and Big Data. The Big Data platform feeds additional insights to MDM, while the MDM platform augments the Big Data with master data definitions of customer, household, relationship, product, and their related hierarchies. This is very similar to the relationship between MDM and the data warehouse.
Business value
The drivers for MDM are more accurate reporting and analytics, increased operational efficiency and improved customer service. The integration of Big Data and MDM can surface new value propositions, beyond what was previously possible:
- Augment traditional product information data with dynamically derived product traits based on web and social media feedback.
- Improve the “360-degree view” for customer service and marketing by using the structured data in MDM as a starting point and then analysing the Big Data sources to add relationships, hierarchies, intent, sentiment, etc.
- Discover additional relationship links between master entities based on insights from unstructured documents and social media interactions. What better place is there to gather friends and influence circles than on Facebook and business associates than on LinkedIn?
Governance
Many Big Data solutions have been set up independently by business users and analytics teams without IT being involved or aware. One of the consequences of this scattered approach is that large demands will be placed on MDM and data governance processes to ensure that the Big Data environments are able to live up to overall expectations for providing business benefits and competitive advantages.
Nick Millman, who heads Accenture’s process and information management practice in the UK and Ireland, stated: “Big data by its very nature is enterprise data, not limited to one silo or organisation. It necessitates enterprise MDM rather than MDM for a specific function.” He concluded that organisations that have effective data management processes are going to find it easier to extract the most value from Big Data. “Those that don’t are going to have the light shone on data governance and data quality issues and will realise that they need to fix some of the fundamentals.”
According to Steve Jones of Capgemini, you’ll need to think less about MDM as a repository and more about MDM as a way to govern global information. John Radcliffe, a Gartner research vice president, said MDM programs will ultimately need to “govern the relationships” between internal data and Big Data from external sources.
Forrester analyst Michele Goetz stated that MDM requires a “reboot” for Big Data. She focused more on internal unstructured data, but still accentuated the importance of data governance. “Because data has moved beyond structured and relational database constraints with Big Data, MDM must account for the structure and enforce business policies for a trusted holistic view.”
Obviously, MDM done well has always had governance as a key component, because a good MDM program will cover who owns the data and who has the authority to alter it. But Big Data puts even more demands on MDM as a governance tool, because there is now so much more data, and the focus has shifted from just adding it to filtering out what’s usable and useful to the business.
Paradigm shift
Because of the high interest that business managers are showing in Big Data, IT departments should be aware of the implications that Big Data has for their MDM strategy, particularly as organisations adopt more coordinated approaches to deploying Big Data analytics and bringing it into the main fold of business processing. This has an impact on the way we have traditionally thought about managing data. Loraine Lawson says it very well in her blog post titled “Master Data Management Grows Up, the Finale: Big Data”: “Because it turns out, applying master data management to Big Data may be less about MDM and more about a paradigm shift in how we think about and use MDM.”
Although there are different ways to approach MDM, it’s often seen as a repository for master data. All the data is dumped into MDM for sorting, cleansing and achieving that mythical “one version of the truth.” But adding Big Data will change that, says Steve Jones, the global lead for MDM at Capgemini, because Big Data is too big and changes too fast for that approach to work. “Really, it’s about shifting the information when it’s required and providing that identification, less than the historical view of effectively a digital landfill of data into which everybody poured everything.”
Andy Hayler, CEO at The Information Difference, said one positive aspect is that social media analytics offers a green field for MDM, unencumbered by earlier implementations. “There will be a need to define some structure around how the company views social media, to do with customers,” he said. “Because there is no legacy, there is a chance to get this one right before the users build their five or six different computing systems; this is the one thing we can start from scratch on.” Of course, Andy’s remarks have to take into consideration the few renegade Big Data implementations that are being done as skunkworks projects as we speak.
Concluding remarks
Clearly there is a bi-directional relationship between MDM and Big Data. Big Data technology can benefit from master data as a starting point for searching, identification and analysis, and it can also augment an MDM system or feed new insights and facts into it. To use data warehousing terminology, MDM can provide the dimensions for analysing Big Data facts. Big Data also has to be incorporated in existing initiatives such as data governance and data quality.
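As a small illustration of the data warehousing analogy above, the following Python sketch treats master data as dimensions and social media mentions as facts. All identifiers, segments, categories and sentiment scores are invented for the example, and it assumes the mentions have already been resolved to master customer and product keys.

```python
from collections import defaultdict

# Hypothetical master data acting as dimensions.
customers = {"C001": {"segment": "Premium"}, "C002": {"segment": "Standard"}}
products = {"P10": {"category": "Broadband"}, "P20": {"category": "Mobile"}}

# Hypothetical Big Data facts: social media mentions already resolved to master keys.
mentions = [
    {"customer_id": "C001", "product_id": "P10", "sentiment": -0.8},
    {"customer_id": "C002", "product_id": "P10", "sentiment": -0.4},
    {"customer_id": "C001", "product_id": "P20", "sentiment": 0.6},
]


def average_sentiment(mentions, customers, products):
    """Average sentiment per (customer segment, product category)."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of sentiment, count]
    for m in mentions:
        key = (
            customers[m["customer_id"]]["segment"],
            products[m["product_id"]]["category"],
        )
        totals[key][0] += m["sentiment"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


for key, value in sorted(average_sentiment(mentions, customers, products).items()):
    print(key, round(value, 2))
```

Without the master data, the mentions could only be summarised in aggregate; with it, the same facts can be sliced by any attribute or hierarchy that the MDM system maintains.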