Difference between Reference data and Master data

Oli Steadman
2 min read · Dec 17, 2020

This webinar does a strong job of translating between the two:

It includes a handy mapping of Maslow’s Hierarchy Of Needs to a “Data Management Practices Hierarchy” in which Foundational Data Management Practices:

  • Data Governance
  • Data Quality
  • Data Management Strategy
  • Data Platform/Architecture
  • Data Operations

… are fundamental pre-requisites to enabling Advanced Data Practices:

  • MDM
  • Mining
  • Big Data
  • Analytics
  • Warehousing
  • Service-Oriented Architecture

You have to address the pre-requisites before trying to deliver the advanced capabilities; you have to walk before you run.

One key metric I use to determine whether an organisation has learned to walk yet is whether it has settled on its own domain-specific understanding of what constitutes Reference Data vs Master Data.

Master Data Management is a discipline or strategy analogous to DevOps in software development; both MDM and DevOps often get misunderstood as more tangible “toolings” or “skillsets”. Both disciplines depend for their efficacy upon a clear understanding of concepts and, in the case of MDM, those include:

  • Reference Data: control over defined domain values for standardised terms, code values, and other unique identifiers; e.g. maintaining 9 possible gender codes. Technical-focussed.
  • Master Data: control over master data values to enable consistent, shared, contextual use across systems; the “golden” source of your customer “Pat”. Business-focussed. Another example is Google’s storing a master list of “buildings in the world”… this is not reference data.
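
To make the distinction above concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than something taken from the webinar: the GenderCode values (only three of the possible codes are shown), the CustomerMaster fields, and the parse_gender helper are all hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class GenderCode(Enum):
    """Reference data: a small, closed set of permitted code values.

    Hypothetical subset; a real code list would be agreed by the organisation.
    """
    FEMALE = "F"
    MALE = "M"
    NOT_STATED = "X"


@dataclass(frozen=True)
class CustomerMaster:
    """Master data: the single 'golden' record for one business entity."""
    customer_id: str
    name: str
    gender: GenderCode  # the master record is constrained by reference data


def parse_gender(raw: str) -> GenderCode:
    """Accept only values from the controlled list; unknown codes raise ValueError."""
    return GenderCode(raw)


# The golden record for customer "Pat", built from a validated code.
pat = CustomerMaster(customer_id="C-001", name="Pat", gender=parse_gender("X"))
```

The reference data is the closed list of codes; the master data is the record about Pat, which merely points at one of those codes.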

Those two types of data can drive two subtly different infrastructure setups:

  • Reference Data Architecture: a widely accessible Reference Data Management System controls the RD being fed to OLTP.
  • Master Data Architecture: less control over MD; it comes from a System Of Record managed outside the main collaborative architecture.
  • A Combined R/M Data Architecture is also available.
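
One possible reading of the combined setup, sketched in Python under stated assumptions: ReferenceDataService stands in for the central Reference Data Management System, SystemOfRecord for the externally managed master source, and write_order for an OLTP write path. All three names, and the "currency" domain, are hypothetical and chosen purely for illustration.

```python
class ReferenceDataService:
    """Hypothetical stand-in for a central Reference Data Management System."""

    def __init__(self, code_sets):
        # Maps a domain name (e.g. "currency") to its set of permitted codes.
        self._code_sets = code_sets

    def is_valid(self, domain, code):
        return code in self._code_sets.get(domain, set())


class SystemOfRecord:
    """Hypothetical external system that owns the master records."""

    def __init__(self, records):
        self._records = records

    def fetch(self, entity_id):
        # Return a copy: consumers read master data, they do not edit the source.
        return dict(self._records[entity_id])


def write_order(order, rds, sor):
    """OLTP write path: check codes centrally, then attach master data."""
    if not rds.is_valid("currency", order["currency"]):
        raise ValueError("unknown currency code: " + order["currency"])
    customer = sor.fetch(order["customer_id"])
    return {**order, "customer_name": customer["name"]}


rds = ReferenceDataService({"currency": {"GBP", "USD"}})
sor = SystemOfRecord({"C-001": {"name": "Pat"}})
stored = write_order({"customer_id": "C-001", "currency": "GBP"}, rds, sor)
```

The asymmetry is the point of the sketch: the transactional code asks the reference data service for permission before writing, but only reads from the system of record, which it does not control.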

Will this all change in line with deprecations of the word “master” by major tech players from GitHub to LinkedIn?

Other handy definitions from the webinar linked above… some of these bring a bold, almost orthogonal angle to the traditional definition of the relevant term:

  • Data Steward: “someone whose job it is to make data available as an organisational asset”
  • Business (Process) Architecture: an insightful mapping of how MDM supports specific parts of the business and its processes
  • Data Governance Personalities: usually described in terms of roles & responsibilities but seldom given that “person” quality, these include steering committees, business data stewards, subject matter experts, data consumers, standards organizations, data providers, application users, BI & reporting users, application developers & architects, data integration developers & architects, BI vendors & architects, vendors, customers, partners, data governance council, and other IT professionals
  • Library Science Professionals aka Data Curators: a class of professionals who were already doing a lot of this thinking & problem-solving long before the field became hyped and led to the creation of the CDO role
