Introduction.  In this third blog installment (#3.1 and #3.2) for EA874 – Enterprise Information Technology Architecture, a graduate course in Penn State University's College of Information Sciences and Technology, we turn to data architecture. Specifically, we define and contrast several data management concepts: master data management (MDM), data warehouses, data lakes, and, briefly, data hubs.

Master Data Management (MDM) Definition.  To open this discussion, we examine and define a critical component of modern data management – MDM.  To quote Emad Yowakim’s LinkedIn OP-ED (cited below):

“Master Data Management (MDM) refers to the process of creating and managing data that an organization must have as a single master copy, called the master data…[it] is important because it offers the enterprise a single version of the truth.”

Benefits of MDM.  By identifying and maintaining a single authoritative version of each data entity, MDM seeks to eliminate redundant data. This improves efficiency and removes the complications that arise when one entity exists in multiple versions. Redundant entries are inefficient because each one could contain conflicting information, or could have been updated by different sources at different times. Data-driven enterprises that do not implement some form of MDM leave room for inconsistency and confusion. Consistent systems and processes across the enterprise are key to MDM, and they are achievable through a solid, well-planned IT governance program.
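To make the "single version of the truth" idea more concrete, here is a minimal Python sketch of how conflicting versions of one customer, held in two different systems, might be collapsed into a single master ("golden") record. The record fields, the source-system names, and the survivorship rule (for each attribute, the most recently updated non-empty value wins) are illustrative assumptions, not something taken from Yowakim's article; in practice, which source to trust for which attribute is a governance decision.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CustomerRecord:
    # Hypothetical record shape; a real master data model carries many more attributes.
    customer_id: str
    source_system: str
    email: Optional[str]
    phone: Optional[str]
    updated_at: datetime


def build_golden_record(versions: list[CustomerRecord]) -> dict:
    """Merge several versions of one customer into a single master record.

    Assumed survivorship rule: for each attribute, keep the most recently
    updated non-empty value found across the source systems.
    """
    if not versions:
        raise ValueError("need at least one version of the entity")
    # Sort newest first, so the first non-empty value seen for a field is the newest one.
    ordered = sorted(versions, key=lambda r: r.updated_at, reverse=True)
    golden = {"customer_id": ordered[0].customer_id}
    for field in ("email", "phone"):
        golden[field] = next(
            (getattr(r, field) for r in ordered if getattr(r, field)), None
        )
    return golden


# Two hypothetical systems hold different versions of the same customer.
crm = CustomerRecord("C-100", "CRM", "old@example.com", "555-0100", datetime(2021, 1, 5))
billing = CustomerRecord("C-100", "Billing", "new@example.com", None, datetime(2021, 6, 1))

print(build_golden_record([crm, billing]))
# {'customer_id': 'C-100', 'email': 'new@example.com', 'phone': '555-0100'}
```

The newer billing email wins, while the phone number survives from the CRM because the billing system has none. A rule as blunt as "newest wins" can itself propagate errors, which is one reason the governance program described above matters.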

MDM Applications – Opinion.  In his OP-ED, Emad argues that "MDM is typically more important in larger organizations. In fact, the bigger the organization, the more important the discipline of MDM is, because a bigger organization means that there are more disparate systems within the company, and the difficulty on providing a single source of truth, as well as the benefit of having master data, grows with each additional data source." I would offer a counterpoint. While I acknowledge that larger organizations struggle to maintain single sources of data because of their higher data volume and larger number of data sources, single-source data can be just as critical to smaller organizations. Redundant, incorrect data in a smaller enterprise can be equally damaging to internal processes such as manufacturing and production, or to external processes such as customer service and marketing. Can you imagine, as a customer, receiving conflicting answers (sets of data) from different sources within the same company, and how confusing and frustrating that would be? Or identical manufactured parts receiving two different sets of manufacturing data from their controlling machines? The principles of MDM are critical at every level of any organization that relies on precision to operate effectively.

Challenges in MDM.  Mr. Yowakim's LinkedIn article addresses one of the primary MDM challenges in the business environment: mergers and acquisitions. A merger is a common trigger for enterprise architecture planning, because, as he notes, "how to merge the two sets of data will be challenging." Beyond applying MDM concepts, a solid enterprise architecture governance program, and general EA tenets such as consistency, enterprise-wide planning, and central repositories, Emad also suggests appointing a dedicated steward for MDM, a role that "can also be a group… such as a data governance committee or a data governance council."

Personal Observations.  In my experience as a user of, and sometimes contributor to, large information systems in the U.S. government, a common complaint is the lack of a "system of record" for all functions of the organization.  In recent years, my organization has taken several steps to address this, including enterprise-wide services such as a single login across the organization.  However, managing information and establishing a "single truth" that eliminates redundancy among entities, as MDM does, remains a challenge.  Recently, our group discussed more advanced alternatives that would link and automatically consolidate redundant entities based on metadata, but these tools have yet to be applied uniformly across the enterprise.
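The idea of linking redundant entities based on metadata can be illustrated with a deliberately simple Python sketch: records from different systems are normalized on a few descriptive fields and grouped whenever their normalized keys match. The field names, the sample records, and the exact-match rule are all hypothetical, and the sketch is not meant to represent the specific tools our group discussed. Production entity-resolution tools typically add fuzzy or probabilistic matching, plus human review, before any records are actually consolidated.

```python
from collections import defaultdict


def match_key(record: dict) -> tuple:
    """Build a normalized key from descriptive metadata (hypothetical fields).

    Collapsing whitespace and case-folding lets superficially different
    entries for the same entity produce the same key.
    """
    name = " ".join(record["name"].split()).casefold()
    email = record["email"].strip().casefold()
    return (name, email)


def link_duplicates(records: list[dict]) -> dict[tuple, list[str]]:
    """Group record IDs whose normalized keys agree; each group is one candidate entity."""
    clusters: dict[tuple, list[str]] = defaultdict(list)
    for rec in records:
        clusters[match_key(rec)].append(rec["id"])
    return dict(clusters)


# Hypothetical entries for the same person held in two separate systems.
records = [
    {"id": "HR-17", "name": "Jane  Doe", "email": "JDoe@agency.example"},
    {"id": "PAY-4", "name": "jane doe", "email": "jdoe@agency.example "},
    {"id": "HR-22", "name": "John Roe", "email": "jroe@agency.example"},
]

for key, ids in link_duplicates(records).items():
    if len(ids) > 1:
        print(f"possible duplicate entity {key}: {ids}")
```

Linking (flagging candidate duplicates) is kept separate from consolidating (producing a single master record) here, so that a data steward can confirm a match before redundant entries are merged.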

(2) Gartner, “Data Hubs, Data Lakes and Data Warehouses: How They Are Different and Why They Are Better Together”, Refreshed 2 June 2021, Published 13 February 2020.


By admin