By Michael Meyer
Published on December 21, 2022
Master Data Management (MDM) and data catalog growth are accelerating because organizations must integrate more systems, comply with privacy regulations, and address data quality concerns. The significant impacts are organizational efficiency, decision-making, and overall customer satisfaction.
MDM is a discipline that helps organize critical information to avoid duplication, inconsistency, and other data quality issues. The concept is to have a “golden record” for core entities in your organization including customers, products, locations, and many others. Transactional systems and data warehouses can then use the golden records as the entity’s most current, trusted representation.
A data catalog is a metadata repository of information sources across the enterprise, including data sets, business intelligence reports, visualizations, and conversations. Early on, analysts used data catalogs to find and understand data more quickly. Increasingly, data catalogs now address a broad range of data intelligence solutions, including self-service analytics, data governance, privacy, and cloud transformation.
MDM and data catalogs share a common purpose: to help businesses improve data management so they can make better business decisions. Data catalogs help employees understand where their data resides, what policies govern it, and how to use it appropriately. MDM delivers quality data that enables people in organizations to make decisions and serve customers more effectively.
Implementing a data catalog first will make MDM more successful. The reasons fall into three critical areas: identifying resources, assessing data quality, and defining policies.
A data catalog provides a central place for stakeholders to participate in a major undertaking such as MDM. Successful implementations must start with a complete understanding of the resources involved, such as the people, processes, and data.
Identifying and cataloging data sources that create/update data for the entity you are trying to master is essential. Stakeholders need to understand the data before mapping the attributes to be mastered. The data catalog can capture valuable information, including:
Table and column descriptions
Key business people (domain owners) and application subject matter experts
Source to MDM target mapping matrix of the attributes to be mastered
Downstream consumers
Attribute classification as Master, Transactional, or Reference data
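The kind of information above might be captured in the catalog roughly like this. This is a minimal sketch, not a real catalog schema or API; all table, column, and owner names here are hypothetical examples.

```python
# A sketch of one catalog entry for a source system feeding MDM.
# Every field name and value below is a hypothetical example.

catalog_entry = {
    "source_table": "crm.customers",
    "description": "Customer records entered by the sales team",
    "domain_owner": "jane.doe",          # key business person
    "subject_matter_expert": "sam.lee",  # application SME
    "downstream_consumers": ["billing", "marketing_analytics"],
    "columns": [
        # source column -> MDM target attribute, with a classification
        {"name": "cust_email", "mdm_target": "Customer.email",
         "classification": "Master"},
        {"name": "last_order_id", "mdm_target": None,
         "classification": "Transactional"},
        {"name": "country_code", "mdm_target": "Customer.country",
         "classification": "Reference"},
    ],
}

# The source-to-MDM target mapping matrix can be derived from such entries:
mapping_matrix = [
    (catalog_entry["source_table"], col["name"], col["mdm_target"])
    for col in catalog_entry["columns"]
    if col["classification"] == "Master"
]
print(mapping_matrix)  # [('crm.customers', 'cust_email', 'Customer.email')]
```

Keeping this information as structured catalog entries, rather than in scattered spreadsheets, is what lets the mapping matrix stay current as new sources are discovered.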
If a company instead begins with the MDM implementation, it will spend more time analyzing, reworking the solution as new sources are found, and communicating the impacts, especially to downstream consumers.
It is critical to ensure the quality of the data before trying to master it. Having good data is crucial to creating golden records.
A data catalog communicates the organization's data quality policies so people at all levels understand what is required for any data element to be mastered. Documenting rule definitions and corrective actions guides domain owners and stewards in addressing quality issues. Using the catalog to review data profiles can help discover other potential quality concerns.
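A documented quality rule pairs naturally with the check that enforces it. The sketch below shows one way that might look; the rule names, corrective actions, and record fields are hypothetical, and a real profiling setup would be driven by the catalog rather than hard-coded.

```python
# A minimal sketch: data quality rules documented alongside their checks.
# Rule names, thresholds, and record fields are hypothetical examples.

import re

rules = {
    "email_format": {
        "description": "Email must match a basic address pattern",
        "corrective_action": "Route to the domain owner for correction",
        "check": lambda r: bool(
            re.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", r.get("email", ""))
        ),
    },
    "country_not_null": {
        "description": "Country is required before mastering",
        "corrective_action": "Backfill from the order system",
        "check": lambda r: bool(r.get("country")),
    },
}

def profile(records):
    """Return the pass rate per rule -- a simple stand-in for a data profile."""
    return {
        name: sum(rule["check"](r) for r in records) / len(records)
        for name, rule in rules.items()
    }

records = [
    {"email": "ana@example.com", "country": "US"},
    {"email": "not-an-email", "country": ""},
]
print(profile(records))  # {'email_format': 0.5, 'country_not_null': 0.5}
```

Publishing pass rates like these in the catalog gives stewards a concrete picture of whether a source is ready to be mastered.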
In an MDM-first approach, it can be challenging to convey the importance of data quality to stakeholders, especially if this is the first time they are introduced to it. Without appropriate data quality, the testing phase of the mastered entity may reveal issues causing stakeholders to question whether MDM is working. The time it takes to understand and resolve the data quality cases could derail the project for an extended period.
Rules must be defined to construct the golden record. It is critical to outline how the process will work for matching and merging the data for a mastered entity. The rules contain pertinent information for constructing the golden records, such as what makes the entity unique, which attribute value to use when there are duplicate records, how reference data is aligned, and how much latency is acceptable, among others.
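Writing these rules down as a declarative policy, rather than burying them in code, keeps them visible to stakeholders. A minimal sketch of what such a policy might contain follows; the entity, attributes, and source names are hypothetical examples, not a real MDM configuration format.

```python
# A sketch of match-and-merge rules recorded as a declarative policy.
# All names and values below are hypothetical examples.

mdm_policy = {
    "entity": "Customer",
    # what makes the entity unique: records matching on these fields merge
    "match_keys": ["email"],
    # which attribute value survives when duplicate records disagree
    "survivorship": {
        "name": "most_recent",       # take the latest updated value
        "phone": "source_priority",  # prefer the most trusted source system
    },
    "source_priority": ["crm", "erp", "web_signup"],
    # reference data alignment and acceptable latency
    "reference_alignment": {"country": "ISO 3166-1 alpha-2"},
    "max_latency_hours": 24,
}
```

A policy in this shape can live in the data catalog as documentation and still be read by the build pipeline, so the two never drift apart.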
You can create the MDM policy definition for the domain entity you are mastering in a data catalog. Having this information in the catalog makes it easy to find and use. The policy provides communication to stakeholders so everyone understands how the mastered entity is constructed.
When a team starts with MDM first, often the rule definitions become an exercise led by IT, and the representations get buried inside the MDM software. The transparency to stakeholders suffers, and alignment is easily overlooked, causing longer testing cycles and rework.
When starting an MDM project, a data model must be created as the blueprint of what the mastered entity comprises. Most MDM tools provide the means to develop the model containing the tables, relationships, and attributes pertinent to the solution.
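To make the blueprint idea concrete, here is a minimal sketch of a mastered-entity model as plain data classes; a real MDM tool would generate the equivalent tables and relationships. The entity and attribute names are hypothetical examples.

```python
# A sketch of the mastered Customer entity as a data model.
# Entity and attribute names are hypothetical examples.

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Address:
    """A related table: reference-aligned address attributes."""
    line1: str
    country: str  # aligned with ISO 3166-1 reference data

@dataclass
class Customer:
    """The golden-record blueprint: one row per unique customer."""
    customer_id: str   # the surviving unique identifier
    email: str         # match key from the source-to-target mapping
    name: str
    addresses: list[Address] = field(default_factory=list)  # 1:N relationship
    updated_at: datetime = field(default_factory=datetime.now)

golden = Customer(customer_id="C-001", email="ana@example.com", name="Ana")
golden.addresses.append(Address(line1="1 Main St", country="US"))
```

Note how the attribute classifications from the catalog drive what appears here: only master attributes belong in the model, while transactional fields stay in the source systems.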
The model must reflect the analysis work done to this point in the data catalog. Information such as the sources, the source-to-MDM target mapping matrix, and the attribute classifications is essential to creating the model. Curated descriptions from the data catalog should be reused in the model for continued consistency and understanding of the mastered entity.
In an MDM-first approach, modeling the mastered entity without the analysis outputs captured in a data catalog is a recipe for rework and frustration.
The next step is building the objects, including tables and columns. The MDM tool should help with this.
In the MDM tool, the team should document references to the MDM policy and metadata information from the data catalog. This is critical information to have before making any future changes to the MDM solution.
A company that simply tweaks a base model from the MDM tool will likely find that the model fails to meet its needs. Modeling is a critical process step that relies on analyzing and understanding the data, and the best way to gain that knowledge is to have it easily accessible in a data catalog.
The final step is to build the solution, which includes creating the code and pipelines to address data duplication, inconsistency, and quality issues so the golden records can be made.
By establishing the MDM policy in the data catalog, engineers can use the rules as the requirements for building the solution. They can also collaborate inside the catalog to get answers to any questions that arise during construction.
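As a rough illustration of the merge step engineers build from those rules, the sketch below groups incoming records by a match key and lets the most recently updated record survive. The "most recent wins" rule and the field names are hypothetical stand-ins for whatever survivorship policy the catalog actually documents.

```python
# A minimal sketch of the merge step: collapse duplicates into one golden
# record per match key. Field names and the survivorship rule ("most recent
# wins") are hypothetical examples.

from itertools import groupby

def build_golden_records(records, match_key="email"):
    """Return one surviving record per unique value of match_key."""
    records = sorted(records, key=lambda r: r[match_key])
    golden = []
    for _, dupes in groupby(records, key=lambda r: r[match_key]):
        # survivorship: the most recently updated record wins
        golden.append(max(dupes, key=lambda r: r["updated_at"]))
    return golden

records = [
    {"email": "ana@example.com", "name": "Ana",    "updated_at": "2022-01-01"},
    {"email": "ana@example.com", "name": "Ana M.", "updated_at": "2022-06-01"},
    {"email": "bo@example.com",  "name": "Bo",     "updated_at": "2022-03-15"},
]
print(build_golden_records(records))
# two golden records; Ana's more recent name "Ana M." survives
```

Because the rule itself came from the policy in the catalog, stakeholders can verify the surviving values against the documented definition instead of reading pipeline code.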
In an MDM-first approach, engineers often define the rules as they code the solution. This might get the code to testing faster, but it will inevitably fail to match stakeholders' expectations. The project then has to return to defining the rules, and someone will have to explain to management why delivery dates are slipping.
It is clear that having the data catalog in place first provides a better path to success for an MDM implementation.
The benefits that organizations can achieve by starting with a catalog include:
Reducing analysis and planning time to do an MDM implementation
Helping to ensure the quality of the data before trying to master it
Increasing communication to minimize downstream impacts and outages after the MDM implementation
Schedule a personalized demo today to learn more about how Alation can help you accelerate your successful MDM implementation.