In today's data-driven world, ensuring the quality of your data is paramount. High-quality data can lead to better decisions and support innovation. But data quality isn't just about raw data; it also extends to metadata – the information about your data.
In this blog post, we will explore the different dimensions of data quality and how they relate to your metadata to ensure that it is trustworthy and fit for use.
Below are the 7 key dimensions that we will be focusing on and how they pertain to data quality:
Accurate data correctly reflects the real-world entities or events it represents. When data is accurate, it can be trusted and relied upon for critical tasks and reporting.
Complete data has all required data elements present and accounted for. Complete data lets us see the whole picture and factor each element into our decisions.
Consistent data is uniform across different sources, systems, or periods. This dimension is crucial because it keeps information and processes predictable and reliable, reducing the risk of errors and confusion.
Timely data is up-to-date and available when needed. Ensuring your data is fresh allows you to make decisions based on the latest information.
Relevant data is applicable and meaningful for the intended purpose. In other words, it is fit for purpose and helps create a sense of context.
Data integrity involves data security and protection from unauthorized access or corruption. Confidence, peace of mind, and data integrity go hand in hand; having the proper access controls and privacy surrounding your organization's data is imperative.
Validity measures whether a value conforms to a preset standard: correct data formats, correct data types, no duplicate records, and values that conform to business rules.
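As a rough illustration of the validity checks just listed, the sketch below tests a few records for type, format, and duplicates. The NNN-NN-NNNN pattern, the record layout, and the function name are assumptions made for this example; a real rule set would also encode your own business rules.

```python
import re

# Assumed format rule for this example: NNN-NN-NNNN.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")

def find_validity_issues(records):
    """Return (row_index, issue) pairs for a list of dict records."""
    issues = []
    seen = set()
    for i, record in enumerate(records):
        ssn = record.get("SocialSecurityNumber")
        if not isinstance(ssn, str):
            issues.append((i, "wrong type"))
            continue
        if not SSN_PATTERN.fullmatch(ssn):
            issues.append((i, "bad format"))
        if ssn in seen:
            issues.append((i, "duplicate"))
        seen.add(ssn)
    return issues

records = [
    {"SocialSecurityNumber": "123-45-6789"},
    {"SocialSecurityNumber": "123456789"},    # missing dashes
    {"SocialSecurityNumber": "123-45-6789"},  # repeats row 0
    {"SocialSecurityNumber": 123456789},      # number instead of text
]
print(find_validity_issues(records))
```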
Just as you want to ensure proper data quality, you must also work towards the proper quality of your metadata. In this section, we’ll review how these dimensions can be applied to your metadata.
When metadata is accurate, it represents the data correctly. This includes critical labels that mark information as sensitive.
Take the example of metadata surrounding a column called SocialSecurityNumber; it is important that the metadata, such as a custom field, denotes that this column holds personally identifiable information (PII). Failure to do so can result in data being mishandled.
Metadata must be complete: to understand the data, we need to see the whole picture. All relevant and required metadata fields should be curated.
Looking back at our column SocialSecurityNumber, there may be other valuable attributes to help gain an understanding of the column. For example, knowing it is PII is essential, but so is knowing it is a critical data element and restricted information in the customer domain. All these additional metadata components start to tell the whole story of the SocialSecurityNumber column. With the addition of the domain field, a value of "Customer" tells a different story than a value of "Employee" would, for instance. Failure to see this picture can result in a slower time to context.
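One way to make completeness checkable is to list the fields your curation standard requires and report which ones are empty. This is a minimal sketch: the field names and the dictionary layout are hypothetical, chosen to mirror the SocialSecurityNumber example rather than any particular catalog's schema.

```python
# Hypothetical required fields, based on the attributes discussed above.
REQUIRED_FIELDS = ("is_pii", "is_critical_data_element", "classification", "domain")

def missing_metadata_fields(metadata, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or left blank."""
    return [field for field in required if metadata.get(field) in (None, "")]

ssn_metadata = {
    "column": "SocialSecurityNumber",
    "is_pii": True,
    "classification": "Restricted",
    "domain": "",  # not yet curated
}
print(missing_metadata_fields(ssn_metadata))
```

Note that a value of False counts as filled in: a column explicitly marked "not PII" is complete, while a blank one is not.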
Metadata, much like data, needs to be consistent across various systems. Standardization is critical in the understanding process. Metadata should not differ significantly from source to source.
To use our example, the SocialSecurityNumber metadata in two different systems would share consistent attributes: both would be considered PII and a critical data element. However, this does not necessarily mean that all of the metadata would be identical. For example, in one system the domain may be ‘Customer’ and in another, it could be ‘Employee.’ Some metadata is contextual to the system in which it resides and how the field is used. Having proper policies is critical to creating standardization and consistency across your metadata fields. This will help avoid confusion and allow users to understand their data more efficiently.
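The distinction between attributes that must match everywhere and attributes that may vary by system can be sketched as a simple comparison. The field names and the choice of which fields count as "global" are assumptions for illustration; your own policy would define that list.

```python
# Assumed policy: these attributes must agree in every system;
# "domain" is contextual and is deliberately left out.
GLOBAL_FIELDS = ("is_pii", "is_critical_data_element")

def inconsistent_fields(meta_a, meta_b, fields=GLOBAL_FIELDS):
    """Return the global fields whose values differ between two systems."""
    return [f for f in fields if meta_a.get(f) != meta_b.get(f)]

crm_metadata = {"is_pii": True, "is_critical_data_element": True, "domain": "Customer"}
hr_metadata = {"is_pii": True, "is_critical_data_element": False, "domain": "Employee"}

# The differing domain is ignored; the differing critical-data-element flag is reported.
print(inconsistent_fields(crm_metadata, hr_metadata))
```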
It is vital to ensure that your curation is up-to-date. Data is ever-changing; therefore, so is your metadata. This means we should not consider curation a one-and-done activity, but more as an ongoing exercise.
When looking at metadata for SocialSecurityNumber, we may see the vast majority of attributes describing this field are relatively static; however, attributes such as the assigned steward or owner may change regularly as the organization changes. These elements are equally essential to keep up to date; without having the proper, timely context in these areas, individuals risk making decisions based on what they knew yesterday, last month, or last year.
Relevance in metadata means tailoring it to specific contexts or purposes. Customize metadata to ensure it aligns with the needs of your data users. Looking back at our two domains, customer and employee, it is essential to have the correct metadata associated with SocialSecurityNumber to help create context about what the purpose or use of this field entails.
Data integrity extends to metadata as well. Implementing security measures to protect metadata from unauthorized access or tampering ensures that the right people have access to the right metadata to make decisions. It is important that only those with admin rights can set SocialSecurityNumber to “not PII.” Integrity surrounding your metadata creates a sense of trust crucial to usage.
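The admin-only rule for the PII flag could be enforced with a guard like the one below. This is a sketch under stated assumptions: the role name "admin", the metadata layout, and the function name are hypothetical, and a real catalog would enforce this through its own permission model.

```python
def set_pii_flag(metadata, is_pii, user_roles):
    """Return a copy of metadata with is_pii updated; only admins may change it."""
    if "admin" not in user_roles:
        raise PermissionError("only admins may change the PII flag")
    updated = dict(metadata)
    updated["is_pii"] = is_pii
    return updated

ssn_metadata = {"column": "SocialSecurityNumber", "is_pii": True}

try:
    set_pii_flag(ssn_metadata, False, user_roles={"analyst"})
except PermissionError as err:
    print(f"rejected: {err}")

# The original metadata is untouched by the rejected attempt.
print(ssn_metadata["is_pii"])
```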
In this post, we walked through what data quality dimensions are and how they can be applied to metadata; we used the example of a column in a table (SocialSecurityNumber) to help foster an understanding of the impact of metadata quality. These dimensions can also help kick-start your curation standards and allow you to be more prescriptive in stewardship efforts.
Although the list of dimensions can vary depending on who you ask, the six applied above translate readily to the metadata context. (Two other commonly cited dimensions, validity and uniqueness, were not applied to metadata here.)
Curious to learn more about how a data catalog can help you improve your metadata quality and drive business results? Book a demo with us today.