What is Data Quality?

Data quality is defined as:

the degree to which data meets a company’s expectations of accuracy, validity, completeness, and consistency.

By tracking data quality, a business can pinpoint potential issues harming quality, and ensure that shared data is fit to be used for a given purpose.

When collected data fails to meet these expectations, it can have massive negative impacts on customer service, employee productivity, and key strategies.

Why Is It Important to Have Data Quality?

Quality data is key to making accurate, informed decisions. And while all data has some level of “quality,” a variety of characteristics and factors determines the degree of data quality (high-quality versus low-quality). Furthermore, different data quality characteristics will likely be more important to various stakeholders across the organization.

Popular data quality characteristics and dimensions include:

  • Accuracy
  • Completeness
  • Consistency
  • Integrity
  • Reasonability
  • Timeliness
  • Uniqueness/Deduplication
  • Validity
  • Accessibility

Because data accuracy is a key attribute of high-quality data, a single inaccurate data point can wreak havoc across the entire system.

Without accurate and reliable data, executives cannot trust the data or make informed decisions. This can, in turn, increase operational costs and wreak havoc for downstream users. Analysts wind up relying on imperfect reports and drawing misguided conclusions from those findings. And the productivity of end-users will diminish because flawed guidelines and practices are in place.

Poorly maintained data can lead to a variety of other problems, too. For example, out-of-date customer information may result in missed opportunities for up- or cross-selling products and services.

Low-quality data might also cause a company to ship its products to the wrong addresses, resulting in lower customer satisfaction ratings, fewer repeat sales, and higher costs due to reshipments.

And in more highly regulated industries, bad data can result in the company receiving fines for improper financial or regulatory compliance reporting.

Other Challenges with Data Quality

Data volume presents quality challenges. Whenever large amounts of data are at play, the sheer volume of new information often becomes an essential consideration in determining whether the data is trustworthy. For this reason, forward-thinking companies have robust processes in place for the collection, storage, and processing of data.

As the technological revolution advances at a rapid pace, three of the top data quality challenges are:

1. Privacy and protection laws
The General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), which give people the right to access their personal data, are substantially increasing public demand for accurate customer records. Organizations must be able to locate the totality of an individual’s information almost instantly, without missing even a fraction of the collected data because records are inaccurate or inconsistent.

2. Artificial Intelligence (AI) and Machine Learning (ML)
As more companies build Artificial Intelligence and Machine Learning applications into their business intelligence strategies, data users may find it increasingly difficult to keep up with new surges of Big Data. Because the real-time data streaming platforms behind these applications channel vast quantities of new information continuously, there are now even more opportunities for mistakes and data quality errors.

Furthermore, larger corporations must work diligently to manage their systems, which reside both on-premises and through cloud servers. The abundance of data systems has also made the monitoring of complicated tasks even more challenging.

3. Data governance practices
Data governance is a data management system that adheres to an internal set of standards and policies for the collection, storage, and sharing of information. By ensuring that all data is consistent, trustworthy, and free from misuse within every company department, managers can guarantee compliance with important regulations and reduce the risk of the business being fined.

Without the right data governance approach, the company may never resolve inconsistencies between different systems across the organization. For example, a customer’s name can be listed differently depending on the department. Sales might say “Sally.” Logistics uses “Sallie.” And customer service lists the name as “Susan.” Poor data governance of this kind can confuse customers who have multiple interactions with each department over time.
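The Sally/Sallie/Susan problem can be surfaced automatically once departments share a customer identifier. The following is a minimal sketch, assuming each department exports a mapping of customer ID to name; the function name, department names, and IDs are hypothetical and only illustrate the idea.

```python
def find_name_conflicts(systems: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Return, per customer ID, the names each department holds when they disagree.

    `systems` maps a department name to its {customer_id: customer_name} records.
    Names are compared after trimming whitespace and ignoring case, so "Omar"
    and "omar " are not reported as a conflict.
    """
    all_ids = set().union(*(records.keys() for records in systems.values()))
    conflicts = {}
    for customer_id in sorted(all_ids):
        names = {
            dept: records[customer_id]
            for dept, records in systems.items()
            if customer_id in records
        }
        normalized = {name.strip().casefold() for name in names.values()}
        if len(normalized) > 1:
            conflicts[customer_id] = names
    return conflicts


# Hypothetical exports from three departments.
systems = {
    "sales": {"C-1001": "Sally", "C-1002": "Omar"},
    "logistics": {"C-1001": "Sallie", "C-1002": "Omar"},
    "customer_service": {"C-1001": "Susan", "C-1002": "omar "},
}

for customer_id, names in find_name_conflicts(systems).items():
    print(customer_id, names)
```

A report like this only flags the disagreement. Deciding which spelling is correct remains a governance decision for whoever owns the customer record.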

How to Determine Data’s Quality

The Data Quality Assessment Framework (DQAF) is a set of data quality dimensions, organized into six major categories: completeness, timeliness, validity, integrity, uniqueness, and consistency.

These dimensions are useful when evaluating the quality of a particular dataset at any point in time. Most data managers assign a score of 0-100 for each dimension and then average those scores into an overall DQAF score.
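As a rough illustration of the 0-100 scoring, the sketch below averages six dimension scores into one number. It assumes equal weighting, which is a simplification; the function name and the example scores are hypothetical, and an organization may well weight some dimensions more heavily than others.

```python
DIMENSIONS = (
    "completeness",
    "timeliness",
    "validity",
    "integrity",
    "uniqueness",
    "consistency",
)


def overall_score(scores: dict[str, float]) -> float:
    """Average the six dimension scores (each 0-100) with equal weights."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")
    for dimension in DIMENSIONS:
        if not 0 <= scores[dimension] <= 100:
            raise ValueError(f"{dimension} score must be between 0 and 100")
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)


# Hypothetical scores for one dataset.
example_scores = {
    "completeness": 90,
    "timeliness": 80,
    "validity": 100,
    "integrity": 70,
    "uniqueness": 95,
    "consistency": 75,
}

print(overall_score(example_scores))  # 85.0
```

A single average is easy to report, but it can hide one weak dimension behind several strong ones, so the individual scores are worth keeping alongside it.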

Completeness measures how much of the expected data is actually present in a dataset, and is usually tracked as the percentage of values that are missing. For products or services, the completeness of data is crucial in helping potential customers compare, contrast, and choose between different sales items. For instance, if a product description does not include an estimated delivery date (when all the other product descriptions do), then that “data” is incomplete.
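A completeness check can be as simple as counting filled-in fields. The sketch below assumes records are plain dictionaries and that `None` or an empty string means "missing"; the field names and products are hypothetical.

```python
def completeness_score(records: list[dict], required_fields: list[str]) -> float:
    """Percentage (0-100) of required field values that are present.

    A value counts as missing when the key is absent, None, or an empty string.
    An empty dataset is scored 100.0 here, which is an assumption of this sketch.
    """
    total = len(records) * len(required_fields)
    if total == 0:
        return 100.0
    filled = sum(
        1
        for record in records
        for field in required_fields
        if record.get(field) not in (None, "")
    )
    return 100.0 * filled / total


# Hypothetical product listings; the last one lacks a delivery date.
REQUIRED = ["name", "delivery_date"]
products = [
    {"name": "Desk lamp", "delivery_date": "2021-03-02"},
    {"name": "Bookshelf", "delivery_date": "2021-03-05"},
    {"name": "Office chair", "delivery_date": "2021-03-04"},
    {"name": "Standing desk", "delivery_date": None},
]

print(completeness_score(products, REQUIRED))  # 87.5
```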

Timeliness measures how up-to-date or antiquated the data is at any given moment. For example, if you have information on your customers from 2008, and it is now 2021, then there would be an issue with the timeliness as well as the completeness of the data.

When determining data quality, the timeliness dimension can have a tremendous effect — either positive or negative — on its overall accuracy, viability, and reliability.
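One simple way to put a number on timeliness is to ask what share of records were updated within an acceptable window. The sketch below assumes each record carries a last-updated date and that one year is the acceptable age; both the threshold and the sample dates are hypothetical.

```python
from datetime import date, timedelta


def timeliness_score(last_updated: list[date], as_of: date, max_age_days: int) -> float:
    """Percentage (0-100) of records updated within `max_age_days` of `as_of`."""
    if not last_updated:
        return 100.0  # nothing to be stale; an assumption of this sketch
    cutoff = as_of - timedelta(days=max_age_days)
    fresh = sum(1 for updated in last_updated if updated >= cutoff)
    return 100.0 * fresh / len(last_updated)


# Hypothetical customer records: one was last touched in 2008.
update_dates = [
    date(2008, 5, 1),
    date(2020, 11, 15),
    date(2021, 2, 1),
    date(2021, 3, 10),
]

print(timeliness_score(update_dates, as_of=date(2021, 6, 1), max_age_days=365))  # 75.0
```

The right window depends on the data. A shipping address may go stale in months, while a date of birth never does.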

Validity refers to whether information follows specific company formats, rules, or processes. For example, many systems ask for a customer’s birthdate. If the customer does not enter the birthdate in the proper format, data quality is automatically compromised. Therefore, many organizations today design their systems to reject birthdate information unless it is entered in the pre-assigned format.
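The birthdate rule can be enforced at the point of entry. The sketch below assumes the pre-assigned format is YYYY-MM-DD, which is only an example; the function and constant names are hypothetical. It rejects input that has the wrong shape as well as input that has the right shape but names a date that does not exist.

```python
import re
from datetime import date
from typing import Optional

# Assumed house format: four-digit year, two-digit month, two-digit day.
BIRTHDATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_birthdate(text: str) -> Optional[date]:
    """Return a date if `text` is a real calendar date in YYYY-MM-DD form, else None."""
    candidate = text.strip()
    if not BIRTHDATE_PATTERN.fullmatch(candidate):
        return None  # wrong shape, e.g. "07/04/1990"
    try:
        return date.fromisoformat(candidate)
    except ValueError:
        return None  # right shape but impossible date, e.g. "1990-02-30"


for entry in ["1990-07-04", "07/04/1990", "1990-02-30"]:
    result = parse_birthdate(entry)
    print(entry, "accepted" if result else "rejected")
```

Rejecting bad input up front is usually cheaper than cleaning it later, although the form should tell the customer which format is expected.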

Integrity of data refers to the level at which the information is reliable and trustworthy. Is the data true and factual? For example, if your database has an email address assigned to a specific customer, and it turns out that the customer actually deleted that account years ago, then there would be an issue with data integrity as well as timeliness.

Uniqueness is a data quality characteristic most often associated with customer profiles. A single record can be all that separates your company from winning an e-commerce sale and beating the competition.

Greater accuracy in compiling unique customer information, including each customer’s associated performance analytics related to individual company products and marketing campaigns, is often the cornerstone of long-term profitability and success.
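Duplicate customer profiles are often found by comparing a field that should be unique. The sketch below uses the email address as that field, compared after trimming whitespace and ignoring case; this is an assumption, and real deduplication often needs fuzzier matching. All names and records are hypothetical.

```python
from collections import defaultdict


def normalize(value: str) -> str:
    """Trim whitespace and ignore case so near-identical entries compare equal."""
    return value.strip().lower()


def find_duplicates(customers: list[dict], key: str = "email") -> dict[str, list[int]]:
    """Group customer IDs by normalized key and keep only groups with more than one ID."""
    groups: dict[str, list[int]] = defaultdict(list)
    for customer in customers:
        groups[normalize(customer[key])].append(customer["id"])
    return {value: ids for value, ids in groups.items() if len(ids) > 1}


def uniqueness_score(customers: list[dict], key: str = "email") -> float:
    """Percentage (0-100) of records that are distinct on the normalized key."""
    if not customers:
        return 100.0
    distinct = len({normalize(customer[key]) for customer in customers})
    return 100.0 * distinct / len(customers)


# Hypothetical profiles: records 1 and 3 are the same person.
customers = [
    {"id": 1, "email": "sally@example.com"},
    {"id": 2, "email": "omar@example.com"},
    {"id": 3, "email": " Sally@Example.com"},
    {"id": 4, "email": "li@example.com"},
]

print(find_duplicates(customers))   # {'sally@example.com': [1, 3]}
print(uniqueness_score(customers))  # 75.0
```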

Consistency of data is most often associated with analytics. It ensures that each source collecting the information captures the correct data for the unique objectives of the department or company.

For example, let’s say you have two similar pieces of information:

  1. the date on file for the opening of a customer’s account vs.
  2. the last time they logged into their account.

The difference in these dates may provide valuable insights into the success rates of current or future marketing campaigns.
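The two dates can also act as a basic consistency check, since a customer cannot log in before the account exists. The sketch below computes the gap in days and rejects pairs that contradict each other; the function name and the sample dates are hypothetical.

```python
from datetime import date


def days_from_opening_to_last_login(opened: date, last_login: date) -> int:
    """Days between account opening and last login.

    Raises ValueError when the last login predates the opening date, because
    the two fields are then inconsistent with each other.
    """
    if last_login < opened:
        raise ValueError("last login is earlier than the account opening date")
    return (last_login - opened).days


print(days_from_opening_to_last_login(date(2021, 1, 1), date(2021, 1, 31)))  # 30
```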

Determining the overall quality of company data is a never-ending process. The most crucial components of effective data quality management are identifying and resolving potential issues quickly and proactively.

Data Quality Management Tools & Best Practices

Data is generated by people, who are inherently prone to human error. To avoid future problems and maintain data quality continuity, your organization can adopt certain best practices that will ensure the integrity of your data quality management system for years into the future. Such measures include:

  • Establish employee and interdepartmental buy-in across the enterprise.
  • Set clearly defined metrics.
  • Establish data governance guidelines.
  • Create a process where employees can report any suspected failures regarding data entry or access.
  • Establish a step-by-step process for investigating negative reports.
  • Launch a data auditing process.
  • Establish and invest in a high-quality employee training program.
  • Establish, maintain, and consistently update data security standards.
  • Assign a data steward at each level throughout your company.
  • Leverage potential cloud data automation opportunities.
  • Integrate and automate data streams wherever possible.

Alation provides a variety of enterprise-level tools and solutions for the implementation of cost-effective data quality management systems. We help organizations consolidate siloed and distributed enterprise data, build consistency in data practices, and improve both the speed and the quality of the decision-making process. For more information on our data quality management solutions, contact Alation today.
