5 Ways Data Engineers Can Support Data Governance
By Pohan Lin
Published on January 26, 2023
Data is a key asset for businesses in the modern world. Used correctly, it can improve internal operations, power marketing strategies, and much more. That’s why many organizations invest in technology to improve data processes, such as a machine learning data pipeline.
However, data needs to be easily accessible, usable, and secure to be useful — yet the opposite is too often the case. Many users struggle to access the information they need or understand its full context once that access is unlocked. What’s worse, just 3% of the data in a business enterprise meets quality standards.
There’s also no denying that data management is becoming more important, especially to the public. Users are more concerned about how their personal information is gathered and used. This has spawned new legislation controlling how data can be collected, stored, and utilized, such as the GDPR or CCPA. This is why data also needs to be compliant.
These data requirements could be satisfied with a strong data governance strategy. Governance can — and should — be the responsibility of every data user, though how that’s achieved will depend on the role within the organization. This article will focus on how data engineers can improve their approach to data governance.
The Importance of Data Governance
Research from McKinsey reveals that employees in the average enterprise waste nearly 30% of their time on non-value-added tasks due to low-quality data. By contrast, employees in leading firms waste just 5-10% of their time on such tasks.
A lack of data governance can cause a whole range of problems, including:
Lack of consistency
For data to be useful, it should be consistent across all areas. A field might not be entered in the same way across different departments, which makes the data difficult to find and affects the accuracy of business intelligence (BI).
In many scenarios, there is no one responsible for data administration. Without proper quality control, data inaccuracies are more likely to occur. Again, this means that BI suffers.
Poor data categorization
Without a data categorization policy, important data is often overlooked. This makes it more vulnerable to being lost.
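As a rough illustration of the consistency problem, the sketch below audits one field that different departments fill in differently. The field name, the sample records, and the `CANONICAL_COUNTRY` mapping are all hypothetical; a real program would draw its canonical values from an agreed reference dataset.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical mapping from spellings seen across departments to one canonical code.
CANONICAL_COUNTRY = {
    "us": "US",
    "usa": "US",
    "u.s.": "US",
    "united states": "US",
    "uk": "GB",
    "united kingdom": "GB",
}


def normalize_country(raw: str) -> Optional[str]:
    """Return the canonical code, or None if the spelling is not recognised."""
    return CANONICAL_COUNTRY.get(raw.strip().lower())


def audit_field(records, field="country"):
    """Group raw spellings by canonical value and collect unrecognised ones."""
    variants = defaultdict(set)
    unknown = set()
    for record in records:
        raw = record[field]
        canonical = normalize_country(raw)
        if canonical is None:
            unknown.add(raw)
        else:
            variants[canonical].add(raw)
    return dict(variants), unknown


# Made-up sample records from three departments.
records = [
    {"dept": "sales", "country": "USA"},
    {"dept": "support", "country": "u.s."},
    {"dept": "finance", "country": "United Kingdom"},
    {"dept": "sales", "country": "Untied States"},  # typo that should be flagged
]

variants, unknown = audit_field(records)
print(variants)  # spellings grouped under each canonical code
print(unknown)   # values that need a steward's attention
```

Values that land in `unknown` are the ones worth routing to whoever is responsible for that dataset, rather than silently guessing a correction.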
Ignoring all these issues only leads to their growth. Indeed, one of the biggest aggravators of data issues is inaction. In large companies, it’s not uncommon to encounter a general acceptance that data issues are too troublesome or massive to fix. However, the longer a problem is left unresolved, the bigger it becomes. Eventually, problems become too big to ignore — and more difficult to fix.
How can data engineers address these challenges directly? Let’s look at five different ways.
1) Appoint a Data Steward
A data steward is an actor in your data governance program. They ensure data is of high quality and fit for purpose for given roles and projects, and serve as critical connective tissue between IT and the business. Many data issues are related to a lack of administration, and data stewards fill that gap.
The data steward’s job is to survey datasets and ensure they meet the applicable standards. They’ll ensure that users comply with the policies set by data governance committees. In this way, data stewards support, enforce, and realize a data governance program.
Below are some general guidelines for selecting a steward:
Employees who have previously worked with specific datasets are best suited to the role.
Data stewardship can be a full-time job. If somebody is already working in another role 40 hours a week, they’ll lack the time to do the job effectively.
Offer a framework
If your data steward doesn’t fully understand your policies, neither will the end users. Ensure that the data steward is given a full rundown of the information they need. It’s equally important that they know the reporting structure. Is there a steward group that they need to attend, or a governance committee?
Offer the right tools
Data stewardship is greatly simplified when the right tools are on hand. So ask yourself, does your steward have the software to spot issues with data quality, for example? Do they have a system to manage the metadata for given assets?
2) Always Remember Compliance
There are now many different data privacy and security laws worldwide. One example is the EU’s General Data Protection Regulation (GDPR). GDPR has had a huge impact on the ways businesses handle data, and organizations must now have a lawful basis, such as a user’s consent, before collecting personal information about them.
Legislation also comes with large associated fines. In the case of GDPR, the maximum fine is €20 million or 4% of worldwide annual turnover, whichever is higher. In other words, one compliance misstep can effectively shutter a business. Embedding data governance best practices into your data management strategy is essential to meeting modern compliance demands (and avoiding the costly penalties, both monetary and social, of violating these rules).
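To make the scale of that exposure concrete, here is a minimal sketch of the upper-tier GDPR cap (Article 83(5)): the higher of €20 million or 4% of worldwide annual turnover. The function name and the turnover figures are illustrative assumptions, and the result is a ceiling, not a prediction of what a regulator would actually impose.

```python
GDPR_FIXED_CAP_EUR = 20_000_000  # fixed amount in the upper fine tier
GDPR_TURNOVER_PERCENT = 4        # percentage of worldwide annual turnover


def max_gdpr_fine_eur(annual_turnover_eur: int) -> int:
    """Return the upper-tier ceiling: whichever of the two caps is higher."""
    # Integer arithmetic avoids floating-point rounding on large amounts.
    turnover_cap = annual_turnover_eur * GDPR_TURNOVER_PERCENT // 100
    return max(GDPR_FIXED_CAP_EUR, turnover_cap)


# Hypothetical companies: the fixed cap dominates until turnover passes EUR 500 million.
print(max_gdpr_fine_eur(100_000_000))    # 20000000
print(max_gdpr_fine_eur(1_000_000_000))  # 40000000
```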
To stay on top of external requirements, there needs to be awareness throughout your organization. Leaders should constantly remind teams of their legal obligations when handling data. Provide learning resources and training, even if it requires investment.
3) State — and Communicate — Your Goals
Establishing a set of overall objectives is an essential part of data governance. After all, your governance strategy should be based around your goals. A clear set of goals allows you to monitor your progress and identify whether additional steps need to be taken.
Below are some examples of common data governance goals:
All data collection, storage, and usage must meet the terms of legislation. Avoid fines that could result from issues such as data leakage or lack of data minimization practices. (This is “table stakes” for any data governance program!)
Ensure a secure framework for data storage. Take all steps possible to minimize the damage from cybersecurity-related threats.
Create a clear set of rules that govern the usage of data. Carry out the necessary steps to educate staff on how to follow those rules.
Make sure that the value of data increases over time.
Reduce the overhead costs of data storage and management.
Consider different data warehousing concepts to improve data quality, insights, and accuracy.
4) Build a Secure Framework
I’ve already discussed the threats posed by cybercriminals. In fact, 66% of small to medium-sized businesses have experienced a cyberattack in the past 12 months. This trend is unlikely to reverse, so having the appropriate measures in place is essential.
The following measures are a must for any business seeking to secure its data:
Choose the right system
To ensure security, you need a framework that adapts with the times. The IBM mainframe system is a strong example. Since its inception in 1952, the system has evolved to offer the best capabilities and security.
Remove all identifiers that could link individuals to stored data. Examples of personally identifiable information (PII) include contact information, financial details, and patient identification numbers.
Ensure that access to data is granted only on a need-to-know basis. Adopt an approach of access segregation. This means that different access policies are applied to different sets of data.
Enable two-factor authentication
Two-factor authentication adds an extra layer of security to your system. Even if a password is entered correctly, a user will need to enter an additional code to log in.
Have a data back-up strategy
If data is lost, it could be unrecoverable. To mitigate the damages related to data loss, be sure to back up information regularly.
Invest in tech
Achieving the points listed above will require the correct security software. For instance, the CloverDX anonymization tool can strip data of sensitive components.
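Two of the measures above, removing identifiers and granting access on a need-to-know basis, can be sketched in a few lines. The example below is a simplified illustration, not a production design. It uses keyed hashing, which is pseudonymization (a weaker technique than full anonymization), and the field names, roles, and key are all hypothetical.

```python
import hashlib
import hmac

# Assumption: in practice the key would come from a secrets manager, never source code.
PSEUDONYM_KEY = b"example-key-do-not-use"

# Hypothetical fields treated as direct identifiers.
PII_FIELDS = {"email", "phone", "patient_id"}

# Hypothetical access policy: each role sees only the columns it needs.
ROLE_COLUMNS = {
    "analyst": {"patient_ref", "visit_date", "diagnosis_code"},
    "billing": {"patient_ref", "visit_date", "amount_due"},
}


def pseudonymize(value: str) -> str:
    """Replace an identifier with a stable keyed hash (HMAC-SHA256, truncated)."""
    digest = hmac.new(PSEUDONYM_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def strip_identifiers(record: dict) -> dict:
    """Drop direct identifiers and keep only a pseudonymous reference."""
    cleaned = {k: v for k, v in record.items() if k not in PII_FIELDS}
    cleaned["patient_ref"] = pseudonymize(record["patient_id"])
    return cleaned


def view_for(role: str, record: dict) -> dict:
    """Return only the columns the role is allowed to see (nothing for unknown roles)."""
    allowed = ROLE_COLUMNS.get(role, set())
    return {k: v for k, v in record.items() if k in allowed}


# Made-up record for illustration.
raw = {
    "patient_id": "P-1001",
    "email": "jane@example.com",
    "phone": "555-0100",
    "visit_date": "2023-01-26",
    "diagnosis_code": "J10",
    "amount_due": 120,
}

cleaned = strip_identifiers(raw)
print(view_for("analyst", cleaned))
print(view_for("billing", cleaned))
```

Because the same key always yields the same reference, records can still be joined across datasets without exposing the original identifier; the trade-off is that anyone holding the key can re-link them, so the key itself needs need-to-know protection.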
5) Identify Critical Data
Which data is essential to run your organization? Answering this question is critical to effective data governance.
These essential data items are known as critical data elements (CDEs). By identifying CDEs, you can better prioritize tasks and gain a more thorough understanding of your strengths and weaknesses. In other words, determining your CDEs is an essential part of creating a streamlined organization. Bear the following points in mind when doing so:
Consider critical data based on function
You’ll have data that is essential for optimizing your customer support. Similarly, you’ll have data that is integral to internal processes, such as SOX controls, the internal controls over financial reporting that the Sarbanes-Oxley Act requires. The two serve completely different purposes, but both are equally important.
Narrow the scope
It’s tempting to mark huge swaths of data as critical. But for categorization to be useful, you need to be as narrow as possible. Ask yourself, would certain processes function without this data? If so, it shouldn’t be classed as a CDE.
Different departments will know what data is critical for them. Make sure to involve all key stakeholders to ensure that classification is a cross-department effort. This is vital for getting the best picture of your CDEs.
Use the right framework
The task of categorizing CDEs is made much simpler when you have the right framework. For example, loading an inventory of your data elements into a pandas DataFrame makes it much easier to filter, group, and compare candidates.
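As a minimal sketch of how the points above might come together in pandas, the example below applies the "would the process function without this data?" test to a small inventory. The column names, element names, and owners are invented for illustration; each department would supply its own answers.

```python
import pandas as pd

# Hypothetical inventory gathered from each department's stakeholders.
catalog = pd.DataFrame(
    [
        {"element": "customer_email", "owner": "support",
         "process": "ticket routing", "process_runs_without_it": False},
        {"element": "customer_nickname", "owner": "support",
         "process": "ticket routing", "process_runs_without_it": True},
        {"element": "invoice_total", "owner": "finance",
         "process": "financial reporting", "process_runs_without_it": False},
        {"element": "marketing_segment", "owner": "marketing",
         "process": "campaign targeting", "process_runs_without_it": True},
    ]
)

# Narrow the scope: an element is critical only if its process cannot run without it.
catalog["is_cde"] = ~catalog["process_runs_without_it"]

# Keep just the critical elements and the columns needed to act on them.
cdes = catalog.loc[catalog["is_cde"], ["element", "owner", "process"]]

# Group by owner so each department can review its own list.
cdes_per_owner = cdes.groupby("owner")["element"].apply(list).to_dict()

print(cdes)
print(cdes_per_owner)
```

Keeping the yes/no answer as an explicit column, rather than deleting non-critical rows up front, leaves a record of what was considered and why it was excluded.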
Data is essential for the modern business to operate and stay ahead of the competition. However, data alone is not enough to ensure success. It needs to be easily accessible, high quality, and secure. Achieving this is impossible without strong data governance.
The five tips explored within this article should be the basis for any governance strategy. Have you appointed a data steward to ensure proper administration? Is your data collection, storage, and usage compliant with external legislation? And perhaps most importantly, is your data secure enough to be protected against threats?
Every effective data governance strategy begins with a set of goals. These should be broad and cover the various aspects of data usage by your organization. Finally, remember to categorize your CDEs to enable a streamlined and effective operation.
Strong data governance can transform your organization. Why not examine your processes and see if any improvements need to be made?
Author Bio: Pohan Lin – Senior Web Marketing and Localizations Manager
Pohan Lin is the Senior Web Marketing and Localizations Manager at Databricks, an AI provider connecting the features of TensorFlow Python, data warehouses and data lakes to create lakehouse architecture. With over 18 years of experience in web marketing, online SaaS business, and ecommerce growth, Pohan is passionate about innovation and is dedicated to communicating the significant impact data has in marketing. Pohan Lin has also published articles for domains such as PingPlotter and IT Chronicles.