The best analysts are often the best data curators with deep, intimate knowledge of the data
A change is under way as more companies adopt self-service analytics for data discovery and move away from a “command and control” structure for the data warehouse. As part of this change, definitions and best practices for reporting and analytics are shifting as well.
In this new world of data curation, the number of people who can contribute knowledge is much greater than companies realize. The best analysts are often the best data curators, with deep, intimate knowledge of the data.
The question then becomes: how can the data curator take input from everyone across the organization without letting that input scatter people in 1,000 different directions? Consumers faced the same challenge as they moved from the world of encyclopedias, where knowledge was managed top-down, to the world of Wikipedia, where knowledge is crowdsourced.
There is a fundamental tension between coverage and quality: which is worse, a page of data documentation that is wrong or one that is blank? You can reduce the blank space by fully crowdsourcing and democratizing data curation, by using automation to let computers make bulk edits, or by speeding up the rate at which your people document information. Normally, any of these moves increases coverage at the expense of quality: non-experts and computers make mistakes, bulk edits miss important exceptions, and even experts make more mistakes when they work fast or handle many items in a row.
This tension holds whether your approach to documentation and enforcement is prescriptive or descriptive. And while organizations may weigh the relative risks of wrong and blank documentation differently, both are bad for everyone. The opportunity is to get the best of both worlds: to have some coverage everywhere, whether from batch updates and rules, from lay people, or from machine learning guesses, while having conscientiously written, or at least confirmed, content from experts on the most important data sources.
Data curation should be a part of this shift. To get a successful data curation approach started in your organization:
Measure data curation success based on the usage of data, the type of data, and the criticality of data. Use a system that lets you query data knowledge, such as the number of times a particular table is accessed.
Clearly define the role of data curators and their domain for data curation. Identify what the data curator will do to ensure the structure and content of the data knowledge remain relevant and trusted. Transfer control of the data knowledge to the data curators so they feel empowered to take on the role.
Make the data knowledge easy for data curators to use in context. Data curators should have access to the information without having to log in to other systems to look it up. If information is right at the data curator’s fingertips, a reinforcing mechanism kicks in: they are informed at the right moment with the right information and have the right incentives to keep updating data as needed.
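As one way to picture the first step, here is a minimal Python sketch of how usage and criticality might be combined to decide which tables an expert curator should look at first. Everything in it is an assumption made for illustration: the TableStats fields, the four documentation states, the 1-to-3 criticality scale, and the scoring formula are hypothetical and do not come from any particular catalog product. It also assumes that per-table access counts are available from your query logs.

```python
from dataclasses import dataclass

# Hypothetical documentation states, ordered from least to most trusted.
DOC_TRUST = {"blank": 0, "auto": 1, "crowd": 2, "expert": 3}


@dataclass
class TableStats:
    name: str
    access_count: int  # queries against the table in the period (assumed to come from query logs)
    criticality: int   # 1 (low) to 3 (high), assigned by the organization
    doc_status: str    # one of the keys in DOC_TRUST


def curation_priority(table: TableStats) -> int:
    """Illustrative score: a higher value means expert review is more urgent."""
    # How far the current documentation is from expert-confirmed.
    trust_gap = DOC_TRUST["expert"] - DOC_TRUST[table.doc_status]
    return table.access_count * table.criticality * trust_gap


def review_queue(tables):
    """Return tables that still need expert attention, most urgent first."""
    pending = [t for t in tables if curation_priority(t) > 0]
    return sorted(pending, key=curation_priority, reverse=True)


# Made-up sample data.
tables = [
    TableStats("orders", 1200, 3, "auto"),
    TableStats("web_clicks", 5000, 1, "crowd"),
    TableStats("legacy_codes", 10, 1, "blank"),
    TableStats("customers", 900, 3, "expert"),
]

queue = [t.name for t in review_queue(tables)]
print(queue)
```

In this sketch, automated or crowdsourced documentation still counts as coverage, and expert time goes first to heavily used, critical tables whose documentation has not yet been confirmed. A real scoring rule would need to be tuned to how your organization weighs wrong documentation against blank documentation.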
A company’s culture and people are critical components in this advancing world of data curation. Technology must also go hand in hand with company culture and people to fully realize the potential of data curation in a data-driven world.