In the technology industry, even the most incremental trends often get framed in next-generation terminology. And every megatrend produces its own new vocabulary. DataOps sprang up to connect data sources to data consumers. Tools became stacks. Architectures became fabrics. The data warehouse and analytical data stores moved to the cloud and disaggregated into the data mesh.
Today, the brightest minds in our industry are targeting the massive proliferation of data volumes and the accompanying but hard-to-find value locked within all that data. These data pioneers are looking beyond data warehouses, data lakes, and data lakehouses to envision the data needs of tomorrow by creating today’s modern data stack.
I pondered whether these megatrends — with their data meshes, data fabrics, and modern data stacks — were really brand new, or whether history may be repeating itself, albeit with new terminology. This led me to Sanjeev Mohan.
Mohan is a former VP Analyst at Gartner. He now puts his 30-plus years of industry experience to work at his eponymous consulting firm, helping companies use data and analytics to discover opportunities for market advancement and growth. He’s a true expert in the field, having worked at Oracle, Scient, BearingPoint, and Booz Allen Hamilton, and on data-focused projects with companies like LVMH, Major League Baseball, Toyota, American Express, Freddie Mac, and many, many others.
I recently had the opportunity to connect with Mohan at Snowflake Summit 2022 in Las Vegas. We chatted about industry trends, why decentralization has become a hot topic in the data world, and how metadata drives many data-centric use cases. But, through it all, Mohan says it’s critical to view everything through the same lens: gaining business value from data.
Here’s an edited Q&A highlighting Mohan’s key points in this important conversation.
Data fabric, data mesh, modern data stack. How do you define these things?
Mitesh Shah, VP, Product Marketing, Alation: Data fabric, data mesh, modern data stack. How do you define these things?
Sanjeev Mohan, Principal, SanjMo & Former Analyst, Gartner: These three frameworks are intended to speed up our data infrastructure development and make it frictionless. Everyone is asking, ‘How do I quickly start getting value from all this data I’ve been accumulating?’ Everybody’s trying to solve this same problem (of leveraging mountains of data), but they’re going about it in slightly different ways.
Data fabric is a technology architecture. It’s a data integration pattern that brings together different systems, with the metadata, knowledge graphs, and a semantic layer on top. It helps you look at different sources of data together.
Data mesh says architectures should be decentralized because there are inherent problems with centralized architectures. For example, when we centralize, all the focus goes on the data engineers. But there are only so many data engineers available in the market today; there’s a big skills shortage. So to get away from that lack of data engineers, what data mesh says is, ‘Take those business logic data transformation capabilities and move that to the domains.’ The data producers should be responsible for quality, converting their data assets into products, and then ensuring they are accessible in a governed manner.
The modern data stack depicts this whole loop of how the data is produced and consumed. A modern data stack is basically a combination of these new products that have come together to help deliver analytics.
Why are they so popular?
Mitesh: These are big trends today. Why are they so popular?
Sanjeev: These frameworks are popular largely because we’ve had an explosion of products; we’ve had an explosion of segments in the data stack. We should really be concerned about the business problem we are trying to solve [e.g., managing supply chains, providing better customer service, enabling doctors to make data-informed bedside decisions]. We, in our IT circles, get all caught up in these terms [thereby causing undue confusion]. But the business has a problem to solve, and they don’t really care whether we call it modern data stack or something else. We need to focus on the business challenge.
Customers are now saying, ‘Give us a framework in which we can operate.’ It’s not really prudent for IT to expect their customers to put it all together. A modern data stack gives a neat, closed-loop definition of what is needed. If products are well integrated [in a modern data stack], it makes the job easier for the customers to adopt it and solve their business problems.
Mitesh: Let’s talk about the trend toward decentralization with a data mesh. Decentralization pushes data responsibilities back to the folks who know the data best and can deliver high-quality data back to the data consumers. But it feels like one obvious disadvantage of decentralization is what it lacks: control, security, governance, and the ability to find and understand the data.
Sanjeev: It’s a fascinating world that we live in. We are never satisfied with one thing. We keep, like a pendulum, moving in a different direction. So if you look back, the modern data era started with mainframes. When the mainframes came out, everything was centralized. You had to buy an IBM mainframe, and you put everything on an IBM mainframe. But then PCs came out and decentralized the hardware. And then Windows came out and moved everybody onto a centralized operating system [for a decentralized client on a centralized server]. And then the Internet came out and we decentralized. And now with some of these cloud data warehouses becoming such behemoths, everything is getting centralized again.
The point is that we will constantly have this movement between centralized and decentralized. But decentralization helps us alleviate some of the bottlenecks that come with centralized architectures. In theory, it’s a good idea. But if you decentralize too much, every department, every domain is doing its own infrastructure with its own solutions. Then we run into issues with data that’s shared and common.
For example, I have customer data sitting across the shipping department, billing department, sales department, and marketing department. I need a way to have a single common business glossary, a business definition of that customer. But “customer” is an easy one. It could be gross margin. I cannot have multiple definitions of gross margin across different domains because they’re decentralized. So we have to be very careful about giving the domains the right and authority to fix data quality. When it comes to curating, governing, accessing, and securing the data, in my mind, that still needs to be centralized.
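To make the “single definition of gross margin” point concrete, here is a minimal sketch of a centrally owned business glossary that decentralized domains resolve metrics through. The class and function names (`BusinessGlossary`, `GlossaryTerm`, `gross_margin`) are hypothetical illustrations, not any product’s API, and the formula assumes the common definition of gross margin as (revenue minus cost of goods sold) divided by revenue.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class GlossaryTerm:
    """One centrally governed business term (hypothetical structure)."""
    name: str
    definition: str
    compute: Callable[..., float]


def gross_margin(revenue: float, cogs: float) -> float:
    """Gross margin as a fraction of revenue: (revenue - COGS) / revenue."""
    if revenue == 0:
        raise ValueError("revenue must be non-zero")
    return (revenue - cogs) / revenue


class BusinessGlossary:
    """Central registry: each term may have exactly one definition."""

    def __init__(self) -> None:
        self._terms: Dict[str, GlossaryTerm] = {}

    def register(self, term: GlossaryTerm) -> None:
        key = term.name.lower()
        existing = self._terms.get(key)
        # Reject a second, different definition of the same term.
        if existing is not None and existing != term:
            raise ValueError(f"conflicting definition for {term.name!r}")
        self._terms[key] = term

    def lookup(self, name: str) -> GlossaryTerm:
        return self._terms[name.lower()]


# Usage: the sales and billing domains both compute gross margin
# through the shared glossary rather than with their own formulas.
glossary = BusinessGlossary()
glossary.register(GlossaryTerm("Gross margin", "(Revenue - COGS) / Revenue", gross_margin))
gm = glossary.lookup("gross margin").compute
sales_gm = gm(revenue=1000.0, cogs=600.0)
billing_gm = gm(revenue=250.0, cogs=200.0)
```

The domains still own their data and compute the numbers themselves; only the definition is centralized, which mirrors the balance Mohan describes.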
What are your thoughts on the centralization of metadata?
Mitesh: It sounds like the right answer is centralizing the right things. Those feel like security, governance, and metadata. What are your thoughts on the centralization of metadata? Should we accept a high degree of decentralization balanced by a modicum of centralization across these three areas?
Sanjeev: Metadata is the most exciting space right now. So many use cases hinge on getting metadata and getting it right: data quality and observability, data privacy, and even data security.
Let’s take data privacy as an example. Say the legal department says we need to deidentify all the email addresses. Well, how do I know where all the email addresses are? I go to my data catalog. Even when columns have very esoteric, technical names, the data catalog will identify that they actually contain email addresses, so they become easily discoverable.
So metadata becomes the fuel for the data engine in today’s enterprise. And the data catalog is the tool that centralizes that metadata, analyzes that metadata, and helps people and systems make use of it.
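The email-address example can be sketched as a tiny classifier that tags columns by what their sampled values look like rather than by their names, then pseudonymizes the tagged values. This is an illustrative assumption, not how Alation or any particular catalog works: the function names, the regular expression, and the 80% match threshold are all hypothetical, and real catalogs typically combine several signals (name heuristics, machine learning, human curation).

```python
import hashlib
import re
from typing import Dict, List

# Simple email pattern (assumption); deliberately not RFC-complete.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")


def classify_columns(samples: Dict[str, List[str]], threshold: float = 0.8) -> Dict[str, str]:
    """Tag a column as 'email_address' if enough non-empty sample values match."""
    tags: Dict[str, str] = {}
    for column, values in samples.items():
        non_empty = [v.strip() for v in values if v and v.strip()]
        if not non_empty:
            continue
        hits = sum(1 for v in non_empty if EMAIL_RE.match(v))
        if hits / len(non_empty) >= threshold:
            tags[column] = "email_address"
    return tags


def pseudonymize(value: str, salt: str) -> str:
    """Replace a value with a salted SHA-256 token (hypothetical deidentification)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]


# Usage: the column names are cryptic, but the sampled values reveal emails.
samples = {
    "cst_ctc_x1": ["ana@example.com", "bo@example.org", ""],
    "ord_amt": ["12.50", "8.00"],
    "notes": ["call back", "x@y.com"],
}
tags = classify_columns(samples)
to_deidentify = [col for col, tag in tags.items() if tag == "email_address"]
```

Classifying on values instead of names is what lets the sketch find `cst_ctc_x1`; the threshold keeps a free-text column that merely mentions one email from being tagged.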
Metadata is the fuel for the engine
After my talk with Mohan, it was clear: Whether you have a data fabric, data mesh, modern data stack — or some combination of these and other trends driving your data strategy — metadata is the fuel for the engine that drives and provides visibility into it all. Alation increases the value of your metadata with machine learning, automation, and human knowledge. Learn more about active metadata management or join one of our weekly live demos to see how Alation helps companies tackle the most demanding challenges in data management.
Curious to learn how data mesh and fabric can power your modern data stack?
Join us on Monday, August 22, at 12:15 p.m. PT / 3:15 p.m. ET at the Gartner D&A Summit in Orlando for our presentation, Alation: Helping Regeneron Power Drug Discoveries with Active Data Governance. You’ll learn how biotech company Regeneron partners with Alation to develop lifesaving treatments faster.