
Why AI Builders Need a Metadata Goldmine with Chris Aberger, VP at Alation

Chris Aberger, VP, Alation

Chris Aberger is VP at Alation and former CEO and co-founder of Numbers Station AI, where he pioneered AI agents for data workflows. He previously led machine learning at SambaNova Systems and held roles at Google, Apple, and IBM. Chris holds a Ph.D. in Computer Science from Stanford and multiple engineering degrees.

Satyen Sangani, CEO & Co-Founder, Alation

As the Co-founder and CEO of Alation, Satyen lives his passion of empowering a curious and rational world by fundamentally improving the way data consumers, creators, and stewards find, understand, and trust data. Industry insiders call him a visionary entrepreneur. Those who meet him call him warm and down-to-earth. His kids call him “Dad.”


0:00:05.1 Satyen Sangani: Welcome back to Data Radicals. Today's episode is special. We're joined by Chris Aberger, the co-founder and former CEO of Numbers Station AI, and now, Alation's newest VP following our recent acquisition of the company. This conversation reveals why we've joined forces and our shared vision for enterprise AI. Together, we're tackling a critical challenge, unlocking the potential of structured data with large language models. Chris shares the origin story of Numbers Station and the fast-paced innovation that fueled its rise. He also reveals why metadata is key to AI and what that means for the powerful synergy that forms this partnership. If you're curious about where AI is headed and how it can make structured data more usable, intuitive, and impactful, this is an episode for you.

0:00:54.4 Speaker 2: This podcast is brought to you by Alation, a platform that delivers trusted data. AI creators know you can't have trusted AI without trusted data. Today, our customers use Alation to build game-changing AI solutions that streamline productivity and improve the customer experience. Learn more about Alation at Alation.com.

0:01:12.3 Satyen Sangani: Today on Data Radicals, I am thrilled to welcome Chris Aberger, newly minted VP at Alation. Most recently, Chris served as the CEO and co-founder of Numbers Station, a startup pioneering AI agents for data workflows. He was also the senior director of machine learning at SambaNova Systems, where he built the AI team from the ground up, and he has work experience at Google, Apple, and IBM. This is a super exciting time for Alation because we just acquired Numbers Station, and we're joining forces to really make a new generation of structured data, AI, and workflows (literally every buzzword) that we're gonna actually deliver on and make real. Chris, welcome to Data Radicals.

0:02:00.0 Chris Aberger: Hey, Satyen. Thank you so much for having me on the show. Excited to talk to you today.

From the Stanford AI Lab to founding Numbers Station

0:02:04.7 Satyen Sangani: So, maybe we'll just start with Numbers Station. You founded it along with a couple of other folks. Tell us about the story. Tell us about how it started and why you started it.

0:02:14.4 Chris Aberger: Yeah, so it was back in early 2021. I was at SambaNova at the time. I met Sen and Ines, my other co-founders, back at Stanford when they were doing their PhDs with me, and Chris Ré was advisor to all of us; he's a professor at Stanford.

And so we all met back at Stanford, and back in early 2021, this was pre-ChatGPT and all the AI and LLM hype that's come out since then, we saw this trend with foundation models and large language models coming. Sen and Ines had written and pioneered some early work on taking structured data and applying it to LLMs. I read that paper, circled back with those folks, and was like, wow, there's a really interesting opportunity here to build a layer that sits on top of the LLM.

So we're not training these models ourselves, but we build out this layer that sits on top of the LLMs and makes it easy for organizations, and data organizations in particular, to build applications on top of their classic databases and structured data. So the origin of it was this paper that Ines spearheaded back at Stanford when she was doing her research, and us all circling back together after many years. They prototyped some ideas around this, and then we eventually got enough conviction in the fall to say this is something that we have to do, took the leap of faith, and decided to start the company at that point.

0:03:48.2 Satyen Sangani: So from the point that you read the paper to the point of founding the company, how many months was that?

0:03:52.9 Chris Aberger: That was about six months, maybe eight months. I read it before it was released. I believe it was released in like May of '21 and we formally started the company in October. So it wasn't a huge time frame.

0:04:07.2 Satyen Sangani: And then you went straight to basically raising your first seed round of capital from Chris and his crew?

0:04:13.3 Chris Aberger: Yeah. So we raised our first round of capital late in 2021 and we're kind of off to the races on building out the company since then.

0:04:22.8 Satyen Sangani: Yeah. And then the first time we met, I can't remember, I think it was 2023, right? Or was it '22?

0:04:29.1 Chris Aberger: I think it was '23. Yeah, it sounds right.

0:04:32.3 Satyen Sangani: Yeah, so the story and the funny connection there is that Venky Ganti, who was, of course, one of Alation's co-founders, journeyed through a couple of different places: his own startup, then Google, then most recently Numbers Station, and now he's elsewhere. But Venky was your head of product, and he was like, Satyen, I've got to show you this thing. And he did at the time, which was super cool. And I guess he then reintroduced us about, oh my God! What? 90 days ago?

0:05:05.2 Chris Aberger: Yeah, not too long ago, I think a couple months ago. Yep.

0:05:09.6 Satyen Sangani: So tell me about the journey between, I guess, founding the company and getting your initial funding, and what did you learn along the way of building Numbers Station? What was the journey like? And, like all startup journeys, you're still on it on some level. What were the ups? What were the downs? And what did you learn?

Lessons from the startup journey: The need for speed (and finding your audience)

0:05:28.8 Chris Aberger: Yeah, I mean, like all startups, it's a roller coaster, first and foremost. So we learned a ton of stuff along the way. The vision for the company has always been the same, right? This foundational layer that sits on top of these foundation models and large language models, largely targeted at structured data. But a lot of the things that we did on top were quick iterations, testing directions and the market, and figuring out where we could actually build a sustainable business. The first approach that we took was actually looking at data transformations, because it's the most gnarly problem out there, I would say; getting your data model correct is step zero to doing anything on top of your structured data. So, like academics, we said, oh, this is the toughest technical problem out there; clearly we should just go solve it. And we really dove in and solved that. I think it was about half a year before we had our first product released. We were getting decent traction, but what we were finding was that it was really tough to turn that into actual dollars coming into the business.

0:06:30.9 Chris Aberger: And it makes sense, right? As you take a step back, it's like, oh, if you go out and look at these organizations, who actually controls the budget, and what do they want to get done? Enabling their data engineers to be slightly better is a decent value proposition, but actually solving that executive purchaser's end-level problems is a much more attractive one. So we up-leveled after that first initial product iteration into actually solving that purchaser's end-level problems. And where that started was actually with this kind of canonical chat-with-your-data-style application. So think of ChatGPT on top of a database, and use that as a starting-point application. But the idea was really to be extensible and broaden out into a variety of different workflows on top of structured data, actually solving that end business user's or executive purchaser's problems.

And so it was a lot of testing the market and really iterating quickly. One of my favorite quotes that I learned early on was: when in doubt, iterate faster. And the main regret I had at Numbers Station is that I just wish we would have iterated even faster, as you go back and look at the company, even though I do believe we iterated extremely quickly. So I'd say that's the number one lesson that I learned. And yeah, the data transformation stuff, we still use it, right, to get that data model into the correct form for what we're doing on top of these databases. But there were a lot of evolutions of the product to get to the state that it's in now.

0:08:07.7 Satyen Sangani: Yeah. And I mean, this is one of the observations that Venky made about you as well. He's like, this team just moves so fast. And I remember the early Alation founding team. One of my co-founders was a guy we all know, Feng Niu, who now runs his own company. We tell these funny stories where Aaron, who's the fourth co-founder, and I would just debate things endlessly. And, you know, I don't code, so what else could I do but pontificate about crap? So we would talk, we would argue, and then Feng would be like, here, I've already hacked it up. It's done. And it felt like you guys had a team that literally did that all day long. It was not just you who had the capability to hack, but Ines and Sen and all the other folks who joined the team over time, which was just an incredible superpower.

What does it mean to chat with your data?

So you built this chat-with-your-data interface, but yours was different. In our travels, we've actually gotten pitched by a lot of chat-with-your-data companies. Everybody wants to do chat to SQL. There are lots of papers out there talking about the efficacy of that. You guys took a slightly different tack, though, because the chat was an interface, but it was a window to not just a SQL-authoring agent or a querying agent; you had other agents. Tell us about that path. How did you get to that discovery, and what did you do to get there?

0:09:35.7 Chris Aberger: Yeah, so I think I've always had what's probably best described as a love-hate relationship with chat with your data. One of the things with chat with your data is there's a ton of noise out there in the market. It's super easy to rip up a text-to-SQL prototype. In fact, you don't even need to rip anything up. You can just talk to OpenAI and it'll spit out SQL for you. So there's not a huge technical barrier to getting a proof of concept running. But I think it's largely still an unsolved problem that everyone thinks is solved. So you're in this awful death trap, I would say, if you're just solving that problem, from my perspective. And the second thing is, I don't think chat is where things end in this space, right? I think this has been validated in the era of agents. Our idea was always: get something done. Having a conversation is nice. Finding an insight is nice. But that's really always been a stepping stone for us in terms of building out other agents that enable people to capture full workflows and actually get stuff done on their data.

0:10:36.9 Chris Aberger: So I think it's a very easy, digestible application, kind of first step that you can take on top of the database. That's why we started there. Like everyone understands it. There's a need in most organizations to kind of lower the barrier to entry to get insights on top of your data sets. So all that's good. It's an entry point into these organizations. It's a way to establish trust. It's a way to also train our AI systems to become more acquainted with this organization. But every company that signed up to work with us did not sign up just for that chat capability, even though that's where they started. They signed up for the longer term path to solve end level workflows, right? Whether that's in commercial real estate, fitness industry, whatever it might be, they actually wanted to get things done on their data, not just serve up an insight or explore what's going on over their data. And so that's always been the hypothesis. I think it really got validated when all the agent hype kind of picked up because we were a little bit ahead in that we had an agentic platform from the start. And then it kind of validated that, okay, it's clear how all this stuff becomes immediately extensible and how you can build really interesting systems on top of structured data. So that was kind of the evolution of the business. I think a lot of people that look at us at a high level were like, oh, you're just doing text to SQL, which often was our motion to go work with companies. But really what we were doing from a foundational level was actually much different than just that application.
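
The "easy to prototype" pattern Chris contrasts with production text-to-SQL can be sketched in a few lines; everything here (the table, the schema, the prompt wording) is purely illustrative, not Numbers Station's actual system:

```python
def build_text_to_sql_prompt(schema_ddl: str, question: str) -> str:
    """Assemble the bare-bones prompt a text-to-SQL prototype sends to an LLM.

    The hard part in production is not this string; it's supplying enough
    trustworthy metadata (column descriptions, join paths, metrics) inside it.
    """
    return (
        "You are a SQL assistant. Given the schema below, answer the\n"
        "user's question with a single SQL query.\n\n"
        f"Schema:\n{schema_ddl}\n\n"
        f"Question: {question}\n"
        "SQL:"
    )

# Hypothetical schema for a made-up orders table
schema = "CREATE TABLE orders (id INT, customer_id INT, total_usd DECIMAL, placed_at DATE);"
prompt = build_text_to_sql_prompt(schema, "What was total revenue last month?")
```

Sending `prompt` to any chat-completion API yields plausible SQL, which is exactly why the proof-of-concept bar is so low and the production bar so high.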

0:12:02.9 Satyen Sangani: And talk about some of those differences. Talk about sort of what else you built besides sort of this text-to-SQL capability.

From chat with your data to act with your data: From data users to business builders

0:12:10.3 Chris Aberger: Yeah. So let's just zone in on text-to-SQL and how we're different there. The first realization we had, even doing chat with your data, was about the stance a lot of companies were taking in this space: oh, you do everything net new. You talk to whatever new chatbot you're deploying in your organization, it emits all the net-new SQL, and you're good to go. Throw away the BI tools you were using in the past and use this new tool. That was not our stance. In fact, we looked at who was having a lot of success in the AI space, and we saw Glean, and early on we were saying we're Glean for structured data, right? So we want to plug in and actually operate across a bunch of different systems. If the answer already exists in Tableau, Power BI, Looker, documentation, wherever it might be across your organization, we want to actually pull from that, not do something net new in the database. Don't reinvent the wheel if you don't have to. So the first thing there is that connectivity into a bunch of other systems.

0:13:11.8 Chris Aberger: And the way that you do that connectivity is through a multi-agent framework. So we always had that architectural foundation set up. But we're still talking about chatting and getting insights from your database, even though we're plugging into a bunch of systems. What that set us up for, though, is the ability to eventually go and take actions in those systems too, right? The simplest example here would be producing a PowerPoint slide, right?

So I have an insight, but now I need to serve up that insight to my boss, so I'm going to automatically produce a PowerPoint slide. Something more complicated: one of our largest customers is in commercial real estate, JLL. They've been a fantastic customer to us. You could look at things like work orders over a property and say, your air conditioner is breaking down, or there's a heat wave; I want to proactively go find vendors to do maintenance on my air conditioning unit ahead of this heat wave. Go search in my area, find these vendors, send me an email with the list, maybe even go out and contact them directly. That's actually going into the end state of taking actions, not just surfacing a pretty bar chart that gives me an insight into what's going on in my data. And that's where I believe the space is going to tilt more and more over time.
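
A toy version of the multi-agent dispatch Chris describes, routing a request to a query agent, a slide-building agent, or an action agent, might look like the sketch below. The agent names and intent keys are invented for illustration; a real system would use an LLM planner and live connectors, not a dictionary lookup:

```python
from typing import Callable, Dict

# Each "agent" is just a function here; in a real system these would wrap
# LLM calls plus connectors into the warehouse, BI tools, email, and so on.
def query_agent(task: str) -> str:
    return f"ran SQL for: {task}"

def slide_agent(task: str) -> str:
    return f"built slide for: {task}"

def action_agent(task: str) -> str:
    return f"contacted vendors for: {task}"

AGENTS: Dict[str, Callable[[str], str]] = {
    "insight": query_agent,   # answer a question over the data
    "present": slide_agent,   # serve the insight up to my boss
    "act": action_agent,      # go do something in an external system
}

def route(intent: str, task: str) -> str:
    """Dispatch a task to the agent registered for the given intent."""
    return AGENTS[intent](task)
```

The point of the architecture is that "act" sits alongside "insight" from day one, so adding a new workflow is registering a new agent rather than rebuilding the chat interface.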

0:14:29.8 Satyen Sangani: Yeah, I couldn't agree more. I think we were late relative to you, and maybe to others, but probably not to the world, in coming to that realization. Think about the last decade of building data tools: you get this proliferation of productivity-based capabilities, whether that's BI tools, or the ability to build ETL better, in different ways and different modalities, or even the databases themselves. And then, of course, in our space, you've got the catalogs and the governance capabilities. But all of it has been off to the side. It's largely been stuff that makes other people productive. The end is a report. The insight then feeds a decision, and people are doing these things in two different planes: you're making a decision, and then you're doing something.

And I think two things are happening today. Data teams are under assault: look, the secular investment in all of these tools had better prove some ROI, or you're not going to get funded. So there's this really clear budget motivation. But on the flip side, there's also this issue, which is that GenAI is eating everything. And what I think this means, to your point, and you said this so eloquently, is that these data teams have to become builders. These data teams have to actually go from just intellectually solving the problem to actually solving the problem. And you can view that as a threat, but I think you and I both see it as an incredible opportunity.

0:16:01.1 Chris Aberger: Yeah, I mean, I think the advent of ChatGPT and all these GenAI tools has been crazy. I even go talk to my mom, and she's doing things on the computer that she couldn't dream of before, whether it's editing images or, to be clear, other very basic things. But my mom believes that with tools like ChatGPT she's turning into more of a quote-unquote builder using these tools. Sorry, mom, don't mean to offend you.

And we see that same bug biting all the data teams that we're talking to, where they now have this belief: hey, I can actually go out and build a fair amount of this myself using these tools, or on top of something like OpenAI. But what they find very quickly when they're doing this is that it's easy to get a prototype up and very hard to get it into production. One of the quotes that I always like in this space is: it's very easy to get started; it's tough to get right. And so oftentimes when we were talking to teams at Numbers Station, we wanted to go after these builders because, again, we have this platform that we want to make it easy for people to build on top of.

0:17:07.5 Chris Aberger: And going after these teams that were building: often they'd tried it and then realized that it was really hard and that they needed to partner with someone to get that last mile and get it into production. The interesting part about that partnering and getting into production is that we would build these agents out on top, and we were actually able to rip up these different agents fairly quickly. But where the real moat and difficulty came in this space (there are actually two parts to it) was on the metadata side. So we had a concept that we called a knowledge layer. You can think about this as a combination of a knowledge graph and a semantic layer. It's actually the same thing as Alation's data product, which is a very interesting mashup that we saw early on when we were talking here. But almost 80% of our time at Numbers Station was not spent building those agents. It was spent getting that metadata correct, right? Working in these organizations, going in, trying to get access to query logs, trying to figure out how to mine descriptions of columns, trying to figure out everything in between, looking at documentation, etc. How do we get this metadata into a reasonable state?

0:18:19.3 Chris Aberger: Because that was actually the key thing to making these agents work well in production. And I think one of the things that clicked for me, even in our first meeting in '23 or '24, when Venky was at our organization: I remember having these conversations with him. I'm like, dude, Alation is sitting on a goldmine, right? All the metadata they have from these organizations. Because we are crying tears and sweating bullets going into these organizations to curate this metadata, and you guys already have it in a lot of cases. So that was always something that was really interesting to me about partnering with Alation: the fact that we can almost jumpstart, or take a shortcut, in working with these organizations by accessing this goldmine of metadata, which is actually the key thing you need to make these agents work well on top. Long-winded answer, but that's been our tour of building out agents, from chat all the way to workflows and getting things done. For all the applications, what it came down to in the end was metadata.
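
One concrete flavor of the metadata curation Chris describes, mining query logs to learn which columns actually matter, can be sketched as simple log counting. The log format and column names here are invented for illustration; production systems parse real warehouse query logs:

```python
import re
from collections import Counter

def column_usage(query_log: list, known_columns: set) -> Counter:
    """Count how often each known column appears across logged SQL queries.

    Heavily used columns are the best candidates for curated descriptions,
    since agents will lean on them most when generating queries.
    """
    counts = Counter()
    for sql in query_log:
        # Tokenize on identifier-like runs so column names survive intact
        for token in re.findall(r"[A-Za-z_]+", sql.lower()):
            if token in known_columns:
                counts[token] += 1
    return counts

# Two made-up logged queries against a hypothetical orders table
log = [
    "SELECT customer_id, total_usd FROM orders",
    "SELECT customer_id FROM orders WHERE placed_at > '2024-01-01'",
]
usage = column_usage(log, {"customer_id", "total_usd", "placed_at"})
```

Here `customer_id` shows up in both queries, flagging it as the column whose description and join semantics a curator should nail down first.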

The value of metadata to production-ready AI

0:19:23.5 Satyen Sangani: Yeah, I think that's absolutely right. And I would also say that metadata in and of itself has, on some level, been the challenge of BI and databases for probably the last 30 or 40 years. I remember when I first started working at Oracle, my mentor at the time was building these metadata-driven apps to do deterministic calculations on data, and in his case, he was like, it's all about the metadata. And this was back in 2006. People have been talking about that since the days of BusinessObjects; everybody's been trying to get the metadata right. I think what's exciting about what you guys do is the cycle time. It goes back to this entire premise of speed. The cycle time of improving the metadata, getting to an outcome, being able to do something, seeing the impact of that thing, and going back and correcting the metadata so I can be even more powerful for the next thing I want to do: that's the loop you have to create. And to me, that's what's so exciting about this combination. We've got this metadata, but it's off to the side. And you've got these applications and these agents that actually give me the power to do something with it. I can go build a presentation. I can go write a query. I can go do these things that give me so much more power and capability than I otherwise had, and I can do them right. And I think that dual benefit is just... I mean, I remember when I first met you the second time around, 90 days ago when we first talked.

0:20:51.9 Chris Aberger: The second first time.

0:20:52.7 Satyen Sangani: The second first time. Yeah. It's funny, because as I reflect back on the first meeting, I was like, ah, this is really interesting, I think there's something here. But I clearly was not getting it as quickly as I ought to have. It did feel like the second time around we were just completing each other's sentences, and I think largely because we got to the same place through different paths.

0:21:12.7 Chris Aberger: Yeah. I mean, I probably didn't get it completely the first time we talked either. So it's probably both of us converging on similar conclusions here. But yeah, to your point: solving that end-level use case, and then having that feedback loop back in to correct the metadata in terms of how these agents are working and being organized behind the scenes, is absolutely crucial, right? And the kind of weird thing that we saw (just to go on another tangent about what happened at Numbers Station) is that we started the company with, I forget, five or six ML or AI PhDs. A ton of firepower in the AI space. And what we had spent our PhDs doing was actually training models. You go collect the training set, you train a bunch of floating-point numbers under the hood, watch some loss curves, and make sure that everything's performing well. But when these large language models came out around 2021, the space completely shifted in terms of what AI talent spends its time doing, because nowadays there are actually only maybe four or five companies in the world, I would argue, that should be training these models: the OpenAIs of the world, et cetera.

0:22:23.9 Chris Aberger: But the AI or ML talent at the remaining companies should actually be spending their time working on these feedback loops. It's almost like a form of reinforcement learning on this metadata: how do you work on these feedback loops? All this is a long-winded way of saying there's been a transition for AI and ML talent outside of those model providers in terms of what they should be doing in organizations, and it's largely a lot of prompt engineering, plus building these feedback and evaluation loops into the agents themselves, which is a tilt, right, from what we were doing in our PhDs 10 years ago. I almost call this the identity crisis of AI engineers: because they spent their PhDs training these models, they feel like they still need to be training these models. And it's like, no, no, no. To go solve these end-level applications, in most cases you shouldn't be touching the weights of these models. What you've got to get right is this metadata and these evaluation feedback loops. If you can nail that, that's how you have success and solve end-level applications in the AI space. So it's a super interesting problem, with a ton of interesting things to solve, and it's really been a huge focus for the majority of Numbers Station in terms of what we've been looking at.
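
The evaluate-then-patch-the-metadata loop Chris argues AI teams should own can be sketched with a stubbed agent. Everything here (the golden set, the hint dictionary, the stub) is illustrative, not an actual Numbers Station or Alation interface:

```python
# Golden evaluation set: question -> expected answer fragment
GOLDEN = {
    "revenue last month": "SUM(total_usd)",
    "active customers": "COUNT(DISTINCT customer_id)",
}

def stub_agent(question: str, metadata_hints: dict) -> str:
    """Stand-in for an LLM agent: answers only when a hint covers the question."""
    return metadata_hints.get(question, "unknown")

def eval_and_collect(metadata_hints: dict):
    """Score the agent on the golden set and collect failures for curation."""
    failures = []
    correct = 0
    for question, expected in GOLDEN.items():
        if stub_agent(question, metadata_hints) == expected:
            correct += 1
        else:
            failures.append(question)
    return correct / len(GOLDEN), failures

# First pass: no metadata, so everything fails
score, failures = eval_and_collect({})
# A human (or curation agent) fixes the metadata for exactly the failures
hints = {q: GOLDEN[q] for q in failures}
# Second pass: the loop is closed and the score recovers
score_after, _ = eval_and_collect(hints)
```

The real work, of course, is in the "fix the metadata" step; the sketch just shows why the evaluation harness, not model training, is where the feedback loop lives.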

Metadata as a living organism

0:23:37.9 Satyen Sangani: Yeah, and this is one of the things that I think we've seen in greater resolution with you guys: there are so many variables to any given problem. There's the model and which model you choose to use, and there are some that are obviously more powerful but more expensive, and some that are cheaper and smaller, and there's open and closed, and that entire thing. And then there's this entire question of, well, okay, great, but what's the data that I'm working with? What is the interaction between that model and this data? And how much metadata do I need to actually allow these things to talk to each other?

But then there's also the use case: what am I actually trying to do? Because if I'm just trying to answer a very simple question, that's one thing. If I'm trying to do something very complicated, I might need a whole different level of metadata. And to your point, the metadata is the thing that actually does all the translation. But it's not a fixed outcome that you can deterministically know. It's literally always this thing that you have to find and evolve and test. And so it's a lot more like software testing and evolution than it is like data modeling, where in the old world we'd basically say, oh, you have a star schema, great, it answers some set of questions, and now you're going to have that thing live for a couple of years.

0:24:51.0 Chris Aberger: It's a living organism.

0:24:52.7 Satyen Sangani: It's a living organism. And so the real question is, how do you set up that feedback loop that you're pointing to? We have a guy here at Alation who says the word flywheel in literally every paragraph (shout out to Jonathan Bruce), but how do you set up that flywheel? It's a funny thing. And I think building these apps is what I'm so excited about, because I feel like the power is going to be really incredible, although it's going to be hard. I think there are going to be lots of hard technical challenges and lots of hard user interface challenges too.

Now, I guess everybody in the world seems to have seen this at the same time, because right around when we announced the acquisition of Numbers Station, ServiceNow announced the acquisition of Data.world. And then, I guess, a week after, Salesforce announced its acquisition of Informatica. Talk a little bit about that. We're in this moment where metadata is super interesting. We're going up the stack; those guys are coming down the stack. I don't know that I'd even say we compete against each other, but certainly it's a lot in the same domain and space. How do you think about that?

0:25:58.3 Chris Aberger: Yeah, I think it's interesting. A lot of people are coming to interesting realizations here. One thing, just to take a step back on how I view this space: if you look back four or five years ago, again pre these LLMs, the difficult and sexy problems, where you saw a lot of big acquisitions getting done (I think even Feng had one in this space), were around unstructured data. It was: how do I take this web of dark, unstructured data and turn it into something usable? It was a huge problem in the AI and ML space and the enterprise data world. And it still is, to be clear; I'm not claiming it's a solved problem. But I do think the seesaw here between unstructured data and structured data has shifted recently. And why I say that is, if you look at how these LLMs are trained, you go and take all the unstructured data from the web and train over that. So these LLMs are actually well suited for unstructured data. Do they work perfectly out of the box for unstructured data? No. But they are much better suited for unstructured data than they are for structured data.

0:27:02.9 Chris Aberger: And I think over time, more and more of the unstructured-data work, like PDF parsing and handling all the unstructured data you might have in an organization, is going to be commoditized by these LLM or model providers. But what I do think people are realizing lately, what's become the cool problem to look at, or at least the one people realize is hard, is making the structured data useful. These LLMs are not trained on structured data.

So again, I can rip up a prototype on OpenAI very quickly to show you talking to my database or doing something on top of it. But when you go into a Fortune 100 company with really complex enterprise structured data, good luck trying to get that to work well by just plugging a model into it. So I think a couple of things have gone on here. First, people have realized that, okay, structured data is actually the hard problem to get right, and all of these organizations' really valuable data is inside their databases in a structured format, so we have to figure out how to make it ready for the AI era. And then the second-level problem that people are discovering is: how do I make this structured data actually work? Oh, it's metadata, right? And I think that two-step realization is what's causing a lot of the activity we're seeing in the market: I know I need to plug into databases; I'm now coming to terms with the fact that this is actually a really tough problem to get right; and in order to get it right, I need to effectively go build a data catalog or metadata provider. And therefore, we're seeing a lot of activity in this space, at least from my biased perspective.

What are precision agentic workflows?

0:28:46.1 Satyen Sangani: Yeah. I mean, our David Chao, who runs marketing for us, called this "precision agentic workflows," and two of those words, agentic and workflows, are used by literally every vendor to talk about every single thing, because why not? But I do think this precision thing is the entire fundamental point. You've got these lossy, stochastic LLMs that are prone to hallucinations, and then you've got this structured data that has to be perfect and precise. And you can't be like, oh, it might be a debit or a credit, or it might be positive or negative, or it might be a customer or a vendor. You have to know exactly which one it is. And that's, I think, the opportunity that we have in front of us: to make these models talk to this structured data, both on reading and on writing. And what I think is really fun is to watch the speed of the progress that we're making together. I mean, I feel like we were moving at a pace for the last six to seven months that was fast. You guys were obviously doing so as well. And I feel like we've almost weirdly accelerated each other, which you almost never see in these acquisitions. And I don't know if that's the ethos of the team, the tools of the moment, or the people, but it's really cool.

0:29:57.6 Chris Aberger: Yeah, I think it's an obvious one plus one equals three situation. So I think it's awesome on both sides, but just to double-click into what you're talking about on the precision side. So I kind of set up this unstructured versus structured data split, and you hit this very well and eloquently, so I should have said this, but there's a user expectation difference between these two worlds as well. On the unstructured side, you're typically starting from nothing, or from a typical search experience, where getting a 91 F1 score is acceptable. If I go talk to a data analyst and I tell them you're going to get a 91 F1 score, they're like, what the hell did you just say? Right? Because there have been systems on top of these databases since Codd in the '80s, and the answer is the answer, right? So that is the expectation on the structured side of the world. There's this user expectation side where you have to get it right on the structured data side. And that's why the bar is higher and why it's a harder problem to solve. Going along that line of the precision agentic thing, why it's so important for the problem that we're looking at is also the user expectation and experience on this side of the house.

0:31:03.0 Satyen Sangani: Yeah, and it's funny because I talk to a lot of people who are like, look, man, this game's gonna be over. Everybody's gonna do this work, and in three to six months it's gonna be the unity of all problems being solved because somebody's gonna win this race. And look, I obviously think that scale has its own beauty, and a lot of these players have a lot of interesting things to contribute. I also think there's lots of hard problems and complicated problems to go solve. At Columbia, there's a professor who I met, a guy named Eugene Wu, and he's starting what he terms an agent lab, which he's trying to compare to the AMPLab at Berkeley. And the way he talks about this lab is, he's like, look, in the early days of databases in the '80s, and you mentioned Codd, there were all of these problems to get to true ACID compliance. And it took years, decades, to get to the moment where databases could be truly relied upon. And he's like, we're going through the same stuff. There are the same fundamental research problems in agents, or with agents, that need to be worked through to get to a level of reliability. And I'm on the side of, yes, the future will be really exciting, but it's gonna be jagged and it's gonna be unexpected. And I think what's fun for us as builders is the cool problems to solve. There's just great stuff to go build and figure out.

0:32:24.7 Chris Aberger: Yeah, I think there's an infinitely large number of cool problems to go solve. There's also the same amount of hype. So I think having realistic expectations, in terms of taking stepping stones toward fully automated agentic solutions, matters. What we saw throughout the duration of Numbers Station was buyers getting super burnt out on hype, and on demos that anyone can rip up, and wondering what's real here and what's not. Being really pragmatic about how you can go into organizations and incrementally provide value along this trajectory, toward this end-level vision that we all believe in, is essential and crucial when working with companies right now.

0:33:11.3 Satyen Sangani: Yeah, I do feel like that too. And I know there's a lot of pressure from the world of customers and employees and investors and all these people who are like, let's get out and make these sensational claims and just say AI is gonna, I don't know, what is it, like 25% of all white-collar jobs are gonna go away in two years? Okay, I guess maybe, and who knows? The future's hard to predict. But the other thing that I loved about you guys is that you stood for a very pragmatic, very authentic, very clear and purposeful brand of: look, right in front of us there are problems to go solve, and we're gonna go solve those problems, and we're gonna go help customers right now, and we're gonna make claims based upon what we can do today as opposed to these sensational things. And I do think that was maybe one of the hidden but really cool values that both companies share. And so that's one of the things that I really admired about you, and Ines and Sen as well. As you forecast forward, what are you excited about? Where do you see the world going? We've obviously talked about some of the problems, but forecasting forward, what do you think are the things that people are not paying attention to that they ought to be, things that you're excited about, that people should be thinking about?

0:34:31.3 Chris Aberger: Yeah, so I think there are kind of two angles here. I'm a technologist at heart, so I'll always focus on the technical problems that I think are really interesting, but I think they tie back to huge enterprise value and downstream value for customers. So one is this whole curation and feedback loop around the metadata itself. And there's a ton of innovation to be had in this space, in terms of automatically going in and curating and producing this information, as well as having these really tight feedback loops for a bunch of different agents on top to go in and maintain this living organism over time. As I said, it's a problem that we focused on throughout the duration of Numbers Station, but I would not claim it's a fully solved problem. There's a ton of technical innovation still to come in this space. And I think the companies that really nail that loop are the ones that are gonna have the most success in the end-level applications they're building on top. So there's a ton of technical innovation to be done on that side that I'm really interested in.

Empowering enterprise data users to build with AI

0:35:35.2 Chris Aberger: And then the second part is actually those applications on top. So, you know, I think we're just scratching the surface of what you can do with these agents on top. And one of the things here is having this web of interconnectivity across a bunch of different tools. And the second thing is having organizations not be bottlenecked by the vendor. What I mean by that is, early on in Numbers Station, when we were building out the first versions of these agents to go solve end-level use cases, it was a lot of us building them out, right? What I've been excited about in the past six months is seeing our customers start to come up to speed and build their own agents on top of our platform. And that's when you're really starting to cook with gas, from my perspective, because the domain expert should be the one building these applications, not the vendor. So giving organizations that power, where you can go build your own agentic applications really quickly, with a platform that powers that sitting on top of the metadata and structured data underneath, is something I'm super excited about. And I'm excited to see all the different types of things that customers will build on top of our combined platform.

0:36:50.8 Satyen Sangani: Yeah, I mean, that's one of the things that has struck me about Numbers Station, and about just generally how we're seeing AI use evolve, which is that the entrepreneurial, curious, high-power, high-agency, high-motor individuals are people who now have sort of superpowers. I mean, they have this ability to take that knowledge and, you know, everybody talks about 10x, but the ability to do at least an order of magnitude more than what they otherwise would have done. And it lowers the barrier to being that person. If you have an idea and you have initiative, you can go do all this stuff. The counterbalance to that is it's a little scary. If you're somebody who's like, I just want to show up and ask random questions and not really do a lot of work and not think really, really hard, those are probably the areas where you're going to be put under pressure, because these tools will force you to the limit of what you can do. And, you know, historically I think software actually catered to the lowest common denominator. And interestingly, these tools are now catering to almost the best in class, which is really fun. So you're based in Seattle. Tell us a little bit about you. Tell us a little bit about the culture of the team that you'd like to build and the cultures that you think are winning in the customers that you're seeing.

0:38:17.6 Chris Aberger: Yeah, so about myself, a little bit about my background. I went to Stanford for my PhD, as I mentioned before. I actually was originally super into the hardware side of the house, so computer architecture as an undergrad. That's what I wanted to do for my PhD. So I initially started working with Kunle, who's the father of multi-core. And I remember, I basically stalked Kunle to work with him. I was sitting outside of his office waiting for him, and I was like, I want to work with you. I'll do anything to work with you, basically. And he was like, great, but I don't want people working on hardware. You need to work on software. I was like, okay, I guess I'm working on software. So I started working on software throughout my PhD. I eventually met Chris Ré through a class he was teaching, he came to Stanford as well, and really focused on databases for my PhD. So I don't know why you would go look up my PhD thesis, but if you were to, it's on worst-case optimal join processing.

0:39:17.0 Chris Aberger: So really hardcore database infrastructure. I pivoted more towards the AI side of the house after I finished my thesis. Eventually I went to SambaNova Systems, where they were building hardware for AI, so it was a great combination of some of the early things I was interested in. And then Numbers Station, of course, combining databases with AI, has been my trajectory. In terms of culture and building out companies, I think the number one thing that I look for in people is the ability to learn quickly. Right? To a certain extent it sounds weird, because I've actually hired a bunch of really credentialed people at both companies I've been at: PhDs, really smart people from top labs.

But I think one of the qualities of PhD students that's well suited for startups is that you're a little bit fearless. You're used to these quick iterations. You're used to failing when you're doing research, and you're continually interested in learning. Right? You're not set in your ways in terms of how you're adapting to the market landscape, etc. So that ability to learn and, as I talked about earlier, iterate fast is the number one thing that I look for from a cultural perspective, as well as, of course, just being a good person to work with. No one wants to work with someone that's extremely difficult. It could be an AI PhD. It could be someone straight out of undergrad. It could be someone later in their career. It doesn't really matter. But that hunger to learn and adapt and fail and iterate fast is the number one thing that I've always looked for when I'm going out and hiring. And I think our culture hopefully embodies that to a large extent.

0:41:02.7 Satyen Sangani: Yeah, it absolutely does. I mean, even in the early days, you can just see that happening with the pace of code that you guys are able to put out. And candidly, it's funny, even revealing a little bit of our own insecurity and some of the things that we were thinking about: we were talking about, hey, these guys are going to come on board, and wow, are we going to be able to move as fast? And I think it's been both fun to actually have the team move as fast, but also really inspiring to watch how quickly you guys move. And I think it shows you that there's always another level to go reach.

0:41:36.6 Chris Aberger: Yeah, I think everyone's moving fast across all angles from what I've seen. And it's what's needed in this market, right? I mean, the space is moving so fast and we're going to get things wrong, right? It just matters that you get them wrong quickly. Now, there's another quote that just popped into my head as we were talking about this. I remember early on at SambaNova, and Chris Ré's been my mentor for quite a while, I was struggling with a big decision that I had to make and was so concerned about getting it right. I was like, I have to get this right. It's life or death. Turns out it didn't matter. But, you know, I came to him so distraught about it, and he's like, dude, you spent a week thinking about this. The worst thing you can do is not make a decision. So just make a decision. Who cares if it's wrong? Just go change it after the fact. It's not, to use Bezos's analogy, a one-way versus a two-way door. And that is also something that I've tried to embody. It's fine to make wrong decisions. Just admit it and quickly course correct as you're moving through this. And especially in this space, the world is shifting every month. We're going to get things wrong. Just got to move fast.

0:42:41.9 Satyen Sangani: Yeah. And I think that's an interesting modality for customers, because customers obviously want a little bit of a roadmap and they want predictability. And I think one of the things it also means is that you've got to be really transparent, with both team and customers, about what you've figured out and what you haven't, what's open and what's not. And you have to have pretty high expectations for the level of attention that people are paying, and the maturity that they come at the problem space with, because everybody's just running super fast. And that means you're going to run into dead ends as much as you're going to run into some success.

0:43:16.7 Chris Aberger: Yeah. I mean, I think you should isolate your customers from that process as much as possible, right? For the things that you've solved, that's what you go roll out to customers. And the things you're iterating on are hopefully shielded, right, from the customers.

0:43:30.4 Satyen Sangani: I think that's right. Although, one VC early on told me, look, in any endeavor there are the things you've proven, there are the things you know you can get to, and then there's the dream. And I do think that everybody wants the dream. And I think you have to be pretty authentic about the first thing and then pretty clear about how you get to the second thing. That's fair. But I think there's a lot of people that are trying to do great work that just want to know. And so, it's pretty cool. So I hear that you're interested in moving from engineering to becoming a lawyer here at Alation. Tell us a little bit about that transition, because I know you love the law discipline so much.

0:44:12.3 Chris Aberger: Actually, funny enough, if you do want to know, I did apply to law school out of undergrad and did get in, actually, with scholarships to some pretty good universities. But I didn't go. So, Satyen is making a joke about the acquisition process, which, as any CEO or founder who's been through an M&A knows, involves a lot of fun times with legal teams. But tying into my background, that actually was originally why I did engineering. I thought I would be a patent lawyer. And then I did an internship, I think it was at IBM or Apple, I'd have to remember the particular summer, and I was like, whoa, this is way better than being a lawyer. I'm going to go be an engineer. I decided to stick to engineering. So I think I'll stick to engineering for now, but maybe we'll keep that option open for a later date.

0:44:56.3 Satyen Sangani: Yeah, just for the inside baseball there: the lawyers put you in this place, it's what they're paid to do, where you have to think about every worst-case corner scenario. And then you put yourself in these circles where you're like, God, this is the most horrible thing that's ever going to happen, even though what you're doing is something really great. And for both of us, it's probably not the space that we want to operate in.

0:45:23.4 Chris Aberger: Yeah, it's fun times in the M&A process, worrying about what happens if we get hit by an asteroid tomorrow, in which case we all have bigger problems anyways.

0:45:33.8 Satyen Sangani: For sure. Chris, welcome to Alation. Thank you for joining Data Radicals. It's been awesome to talk to you here and obviously to work with you. And so excited for what we're going to do. It's going to be amazing.

0:45:46.3 Chris Aberger: Yeah, I'm super excited about it as well. And thanks for having me on the show.

0:45:50.3 Satyen Sangani: That was a fantastic conversation with Chris. What stood out to me most is how clearly Chris grasps the real challenge in enterprise AI. It's not just building the agents, it's building the foundation those agents rely on. And that foundation is metadata. That's what makes AI work in the real world. And that's why this partnership makes so much sense. Alation has the gold mine of metadata, Numbers Station has the agentic apps and platform to act on it. As Chris put it, we're just scratching the surface of what teams can build when they're empowered to create their own AI agents on top of structured data with feedback loops that constantly get better. That's the vision. A faster, more intelligent, more actionable future for enterprise data. And we're building it together. I'm Satyen Sangani, CEO of Alation. Thanks for tuning in to Data Radicals. Stay curious, stay bold. See you next time.

0:46:45.2 Speaker 2: This podcast is brought to you by Alation. Your boss may be AI-ready, but is your data? Learn how to prepare your data for a range of AI use cases. This white paper will show you how to build an AI success strategy, and avoid common pitfalls. Visit alation.com/AI-ready. That's alation.com/AI-ready.