Turning Data Librarians Into Supercomputers

with Deb Seys, Senior Director of Learning & Communities, Alation


Deb Seys, Alation’s Sr. Dir., Learning & Communities, has a Master of Library and Information Science from the University of California, Berkeley. Previously, she led data and information management, discovery and search efforts at companies including eBay, Kaiser Permanente, and HP Labs.

Satyen Sangani, Co-founder & CEO of Alation


As the Co-founder and CEO of Alation, Satyen lives his passion of empowering a curious and rational world by fundamentally improving the way data consumers, creators, and stewards find, understand, and trust data. Industry insiders call him a visionary entrepreneur. Those who meet him call him warm and down-to-earth. His kids call him “Dad.”

From papyrus scrolls to the data catalog

Satyen Sangani: (00:00)
No one knows exactly how old the profession of librarian is. We do know that it's at least 2,300 years old. At the Library of Alexandria in Egypt, the Greek scholar Callimachus developed the first library catalog to organize hundreds of thousands of papyrus scrolls. He called it the Pinakes. The Pinakes listed authors in alphabetical order, along with when their works were written, summaries of the various works, where they came from, and more. Papyrus scrolls were grouped together by subject matter and stored in bins, and resident librarians, including Callimachus, would help visiting scholars find the works they were looking for by using the Pinakes. Over the intervening centuries, things didn't really change much. A librarian in 1950 did much the same thing as a librarian in 300 BC. The number of books multiplied, but the basic task of consulting a catalog to find a work remained the same.

Satyen Sangani: (00:58)
And then the internet changed everything. The internet changed what libraries are and how they're used. Most people now look things up from home and read on their devices. If you're like me, you might even wonder if the whole idea of libraries and librarians is out of date: no longer necessary, a relic of the past. And yet we face a contradiction, because with more information and more content than ever before, sifting through it to find the right information has never been more difficult. On this episode of Data Radicals, we talk to a librarian.

Satyen Sangani: (01:34)
Deb Seys is the Senior Director of Learning and Communities at Alation. She has led cataloging and searchability efforts at eBay, Kaiser Permanente, Hewlett Packard, and more. And she is a trained librarian with a master's in library science from Cal Berkeley. Before meeting Deb, when I thought of librarians, I did not think of data at all. It's tough for me to imagine my high school librarian running a SQL query or talking about data warehousing or ETL. And yet, as Deb has taught me over so many years, helping people find the relevant information at the right time has never been more important, and especially so in the world of data, whether that's a report or a database. So if you are interested in data, Deb is going to help you see it in a whole other way.

Deb Seys: (02:26)
But I think one of the things that people don't realize is that librarianship has actually grappled with the theoretical problem of how you catalog the world's knowledge. How does the Library of Congress decide? In the sixties, all this literature about feminism ended up in home economics, because there just wasn't a Library of Congress number to put it in yet, ironically. And so there's that idea of evolving with knowledge, and the idea of information as augmenting human intelligence, and computing, and the internet. What does it mean to remove the burden of the daily slog to find what you need in order to get your job done, to augment that in a way that gets people past that early burden and off doing the next level of work?

Satyen Sangani: (03:24)
So let's dive in a little deeper and find out why librarians have never been more relevant, and what we in the data industry need to learn from them, after this break.

Producer: (03:35)
Welcome to Data Radicals, a show about the people who use data to see things that nobody else can. This episode features an interview with Deborah Seys, program lead of the data intelligence project at Alation. On this episode, she and Satyen talk about creating data culture, misconceptions about her role and information science, how she thinks searchability will change over the next two decades, and much more. Data Radicals is brought to you by the generous support of Alation, the data catalog and data governance platform that combines data intelligence with human brilliance. Learn more at alation.com. And now let's send it over to your host, Satyen Sangani.

Satyen Sangani: (04:16)
The jump from librarian to data science expert seems like a big one, but according to Deb, it's not. In fact, when she describes it, the path seems logical, almost inevitable.

Deb Seys: (04:27)
I was going to be a catalog librarian. I was going to be a "systems behind the scenes" type librarian, where we cataloged the collection and managed how people were going to be able to find things and use the library catalog. And then very early on, I got introduced to early library systems and took a right turn from libraries into developing the systems that libraries use to manage their catalogs. And it was a short jump from there to other kinds of information management: employee portals and search engines. All of that was taking off at that stage in my career.

Satyen Sangani: (05:14)
So I guess the core problem is, how do you find information? What made you passionate about that?

Deb Seys: (05:20)
It's funny. I was actually more interested in how you put it away. If you don't put it away right, you'll never find it. So I was really interested in that problem of describing and making things. At some point there was this notion of making things self-describing.

Satyen Sangani: (05:37)
I'm thinking of you as this kind of Marie Kondo for data. I'm still pretty proud of that one because Deb really is the Marie Kondo of data. And just what does someone do as the Marie Kondo of data? Deb walks us through one of her earliest projects at eBay to give us an example.

Deb Seys: (05:57)
Well, one of the first things I started doing when I joined eBay, when we were the metrics and analytics exchange, before we moved to building a tool called Data Hub that I think you remember, involved release notes. There was a requirement that if there was a change to a table, somebody had to write up a release note. So some ETL engineer would write up a description of the column they'd added to a table, or of something they were deprecating. And it was my responsibility. Somebody found out that I had a literature and writing undergraduate degree and they told me, "Okay, you do the release notes." And so I'd have to spend time interviewing people to figure out what the impact of that change was, why they'd made the change in the first place, and how best to describe it in business language, so that it wasn't just a technical change but had a business impact to it.

Deb Seys: (06:56)
And that was the piece I'm talking about: there was a gap between the work these folks were doing to carefully build and deliver this data warehouse and the understanding of the people down the road who had to use it. The analysts filled that gap with their own understanding of the data, so they could adapt to those changes. But if you were someone who wasn't in lockstep with the technical folks, you'd be lost. And so in order to open up the data to a broader population, you had to figure out how to describe that business understanding and the real, meaningful, down-the-road impact of whatever that change was.

Deb Seys: (07:50)
And so it started with just change and then became a bigger thing around usage and activity and value and so forth. And Alation kind of brought that bigger picture, because when we made it public on the data catalog, you didn't need some insider's understanding of what IP address to use to log into some particular part of the data warehouse. Once the data became more easily available that way, you had to make the understanding more easily available as well.


What is a database query?

Satyen Sangani: (08:27)
Let's back up in case you don't know what Deb is talking about. I know this is the Data Radicals podcast, but just because you're a data radical, it doesn't mean you're a data expert. We had Deb explain to us some of the most common and basic terms in working with data. Some people listening to this podcast may not even know what a database query is. I actually had no idea what one was until maybe not very long before starting the company. So, what's a query?

Deb Seys: (08:54)
It's a question you ask of the data. So you have terabytes, petabytes of data that you've gathered: tables with hundreds of thousands of rows of data gathered from activity on your website, for example. You ask a question of the data and get an answer back in the form of… So, some people are quite experienced in asking very complicated questions, and their queries are programmatic and difficult to read. They reference data assets, and they go through some complicated programmatic logic to filter out things that don't matter, zero in on things that do, and get a result set that begins to answer a question they need answered in order to make a decision, let's say.

Satyen Sangani: (09:53)
And just to ground this, like what's an example of a query in that context?

Deb Seys: (09:59)
Well, at eBay, it might be something like an experiment. eBay did a lot of A/B testing, for example, and that was a huge, rich source of data. A/B testing wasn't always this clear-cut; it was quite complicated. But let's say half of the customers would see a blue call-to-action button in the upper right-hand corner of a page and half would see a yellow button down in the lower left-hand corner, and the product manager who was designing that page wanted to know which one people responded to more often. And they wanted to know whether that changed when the page was a kitchen product or a sporting goods product: maybe for sporting goods, the yellow button got a better result; for kitchen products, the blue button in the upper right-hand corner. Or maybe it depended on the demographics. And the tables and data at eBay were massive, old, and changing all the time.

Deb Seys: (11:09)
And so writing a query that looks at a table with millions of rows and a thousand columns might require that you filter down to just this year's data, just users between 20 and 35, just the products that appear on what we call sports pages, and just the button in the upper right- or lower left-hand corner. You can imagine the kind of logic it takes to pull just that set of data and then to tell whether people clicked on the button. From that, the product manager might make a decision to actually go to production with the blue button in the upper right-hand corner for sporting goods.
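To make that concrete, here is a minimal sketch of such a query, run against an in-memory SQLite database so it is self-contained. Everything in it is hypothetical: the `page_events` table, its columns, the variant labels, the hard-coded year 2023 standing in for "this year," and the handful of made-up rows. It only illustrates the shape Deb describes: filter by year, age band, and page category, then compare click rates per button variant.

```python
import sqlite3

# Hypothetical schema; table and column names are illustrative only,
# not eBay's actual warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE page_events (
        event_year     INTEGER,
        user_age       INTEGER,
        page_category  TEXT,    -- e.g. 'sports', 'kitchen'
        button_variant TEXT,    -- 'blue_upper_right' or 'yellow_lower_left'
        clicked        INTEGER  -- 1 if the user clicked the call-to-action
    )
""")
rows = [
    (2023, 25, "sports", "blue_upper_right", 0),
    (2023, 30, "sports", "blue_upper_right", 1),
    (2023, 22, "sports", "yellow_lower_left", 1),
    (2023, 28, "sports", "yellow_lower_left", 1),
    (2023, 40, "sports", "yellow_lower_left", 0),  # outside the age band: filtered out
    (2022, 25, "sports", "blue_upper_right", 1),   # wrong year: filtered out
    (2023, 26, "kitchen", "blue_upper_right", 1),  # wrong category: filtered out
]
conn.executemany("INSERT INTO page_events VALUES (?, ?, ?, ?, ?)", rows)

# The "question" asked of the data: among 20-35-year-olds on sports pages
# in 2023 (standing in for "this year"), which button variant got clicked more often?
query = """
    SELECT button_variant,
           COUNT(*)     AS impressions,
           AVG(clicked) AS click_rate
    FROM page_events
    WHERE event_year = 2023
      AND user_age BETWEEN 20 AND 35
      AND page_category = 'sports'
    GROUP BY button_variant
    ORDER BY click_rate DESC
"""
results = conn.execute(query).fetchall()
for variant, impressions, rate in results:
    print(f"{variant}: {impressions} impressions, click rate {rate:.2f}")
```

With these toy rows the yellow variant comes out ahead (2 impressions, click rate 1.00 versus 0.50), but the point is the filtering logic, not the numbers; a real A/B analysis would also need far more traffic and a significance test before anyone changed a production page.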

Satyen Sangani: (11:55)
Got it. So this person is trying to answer a question about some behaviors on the website that they're managing, and all of a sudden there's a lot of complexity around all of that human behavior, because people who like kitchens might want green buttons and people who like sporting goods might like blue buttons, and you need to figure out the differences, and maybe that's even dimensioned by time. Okay. So in that world, how do people figure out... so you said there's this idea of a wiki page. How would that be different from this idea of a query?

Deb Seys: (12:28)
That's the really fascinating part: reading the logic of a query actually tells you a great deal. First of all, it tells you which data assets hold the information you're looking for, and that's not always obvious. At large companies, for example, the same bit of data is often repeated across different tables over different times. Sometimes tables themselves, or data assets, are deprecated: in other words, retired, replaced, or enhanced. And so it can be very complicated to figure out where the answer to your question might be. If you read a successful query, it can reveal a lot about the data landscape.
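One way to see why a successful query "reveals a lot about the data landscape" is to tally which tables past queries actually touch. The sketch below is deliberately naive and entirely hypothetical: it uses a regular expression to grab the identifier after `FROM` or `JOIN`, ignores subqueries, CTEs, quoting, and comments, and runs over a made-up query log with invented table names. A real catalog would rely on a proper SQL parser rather than a regex.

```python
import re
from collections import Counter

# Naive pattern: the identifier (optionally schema-qualified) after FROM or JOIN.
# It does not handle subqueries, CTE names, quoted identifiers, or comments.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def referenced_tables(sql: str) -> list[str]:
    """Return table names that follow FROM/JOIN, in order of appearance."""
    return TABLE_REF.findall(sql)


# Hypothetical log of queries analysts have already run successfully.
query_log = [
    "SELECT * FROM dw.page_events e JOIN dw.users u ON e.user_id = u.id",
    "SELECT COUNT(*) FROM dw.page_events WHERE event_year = 2023",
    "select sku from dw.listings_v2 join dw.users on listings_v2.seller_id = users.id",
]

# Count how often each table appears across the log (case-insensitive).
usage = Counter(t.lower() for sql in query_log for t in referenced_tables(sql))
for table, count in usage.most_common():
    print(table, count)
```

Even this crude tally points a newcomer toward the tables colleagues rely on most, which is the kind of signal the conversation attributes to reading other people's queries.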

Satyen Sangani: (13:19)
If you're at all familiar with Alation, the company I work for, much of this might sound familiar to you. And that's because we imagine one facet of what we do as a giant software version of Deb. And that's something she recognized when we met for the first time. Yeah. It's funny. When I think about our first meeting, do you remember our first meeting?

Deb Seys: (13:38)
I do. I do. I remember what the conference room looked like.

Satyen Sangani: (13:41)
Yeah, me too. Do you remember what you said when you first saw the demo of Alation?

Deb Seys: (13:49)
But I remember thinking, I don't know, maybe I said it out loud, but I remember thinking this is going to blow everything out of the water.

Satyen Sangani: (13:55)
Yeah. Well, I think you said something like, "I see my career dissipation light blinking, because this thing is going to replace me." And I started laughing, because I thought to myself, well, if we're building this product, what I want it to be is the software personification of Deb. I always thought about this idea of a helpful research librarian who, if you had a question, would be able to figure out what ideas or bits of information might help you. And what's interesting is that, somewhat similarly, you're saying these engineers are building all of this information, right? They're building all of this data, they're building these data models, and these analysts have a question, but somebody needs to marry these two groups together. How do you know what the analysts are looking for, and how do you know how to categorize it in a way that they can consume it?

Deb Seys: (14:47)
Right. So think about librarianship as a profession. One of the ways I try to explain it is with the same book: let's say you get a book on medical ethics. If you were to put that book into a medical library, it would end up in the ethics section. But if you were to put that book in a general library, it would end up in the medicine section. And so you have to think of the whole and the piece at the same time. And the reason that one piece ends up in two different places in the whole is the purpose of the collection and the people who are going to use it.

Deb Seys: (15:29)
And that's why, when you think about Alation as it's installed and implemented at different companies, each with a different culture, a different sense of who they are, and a different meaning, purpose, and business outcome, they make quite different decisions about how to implement different features. One's a financial company; one's an e-commerce company with a product that consumers use; maybe another is used only by insiders inside the enterprise. And so the decisions about how to manage the collection or how to answer a question are very contextual to the audience, the content you have, and the purpose you're trying to achieve. There isn't any objective way to catalog anything, really.

Satyen Sangani: (16:19)
One thing Deb always tries to keep in mind is the end user. Data Radicals can get so excited about everything that can be done with data that we start to appreciate data for its own sake. But according to Deb, that can be dangerous. You always have to remember who has the problem, what you're trying to solve for, and why.

Deb Seys: (16:39)
They're not trying to discover the information for just the sake of discovering it. They need the information to get something done. And I think that's the piece we always forget. And I had a mentor at HP years ago. And at that time there was, again, this sort of knowledge management holy grail. And it was like, let's get the right information to the right person at the right time! And her answer was, “So what? Then what do they do? You've given them the right information. They're actually not just sitting there looking for the information, they're actually trying to get some business work done.”


Bad metrics … or wrong metrics?

Satyen Sangani: (17:13)
When you're doing this work, learning from the failures can be as important as learning from the successes. And so that's a success. What about the failures?

Deb Seys: (17:23)
Never had any, Satyen. I didn't have any failures at all that I can remember. I'm sure I had plenty, but for some reason the one that keeps popping into my head right now, and that I think is interesting to talk about, is this: we bring a lot of preconceived notions to reporting and data and the production of metrics. And one of the things about democratizing access to data is that you get a lot more opinions about it. And if you also democratize how you catalog assets, you're potentially less likely to be misled by them. One great example has to do with the introduction of a better way of managing thumbnails on eBay product pages. When a seller wanted to post something to an eBay listing, they were given the opportunity to purchase a certain number of images that they could upload with that listing to make it more attractive.

Deb Seys: (18:42)
At some point, eBay decided to make that free for a certain number of images because they realized how important the images were. And there was a whole effort to make the display of the images better: smaller, sharper, more easily browsable. And there was a significant metric prior to that effort that said that people would spend a great deal of time browsing a results page. So I'd search for “hand exercisers” and I'd see this list and the metric wanted people to spend a lot of time on that page. And so a high number in that metric was a success. But what happened when they improved the images was that suddenly people weren't spending a lot of time on that page at all. And some key metric was going down and there was an uproar about “The release of these new images was causing this metric to go down and it was awful and they have to pull back on it. We don't know what's going on!”

Deb Seys: (19:44)
But actually, when they looked further, what they found was that people were clicking through because they were finding what they were looking for really quickly. What they actually needed to look at was the shopping cart conversion metric, which they hadn't connected to the "view the results page" metric. It's sort of an old story, but the complexity of how these metrics connected showed that less time on that page was actually good. The chain from viewing that thumbnail to getting to the shopping cart and converting was the metric they wanted to be looking at.
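The thumbnail story comes down to computing a funnel rather than a single intermediate number. Here is a minimal sketch with entirely made-up sessions (not eBay data) and hypothetical field names: it reports average time on the results page alongside click-through and conversion rates, so a drop in the first can be read next to a rise in the others.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Session:
    seconds_on_results_page: float  # the old "engagement" metric
    clicked_through: bool           # left the results page for a listing
    purchased: bool                 # completed checkout


def summarize(sessions: list[Session]) -> dict[str, float]:
    """Report the intermediate metric next to the funnel metrics it feeds."""
    n = len(sessions)
    return {
        "avg_seconds_on_results": mean(s.seconds_on_results_page for s in sessions),
        "click_through_rate": sum(s.clicked_through for s in sessions) / n,
        "conversion_rate": sum(s.purchased for s in sessions) / n,
    }


# Made-up sessions before and after a hypothetical image improvement.
before = [Session(90, False, False), Session(120, True, False),
          Session(75, True, True), Session(100, False, False)]
after = [Session(20, True, True), Session(15, True, False),
         Session(30, True, True), Session(25, False, False)]

for label, sessions in (("before", before), ("after", after)):
    print(label, summarize(sessions))
```

In this toy data, average time on the results page falls from 96.25 to 22.5 seconds while conversion rises from 0.25 to 0.5: the same pattern Deb describes, where the "engagement" number looked like a failure until it was read alongside conversion.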

Satyen Sangani: (20:22)
Yeah. Which, funnily enough, seems like another example of rigidity, right? Because there might be people whose jobs depend on driving engagement, and this was catastrophic for them, because people were not engaged as measured by time spent on page. And all of a sudden, somebody improved the user's experience, and the outcome got better, but the intermediate metric didn't.

Deb Seys: (20:49)
Right. And the understanding of the user experience needed to evolve.

Satyen Sangani: (20:53)
Right. And so the insight moved somebody's cheese, as it were. On some level, it sounds like the hardest problem there was that you have to change people's mindsets and evolve, because in many cases the actual exploration of data changes what people do and what they're focused on.


Do people really tell stories with data?

Deb Seys: (21:14)
I think one of the most misunderstood things about data is that data is actually an accumulation of meaning, and that people actually tell stories with data. The idea that data might be this objective, hard thing that just sits there and reveals something is not true at all. Actually, a lot of the work of documenting metrics, or documenting why we're capturing data into a table, is about what's important to a group of people. How do they define it? How do they agree on that definition? And then how do they use that to tell a story or make a decision?

Satyen Sangani: (22:06)
One of the things that Deb really emphasized is that it's not enough to find accurate answers. Data Radicals need to teach people how to ask the right questions. She told us how she goes about doing just that. And how do you think about approaching those problems when you see them?

Deb Seys: (22:22)
When I was an undergraduate, I graduated with a literature and writing degree, and before I decided to go down the library route, I thought I was going to teach writing. The approach we used at the time was not to just sit down and begin writing. Particularly when you're trying to create an argument in an essay or something along those lines, rhetoric, if you don't know exactly what it is you're trying to say, you'll never write well. Once you've decided what it is you actually have to say, the point you're trying to make, writing comes quite naturally to most people. The difficulty in writing is not the difficulty of writing; it's the difficulty of knowing what the heck it is you're trying to say, or what your opinion is. And I would say the same thing happens with data.

Deb Seys: (23:23)
It's hard if you don't have a really good sense of what it is you're trying to find out. I mean, it kind of boils down to "measure twice, cut once." The thought process is the measuring twice. The other thing I would say, and this recalls an experience I had with a user experience researcher at eBay, is that you have to know the detail, the clarity, or the granularity that you're seeking or driving toward. You might tell somebody, "I went on a date with this guy. He's got blue eyes." And my friend would immediately know, "Oh, blue eyes. I have a sense of what that is." They're not going to ask me for the hex color of his blue eyes. It was enough just to know he has blue eyes. But if I were painting the wall in my kitchen and I wanted it to be blue, I might actually need the paint color number so that I could exactly match the blue in my living room with the blue I wanted to paint my kitchen.

Deb Seys: (24:36)
And so the detail or specificity of what I'm looking for is really important in the context of what I'm trying to get done. Am I communicating that I met somebody with blue eyes, or actually painting my wall a very, very specific color? And my example of this with data is that this user experience researcher was trying to gather information to give one of her designers about some general design or interaction on the site. And I kept telling her, "You're going to have to wait until Wednesday's batch runs to have the most up-to-date information, or something like that. The table you're trying to use is not fully accurate or fully built out."

Deb Seys: (25:27)
And she was like, "I don't care." She says, "I just want to muck around a little bit and get some ideas. So it doesn't really matter to me that this table isn't fully loaded." And she said, "Don't worry about it." And actually I said, "Well, in that case, you could use this one. It's a month old, but it's a lot easier to use and it won't take you four hours to query."

Satyen Sangani: (25:49)
Yeah. It's funny. There's an interesting contrast between the first example, of really having an intention or an idea or a motivation, and that notion of play and discovery. I often find that I think I know something, and I think I know it really well. And then I sit down to write about it, and I've got to pick exactly the words and exactly the arguments, and I've got to frame it in a particular way. And that ends up being its own discovery process, where I learn just by tearing apart my own thinking.


How to think about data and analytics

Satyen Sangani: (26:23)
You know what? I think a lot of analytics is like that. And I find that accepting that process brings a little bit of sanity, because otherwise, if you think you know the answer going in all the time, that can be hard. According to Deb, approaching data the right way is more than modeling your database or engineering the right queries. It all starts with the right mindset, balancing your curiosity and ego with reason. It's checking your impulses and instincts against the observable facts.

Deb Seys: (26:55)
I've had a really fun opportunity to work with a very open-minded and wonderful professor at the University of Wisconsin-Milwaukee, in the School of Information Studies, which is the kind of school where I earned my master's in library science a million years ago. These are folks who are going out into the world to enter a variety of professions associated with information management. These days that includes data, things like cryptocurrency and information security, systems work like what I started out doing, managing a library system, and, for many people, the traditional library profession. What we've been doing is providing them a platform to learn what they already need to learn, but in a different environment. When I went to school for my degree, for example, we were using things like LexisNexis, which is an online reference source, or an example library system, or the large Library of Congress system.

Deb Seys: (28:14)
And so it's not unusual, when you're learning a profession like that, to have an application to practice your profession on. By providing them Alation to use, we're hoping they see the opportunity to do the kind of work they might already do in a slightly different environment, around data. For example, one of the things people who work in information studies do is create a user experience, and we're maybe getting them to think about: How do they create a data experience? And I also have an ulterior motive, which is that I happen to love my profession. I think many people have very fond experiences with libraries and with this idea of the world's knowledge and literature, and many librarians live at the front lines of censorship, information literacy, and freedom of speech. But I think one of the things people don't realize is that librarianship is actually grounded in the work of many foundational thinkers on the theoretical problem of how you catalog the world's knowledge.

Deb Seys: (29:29)
As I mentioned earlier, how does the Library of Congress decide? In the sixties, all this literature about feminism ended up in home economics, because there just wasn't a Library of Congress number to put it in yet, ironically. And so as the world changes, the taxonomy has to change too. That literature doesn't live there anymore, but you might still find some really early works there. So there's that idea of evolving with knowledge, and the idea of information as augmenting human intelligence, and computing, and the internet. What does it mean to remove the burden of the daily slog to find what you need in order to get your job done, to augment that in a way that gets people past that early burden and off doing the next level of work? There were a lot of early thinkers about that, folks who were involved in the early days of designing hyperlinks and the computer mouse. And so I think my profession has a lot to offer the data profession.

Satyen Sangani: (30:46)
Yeah. I mean this idea that there is a virtuous feedback loop potentially between the consumption of information and the description of information seems to be the core theme.

Deb Seys: (30:58)
And to think ethically about not introducing bias, to think ethically about responding and evolving, and to recognize, as you mentioned before, that it shouldn't be rigid: it's a living, breathing thing in the same way that knowledge is.

Satyen Sangani: (31:17)
Right. Well, I mean that bias question is almost its own entire episode. And so, we'll have to come back to that at some point, but Deb, it's been a huge pleasure. Thank you for taking the time.

Deb Seys: (31:27)
Thank you for spending it with me, Satyen. It was a lot of fun.

Satyen Sangani: (31:31)
Deb is right. At the end of the day, data is really just about knowledge: getting as much knowledge as possible and understanding it in a structured way. Librarians can help us do that. And really, if you're a data radical, no matter what domain you're in, that's what we're all here to do. We've come a long way since the Library of Alexandria, but we're also drowning in more information than ever.

This is Satyen Sangani, co-founder and CEO of Alation. Thanks for listening.
