By Aaron Kalb
Published on August 29, 2023
I was chatting recently with a somewhat disgruntled data scientist at Facebook. He was describing how they’ve taken A/B/C/D-testing way way way past Y/Z-, with tens of thousands of simultaneous and overlapping experiments running, and constant small tweaks based on the resulting metrics, some implemented entirely automatically [1]!
As he spoke, a picture began developing in my mind of an enormous amoeba — a blob reflexively retreating from extreme heat or acidity, aimlessly approaching and absorbing nutrients — growing and growing but with no long-term plan and no higher purpose.
To explain some of the dire consequences of the company’s decisions (including all manner of horrors from teen suicides to genocide), tech journalists have sometimes ascribed an array of personality disorders and malicious intent to the leaders of the Meta behemoth. But hearing my friend speak, it occurred to me that Meta’s management might simply be trying to fully commit to what sounds like a laudable goal: being “data-driven.”
Can the road to hell be paved with data? Or worse, could those awful outcomes be the place a data-centric culture invariably leads?
(Spoiler alert: oh, hell no!)
Contemporary discourse draws a false dichotomy between organizations that are “data-driven” and those that are not.
Data advocates smugly (though not necessarily wrongly) point to countless disastrous decisions resulting from leaders ignoring increasingly obvious evidence and instead following tradition, instinct, or ideology down into doom.
But data skeptics — seeing morally if not financially dubious decisions made by some of the most “data-driven” organizations in the world — justifiably doubt that data alone can answer every question.
Neither side is completely right or wrong. So we shouldn’t just pick one. But we can also do better than offering some weak, vague platitude like “it’s a delicate balance” or “it’s an art not a science.”
A more nuanced and rigorous examination of decision-making processes—and the potential roles of data within them—reveals precisely where and how data should play a role:
We must (a) decompose the broad idea of “decision-making” into a taxonomy, and (b) question, at each level, whether data should “drive” the decision or influence it in another manner.
When we do, it becomes clear that the issues data-culture critics raise do not stem from data’s use in day-to-day decisions per se; rather, those problems result from corporate leaders abdicating their human responsibility to articulate a vision and establish values for their organizations. And it becomes equally obvious that data proponents’ frustrations are not the inevitable result of the mere presence of human instinct and emotion anywhere in an organization, but rather of their influence on practical and tactical decision-making.
Organizations naturally vary, but we can safely assert two absolutes:
No organization should be totally data-driven.
Every organization should be at least somewhat data-informed.
And the relative aptitudes of data and humanity are not just compatible but in fact perfectly complementary:
Leaders answer the question “why are we here?” through vision and values.
Data informs the subsequent decisions around “where to go” / “what to do” next.
“Driven” has many definitions and connotations. But today, the most natural interpretation of the word “drive” is likely the act of operating a car to get somewhere.
Let’s consider the role of data in that kind of driving.
When you drive a car as a human, data plays an important role. If you look for examples of “data” while driving, the most obvious feed is probably the dashboard, which tells you how much fuel/battery power remains and how fast you’re going.
Many modern drivers also see the data presented by their GPS device or software (e.g. Google or Apple Maps telling them how far until the next turn).
And the windshield lets a driver read street signs and speed limits and see the locations and trajectories of other vehicles. While those inputs might not feel like “data” to us humans, the representations thereof that would guide the AI driver of an “autonomous vehicle” are canonical instances of data.
So driving without any data (no dashboard, no GPS, and no windshield) would be not only very inefficient but downright dangerous. Even if you didn’t crash immediately, you’d never get where you wanted to go.
However, I don’t think anyone would say that the data is actually doing the driving. The car is not data-driven; it is (obviously if somewhat tautologically) driver-driven.
Let’s consider the key thing the data cannot do, even in a “self-driving” car. Data and algorithms cannot decide where to go, however brilliantly they might be able to figure out how to get to any specified destination.
And why can’t an algorithm—no matter how sophisticated, no matter how large the dataset on which it is trained—decide where to go? Because it has no reason to go anywhere.
The most fundamental differentiated capability of the human mind in the context of driving is purpose. You, uniquely, can figure out where you want to go because you know why you’re trying to travel in the first place.
Every move you make requires a series of decisions, made in a particular order. To take an example, starting at the end, you might:
3. turn right at the next stoplight
2. in order to get to the store at the intersection of 1st & Main
1. in order to buy a card & flowers for your friends hosting dinner later (in order to show your appreciation and affection)
In chronological order, the steps proceed as follows:
Objective → Destination → Navigation
And each step is effectively the answer to one of a series of questions:
Why → Where → How
As we move from left to right, from high-level to specific, data becomes increasingly relevant, useful, and important:
Once a destination is set, the navigation, and indeed the transportation itself, can be fully delegated to data and algorithms: an autonomous vehicle can figure out pathfinding, steering, acceleration, etc. to get you where you want to go safely and efficiently.
And destination selection often works best when it’s data-informed: perhaps you use Google Maps or Yelp to find a convenient and well-priced gas station or a tasty Chinese restaurant. But most humans wouldn’t feel comfortable completely delegating all destination-setting to an algorithm [2].
But the data can’t tell you that you’re hungry or what you want to eat [3], or that you want to relax in a park or laugh at a comedy show. Objective-setting is solely your responsibility as a human. And in fact, it is often not the product of your human mind deciding, but rather of your “heart” yearning or your “gut” grumbling.
In short:
Humans know what they want and why
Humans can decide where to go optimally only with help from data
And data can determine how to get there, perhaps even without any further human input (sketched below)
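Before leaving the driving metaphor, here is a minimal sketch of that fully delegable “how” step. It assumes a made-up road graph with invented travel times; the point is simply that once a human names the destination, finding the best route is pure computation over data (here, classic Dijkstra shortest-path search).

```python
import heapq

# A toy road network: edge weights are travel times in minutes.
# The places and numbers are invented purely for illustration.
ROADS = {
    "home":        {"1st & Main": 4, "Elm St": 2},
    "Elm St":      {"1st & Main": 3, "park": 6},
    "1st & Main":  {"flower shop": 1},
    "park":        {"flower shop": 5},
    "flower shop": {},
}

def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: given a destination, the route is pure computation."""
    queue = [(0, start, [start])]  # (minutes so far, current node, path taken)
    visited = set()
    while queue:
        minutes, node, path = heapq.heappop(queue)
        if node == goal:
            return minutes, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, cost in graph[node].items():
            if neighbor not in visited:
                heapq.heappush(queue, (minutes + cost, neighbor, path + [neighbor]))
    return None  # no route exists

# The human supplies the destination; the algorithm supplies the route.
print(shortest_route(ROADS, "home", "flower shop"))
# -> (5, ['home', '1st & Main', 'flower shop'])
```

Note that the destination (“flower shop”) is an input the algorithm cannot generate for itself.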
A sensible business plan or corporate strategy must answer the same three questions in the same order: Why → Where [4] → How
For example:
Why? In order to grow our revenues 25% YoY for the next five years…
Where? …we need to develop a new customer base across East Asia…
How? …starting with flashy openings of flagship stores in Tokyo, Seoul, and Beijing, then proceeding to… etc. etc. etc.
It is hopefully self-evident that the specific steps for executing this grand plan had best be very data-informed, if not completely data-determined. Just because some Executive Vice President “has a good feeling about Shanghai” or “had great success in Osaka at a previous company” doesn’t mean either of those cities is the right site for the next retail location. That decision should be based on the size [5] of the target population in those places, the competitive landscape, supply chain considerations, weather patterns, etc. (a toy scoring sketch follows).
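As one illustration of what “data-determined” site selection could look like, the sketch below ranks hypothetical candidate cities on a weighted scorecard. Every statistic, criterion, and weight here is a placeholder invented for this example; a real analysis would pull in the actual population, competition, and supply-chain figures mentioned above.

```python
# A toy, data-informed site-selection scorecard. All statistics and weights
# below are invented placeholders, not real market data.
CANDIDATES = {
    # target population (millions), competitor density (0-1), supply-chain cost index (0-1)
    "Shanghai": {"population_m": 24.9, "competition": 0.8, "supply_cost": 0.3},
    "Tokyo":    {"population_m": 14.0, "competition": 0.9, "supply_cost": 0.5},
    "Osaka":    {"population_m": 2.7,  "competition": 0.5, "supply_cost": 0.4},
}

# Humans decide which criteria matter and how much; the arithmetic then ranks
# the options. Negative weights penalize a criterion.
WEIGHTS = {"population_m": 0.5, "competition": -0.3, "supply_cost": -0.2}

def score(stats, weights):
    # Scale population to a 0-1 range so the criteria are comparable.
    features = {**stats, "population_m": stats["population_m"] / 25.0}
    return sum(weight * features[criterion] for criterion, weight in weights.items())

for city in sorted(CANDIDATES, key=lambda c: score(CANDIDATES[c], WEIGHTS), reverse=True):
    print(f"{city}: {score(CANDIDATES[city], WEIGHTS):+.3f}")
# -> Shanghai: +0.198, Tokyo: -0.090, Osaka: -0.176
```

Even here, the weights encode a human judgment about what the business values; the data only does the ranking.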
But what about when we zoom out from that tactical level? Should the organization…
…expand in Latin America rather than East Asia?
…or sell new products within the current geographic footprint rather than attempt global expansion?
…or sell through foreign channel partners instead of directly?
As above, this decision likely shouldn’t be left entirely up to human whims. But unlike the tactical specifics, it’s hard to imagine the optimal outcome emerging solely from a pool of wonky analysts, let alone an algorithm. For instance, a model would likely identify a partner-centric approach as lower-cost and lower-risk, but might have a hard time quantifying the long-term intangible benefits of direct customer relationships or worldwide brand recognition.
Some mixture of facts & figures with human experience, intuition, and abstract thinking will likely yield the smartest strategy.
Data might help an executive team predict the levels of employee burnout and shareholder confidence across different growth targets. But no amount of data can make those tradeoffs for them. A regression model can maximize a given metric, but it can’t decide which metric to maximize (a minimal sketch of this division of labor follows).
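Here is that division of labor in miniature, with every model and coefficient invented for illustration: an optimizer will happily maximize whatever objective it is handed, but choosing the objective (engagement? burnout? some weighting of the two?) is a human judgment.

```python
# Toy predicted-outcome "models" over a range of possible growth targets.
# All functions and numbers are invented for illustration.

def predicted_engagement(growth_target):
    # Hypothetical: engagement rises with more aggressive growth targets...
    return 1.0 + 0.8 * growth_target

def predicted_burnout(growth_target):
    # ...but so does employee burnout, and faster than linearly.
    return 0.2 + growth_target ** 2

def maximize(metric, candidates):
    """Pure computation: return the candidate that maximizes the given metric."""
    return max(candidates, key=metric)

growth_targets = [g / 100 for g in range(0, 101, 5)]  # 0% to 100% YoY

# The algorithm has no opinion about which objective is the "right" one:
print(maximize(predicted_engagement, growth_targets))             # -> 1.0 (max growth)
print(maximize(lambda g: -predicted_burnout(g), growth_targets))  # -> 0.0 (no growth)

# The trade-off itself, i.e. how much burnout is tolerable for how much
# engagement, is a human (and ultimately moral) choice encoded in this weight:
burnout_tolerance = 0.5

def balanced(g):
    return predicted_engagement(g) - burnout_tolerance * predicted_burnout(g)

print(maximize(balanced, growth_targets))                         # -> 0.8
```

Change burnout_tolerance and the “optimal” answer changes with it; no regression or optimizer can tell you what that number should be.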
If some proposal would increase engagement across the user base but reduce quality of life across the globe, it is morality—not rationality—that must tell us not to pull the trigger.
And that is not just a hypothetical decision, as we’ve seen, for instance, with the recent disturbing revelations about the material and emotional damage Facebook has wrought.
While most processes can be optimized with information, not every decision can be entirely delegated to data. Different levels of organizational decisions should involve data to different degrees. Human leaders alone can set a vision and values. Data should inform — but not solely dictate — the selection of the strategy. But once an organization knows where it wants to go and why, the turn-by-turn journey there will be far more efficient if it’s “data-driven”.
Curious to learn how technology can support data-driven decision-making? Join us for a demo to see why a data catalog is critical.
1. That part was actually not the source of his particular frustrations with his employer. I think he viewed it as largely good and normal.
2. While some mistrust of machines might be misplaced (induced by dystopian science fiction AIs like HAL 9000 and Skynet), a lot is totally rational, given the incomplete metadata available to the computer. For instance, you might want to drive an extra two minutes to an establishment with a slightly lower star-rating (apparently defying all logic from an automated optimizer’s perspective!) because you’re craving a particular dish they serve or you want to flirt with a bartender you know has a shift there.
3. Although perhaps it could predict that with some accuracy based on your historical behavior.
4. As in “where do we want to take the business?” This could also be framed/phrased as “what do we want to achieve?”
5. And budget, buyer propensity, etc.