Jason McVay is a data scientist at Indigo Ag, an agriculture-tech company headquartered in Massachusetts. He has an education in environmental science and geography, with a Master’s degree in paleoecology. In this essay, Jason reflects on the value of thinking spatially about data, showing how his experience as a graduate student influences his role as a data scientist today.
The popularity of location data and GIS-styled analyses has amplified a common cry in GIS-turned-data-science circles: “Spatial isn’t special!” Many who make this statement are reformed GIS scientists themselves. And while I have come around to agreeing with this, I want to emphasize the importance of not missing the forest for the trees. With the explosion of location data in the cell phone era, nearly all data now has a spatial component, and thus many data analyses are indeed spatial data analyses.
I think this “spatial isn’t special” cry has good intentions: those of us trained in GIS shouldn’t be gatekeepers; rather, we should be gate openers. After all, it’s not the data that is special; it’s how we think about data, and the context in which we can place it, that is worth drawing attention to. Spatial data isn’t special, but thinking spatially – now that’s a skill everyone should learn.
The rise of spatial data
As spatial data proliferated, non-GIS professionals suddenly found themselves performing spatial analysis. This ruffled the feathers of some of the old guard trained in GIS. They said, “We can’t have just anyone working with spatial data; spatial data is special!”
A new guard responded by saying spatial isn’t special. It’s just data, they argued, and any data person, with a few basics under their belt, should be empowered to work with it.
Indeed, there are some important spatial concepts one should know about. But in my opinion, this isn’t really what is important. Spatial data has never been “special”; what’s essential is the way we think about data, given that everything has a location. Spatial data skills are ones anyone can and should learn to be an effective 21st-century data scientist. It’s never been about the data type, but about how we think about it.
Spatial isn’t special. It is essential.
Analyzing data is about identifying patterns. And while I am a data scientist, that’s only half the story. I think of myself more as a data explorer; the scientific method has trained my eye to hunt for context and perceive unique patterns. As a data scientist, I rarely investigate individual datasets in isolation; rather, my duty is to place the data into an appropriate setting, from which a conclusion may be drawn.
I don’t always know ahead of time what I’m going to find with any given set of data, but experience allows me to form a hypothesis. I have learned that the specific type of data has less bearing on a project’s success than something else: how does this data fit in context?
How to get from point A to point B is not the only purpose of a map. Some maps are created to render the known world, to provide context. An early step in any project for me is to visualize a given dataset in space. From there, patterns emerge. This is because placing data in its larger physical and spatial setting is an exploratory best practice. By viewing data spatially, inferences can be made, and the imagination can be sparked. Spatial isn’t special. But in a world where so much data has a location, it’s essential to think spatially.
From an ancient lake to a data lake: A paleo perspective
I’ve been getting my hands dirty with data for a long time now. I studied paleoecology in graduate school. To more clearly view the past, I turned to the mud. My studies took me to Laguna Limón, a coastal lake in the Dominican Republic where a sediment column collected from below the lake bottom became the focus of my work. Back in the lab I identified charcoal deposits, noted the change in density of organic matter, and radiocarbon-dated key samples.
With these data threads cataloged and visualized, I was able to weave a story of how this environment has changed over the preceding 6,000 years. In isolation, this data was only so informative, but by coupling this direct data with other regional studies and larger climatic trends, I was empowered to illustrate what I feel are important conclusions. Namely, everything is connected. And everything changes. What is now a picturesque coastal lake was once an open estuary. And in the future? The context is certain to change again.
If I were to summarize the most important thing I learned while studying paleoecology in graduate school, it would be this: All the data that has ever been collected is of the past. Whether that is 6,000 years ago or six seconds ago is a matter of scale. Understanding how systems have behaved in the past can inform how they might behave or change in the future. And context matters!
At the time, as I sweated beneath a Dominican sun collecting sediment samples, I had no idea how relevant my thesis methodology would become to my professional career.
Spatial context remains crucial
These days I still scour lake bottoms to uncover patterns in data, albeit a different kind of lake: a data lake. My tools are slightly different, too. Instead of a sediment corer and microscope, I write Python and SQL to sieve through the milieu. Despite these surface-level changes, the challenge remains the same: piece together parts of a larger story in the hopes of understanding how systems are interconnected.
For example, how can severe weather affect not just the yield of a crop at harvest, but also the trophic tug on a given supply chain? In other words: How does the context of extreme weather data change to answer the same question from a climate perspective?
Putting these questions into their physical and spatial context is crucial. The connections and implications are far-reaching. It is my firm opinion that this is true of all data. With a grasp on the where and when, the next logical questions become how and why. Spatial data may not be inherently special, but taking a spatial perspective provides critical context for any data exploration.
“May your trails be crooked, winding, lonesome, dangerous, leading to the most amazing view”
– Ed Abbey