Dawn in the Anthropocene, an Allegory on The Evolution of Data
The quantum speed with which data is insinuating itself into every facet of life is challenging to comprehend. But doing so is imperative to our future, which is why we must be crystal clear about the past.
By Brett A. Hurt
Not long ago, I had the chance to share a beer and talk a bit of shop with my colleague at data.world, our Principal Solutions Architect Dean Allemang. The task we set for ourselves was to imagine one major, global problem for which data is not at the core of any solution. Spoiler alert: It’s an impossible task.
From climate change to political populism, from COVID-19 and the multiplying vectors of more zoonotic disease, to food production as we approach 9 billion people on planet Earth, data is the common denominator of any plausible course of action.
It didn’t take much prompting of Dean, a polymath whose book Semantic Web for the Working Ontologist may be the most important ever written on data, to offer up some evidence for the case that data is our destiny:
- Climate change is a daunting topic. From large datasets, we know that we are losing the equivalent of 40 football fields of tree cover each minute. Meanwhile, we scramble to improve our wholly inaccurate “Big Data” on total carbon emissions, the inadequacy of which now threatens the global response. Small data is important as well. In Colombia, the government has been making all climate data available and usable to small rice farmers since 2014, enabling them to adjust planting, fertilization, and harvest times in ways that have increased yields amid steadily hotter average temperatures. “Anecdotes about how early you could go sledding as a child vs. today are not going to help us understand what actions we can take,” Dean said. “We need real data for that.”
- Data has never played a more important role in public policy than it has in the COVID-19 crisis. The modeling of the pandemic by Johns Hopkins University, which we support with our open data community, is one dramatic example of data’s utility. Another is the development of the mRNA vaccines, nothing less than a miracle and themselves a massive, global data project. So far, that mobilization has put more than 130 mRNA candidate vaccines into the pipeline beyond the two with which we are all now familiar, those of Moderna and Pfizer.
- On the steady rise of toxic politics, Dean argued that if we’d been paying attention to the data in the second-place showing of the late Jörg Haider in Austria’s 1999 parliamentary election, or the success just three years later in the Netherlands of lawmaker Geert Wilders, we would have been better prepared for the likes of Donald Trump, Boris Johnson, Jair Bolsonaro or so many others. We must, Dean told me, do a better job of tracking the canaries in the world’s political coal mines. And it’s all about data.
- How do we feed more and more hungry people with less and less arable land? Dean quickly mentioned a number of projects with which he’s familiar: South African fishers using weather, catch and cost data are improving their efficiency and incomes; New Hampshire farmers are using cheap sensors to manage soil moisture; UK farmers use satellite data to minimize pesticide use; and small farmers in Kenya are using smartphones as tools to manage planting data and boost incomes. Those are just a few.
My goal here is not just to recite facts and figures on data’s utility, but rather to illustrate the stakes as we deal with a reality that is only just emerging: the rise of a data-based economy, culture and civilization. Consider a calculation made a few years ago by Eric Schmidt, the former CEO of Google. He estimated that from the dawn of civilization until 2003, the sum of all data ever created was five exabytes. When you consider that the operating system on your cell phone stores between five and 10 gigabytes of data, and that an exabyte is 1 billion gigabytes, then five exabytes sounds like a huge amount of data. But today, humanity produces that much data every 48 hours.
The quantum speed with which data is insinuating itself into every facet of life is challenging to comprehend. But doing so is imperative to our future, which is why we must be crystal clear about the past. I’ll make this brief, but let’s start at the beginning.
From Clay Tablets to Double-Entry Bookkeeping to the ‘Cloud’
The first known commercial data, records of animal sales and grain harvests, were recorded as cuneiform by the Sumerians in ancient Mesopotamia on clay tablets around 3000 BCE. Fast forward 4,500 years and data became a true utility in the late 15th century with the invention in Florence of double-entry bookkeeping. Recording, storing, and managing data in this new fashion provided the clarity that allowed commerce to flourish and ushered in the Renaissance. The Battle of Waterloo in 1815 revealed an early example of what we today call “prescriptive data.” Word of Napoleon’s defeat was rushed back to London by secret carrier pigeon, allowing the House of Rothschild to use the exclusive, bird-borne data to make a killing on British government bonds.
Then, just 86 years ago, after the Social Security Act became law, Franklin D. Roosevelt’s administration created America’s first major data project to track the contributions of nearly three million employers and 26 million Americans. The massive bookkeeping project to track and store the data with punch card reading machines was awarded to a young company called the International Business Machines Corporation, or “IBM.”
A quick spin through some more familiar and recent milestones includes: the founding in 1958 of ARPA, the agency that went on to build ARPANET; the COBOL programming language two years later; the launch of Structured Query Language, or SQL, in 1979; the World Wide Web a decade after that; the emergence of cloud computing in 2006… And now, according to author Kevin Kelly, founding editor of Wired, data is the cellular structure of an emerging “planetary superorganism.”
To illustrate Kelly’s point, bear with me on a few more numbers. The world’s first blog was posted in 1994. Today, more than 7 million are posted daily. Along with those blogs, Google will log 5.6 billion searches today. At the beginning of this year, more than 46 billion devices were connected to the internet of things, the so-called “IoT” of sensors, software and other technologies that form the beginnings of Kelly’s planetary superorganism. And remember, we’re only in the early days of the soon-to-be-ubiquitous 5G cellular networks that will boost data rates by as much as 50 times. Meanwhile, more than 50 percent of corporate data is now stored in the “cloud,” the suite of storage technologies just two decades old. It’s really better thought of as an “inter-cloud,” with a capacity of 470 exabytes linking the server farms of Amazon, Microsoft, and others.
The tendency of technology to evolve in ways that mimic natural phenomena, known as biomimicry, is a helpful guide here. For in many ways the evolution of data is emulating our own human evolution, only at a much faster pace. Considered in this way, we are at the data version of a “Cambrian explosion,” that extended moment roughly 540 million years ago when complex, multicellular organisms first began to appear. Life itself is actually much older, by at least 3 billion years. But before this rapid period of evolution, most organisms were relatively simple, either single cells or tiny multicellular organisms. And all of them lacked a nervous system.
My point here is that if we can date the origin of data to the Sumerians more than 5,000 years ago, we can equate the past two decades of our data evolution to the Cambrian explosion. Just as the Cambrian explosion yielded a burst of new life forms, data is now shaping a future that looks nothing like the past. And the more data we produce and capture, the more data we create in the form of metadata, or data about data.
How the ‘Metaverse’ Lacks Both a Brain and Nervous System
Our unicellular, protozoan databases have evolved into a kind of multicellular, metazoan means of data storage. But this ecosystem, though fast evolving in complexity and diversity, is still primitive and disconnected. It is characterized by hoarding, isolation, and fragmentation. Our much vaunted realm of data, the “metaverse” as it were, still lacks a nervous system. And equally important, it lacks a brain.
Which is what data scientists around the world, including those of us at data.world, are working on. The nervous systems and brains of data do in fact exist today. We call the nervous system the “data catalog” and the brain the “knowledge graph.” So I’ll twist that famous aphorism about the future from one of my favorite science fiction authors, William Gibson: The nervous systems and brains of data are here. They are just not evenly distributed. Which, in a nutshell, is our challenge: How do we resolve this distribution problem to achieve true corporate and collaborative cognition?
It’s not easy. I was encouraged in my belief in this new “Cambrian Age” a few months ago when eyewear startup Warby Parker effectively illustrated the point with its $6.7 billion IPO debut. This success reflected the fact that decade-old Warby Parker is less an eyewear retailer than it is a master of data driving the optometrical universe, much like Airbnb leveraging data in lodging, Oscar in health insurance, or Spotify in music.
These are new forms of companies, they are marvels, and they are harbingers of the future. These successes, however, should not obscure the more discouraging fact that most companies remain overwhelmed by data. In fact, among those leading the Fortune 500, 75 percent say they don’t believe their companies are data-driven, two thirds don’t regard data as an asset, and more than half say they are not yet driving innovation with data, according to the consultancy NewVantage Partners.
“When I think about the behavior of many business people today, I imagine a breadline,” wrote Tomasz Tunguz in the seminal book Winning With Data. “The employees are the data-poor waiting around at the end of the day on the data breadline.”
Executives don’t get what they need. Often, they don’t even know what they need. IT departments rush between data silos to the point of exhaustion. A staggering 70 percent of data engineers say they are likely to quit in the next year, according to an October 2021 study conducted by Wakefield Research and co-sponsored by data.world and DataKitchen. Teams literally brawl over decision-making in the absence of accessible and verifiable data. The pandemic has exacerbated this, siloing people along with the already-siloed data.
Another view on this comes from the towering thinker on our emerging data-driven civilization, Kevin Kelly, whom I mentioned above. In addition to his role at Wired, he is the author of books including The Inevitable. He dubs this inertia the “counter force” that contrasts with much outlying success. “… right now data tends to be hoarded like gold,” but to inadequate effect, Kelly writes. This is the conundrum of our age, and what needs to come next is cognition.
Cognition Is the Next Frontier and Will Define Our New Data-Driven Reality
For humans, cognition is, of course, the acquisition of knowledge and understanding through thought, experience, and the senses. Data enables the digital equivalent, both within and between enterprises. At the end of the day, what we provide our enterprise customers and community members is simply a new, innovative form of cognition: corporate cognition for our enterprise customers, and collaborative cognition for our 1.5 million-plus community members who use our platform to confront the ills noted at the outset, from climate change to poverty, COVID-19, and more.
Our human cognition results from a healthy nervous system: the sensory inputs of sight, smell, touch, hearing, and taste, and the motor outputs enabling motion, breath, and organ function, all interacting through the control and command of the brain. A data catalog is, in many ways, the central nervous system of the enterprise; the sensory organ that connects sales, customer service, marketing, development, IT, HR, finance, supply chain, operations, and accounting.
A knowledge graph, meanwhile, is the semantic architecture of meaning and reasoning; it is the learning organism to which users apply all manner of data ontologies and taxonomies. Like the human brain, it gets smarter and smarter. As the brain of your enterprise, it powers the motor functions of your operations through the data catalog. Knowledge graphs are what power Google, Facebook, and a large part of Amazon.
The concept of a knowledge graph dates back to the 1980s, and early academic research on semantic, or concept-connecting, networks. The notion of a data catalog — an inventory of data and particularly metadata — originated before that, tracing to the creation of SABRE, the airline industry’s reservation system that debuted in 1964.
At data.world, we have evolved and woven these tightly together, into a seamless platform that allows users to query it, to execute upon it, and to make well-informed decisions once impossible. The results are highly collaborative, thinking organizations — far more efficient and effective than their competitors.
“We now do in half a day what we couldn’t do in six months,” Michael Murray, the former president of the data division at the global digital agency Wunderman Thompson, told us.
In sum, we call this catalyst of cognition “Agile Data Governance.” But this is only the start of unwinding the conundrum of data. To bend Winston Churchill’s famous description of Russia, data is a riddle, wrapped in a mystery, inside an enigma. On one hand, data’s function is as apparent and concrete as the four fundamental forces of physics, expressed in ones and zeroes. But on the other hand, as the base of our emerging digital civilization, data’s nature is subjective and elusive.
This is the conundrum we are resolving at data.world.
Sticking by my analogy with human evolution, we’ve progressed through the Cambrian explosion-like burgeoning of data to the flourishing of new commercial, cultural and social forms. We moved from there toward the beginnings of a planetary organism, with the emergence of data’s nervous systems and brains, which deliver coherence.
That analogy now moves to what I call the tale of two “evolutionary last miles.” The first is the human one in the ongoing Darwinian journey from the primordial sea through that Cambrian explosion of species a half billion years ago. We started on this particular “last mile” a mere 40,000 years ago when, through a genetic mutation, we acquired our human-defining cognitive agility. While the science is unsettled, and discoveries reveal more each year, there is consensus that it was at this moment that we Homo sapiens became unique. Since the start of this initial “last mile,” our brains and nervous systems have effectively been those we have today.
Our second “last mile,” however, will do more than define humans. Still ahead of us, the deluge of amorphous data that is our new primordial sea will define humanity. Coming far sooner than 40,000 years is a new kind of agility: the brains and nervous systems of data that will give us cognitive enterprises and cognitive collaboration at planetary scale.
Some have likened my second “last mile” to a “phase transition,” physicists’ term for the transformation of, say, water from a solid at 31 degrees Fahrenheit to a liquid just above 32 degrees, and then again to a vapor at 212. Small increments of change; profound cumulative implications.
It is hard to exaggerate the sweeping scope of this data-driven phase transition ahead. Which is why we must match that scope of change with commensurate tools. These tools are not the science fiction of the imagined “AI singularity,” an upload of our very consciousness to some computational cloud. Rather, what lies ahead is the very human-centric, cognitive collaboration which at data.world we call Agile Data Governance.
A bit more context to my analogy: We are far from the largest among our mammalian relatives, a distinction belonging to the blue whale. Among our closer cousins, the primates, we’re not the fastest; the patas monkey of central Africa can run at speeds up to 35 miles per hour. We share 98 percent of our genes with gorillas, but we certainly can’t match their ability to lift 10 times their body weight. All we’ve got going for us is our brains, forty millennia old. And this is the model by which we must now evolve the brains of our data governance — in this next “last mile” to a data-driven civilization.
So back to the ascent of our species and our competitive advantage that springs from the complexity of our brains, our cognition. This is the supreme prowess that leaves those other attributes in the dust. Certainly other creatures have skills, consciousness, and the ability to adapt. The ancient Greek storyteller Aesop observed a crow using its beak to drop pebbles into a pitcher, raising the water to the level from which it could drink. Chimpanzees use simple tools, can learn to communicate with sign language, and form alliances with others in their troop. Many animals that we eat, and upon which we impose great suffering, have deep emotional and social structures. This is one of the main reasons I’m vegetarian.
But human cognition, our differentiator, traces back to only a moment ago in our evolutionary timeline. It is this trait that allows us to move beyond the virtues of size, speed, and brawn to acquire knowledge and understanding through experience, and then convert that insight to reason, creativity, and innovation. In turn, it is cognition that has given rise to the so-called “Anthropocene,” our current epoch defined by the impact of humans on geology, the ecosystem, and particularly climate change.
It is well established that this Anthropocene, a term for our age invented only in the 1960s, has given us the burgeoning of data, the new cellular structure of human-made 21st-century civilization. What is less established, even scarcely understood, is that we are only in the earliest moment, akin to that late nanosecond in our evolution 40 millennia ago that gave us cognition.
The challenges ahead of us on this metaphorical “last mile” are those we encounter every day when we open our news feeds: From a panel investigating disinformation, to commentary on the sins and virtues of social media. From data stolen and held for ransom, to the threats of digital surveillance. From the promise of AI-enabled vaccines, to the crisis of supply chains. And infinitely more.
What’s missing from the headlines, however, is discussion of cognition. Keeping us from what I’ve called this “last mile” is the fact that our attention remains riven by the attributes that predate our fast-emerging, but incomplete, brains and nervous systems. Facebook with its three-billion-plus users is the blue whale. A click on the gorilla Amazon will deliver a six-foot-tall, 1,500-pound safe to your porch. Tesla’s new 1,020 horsepower Model S Plaid, along with its 740 percent valuation growth last year, makes the company hands down our patas monkey. But while we often talk about the size, brawn, and speed of these titans of data, we skip meaningful discussion of their means of data harvest and use — their use of brains and nervous systems.
Just Why the CEOs Smile with the Grin of the Poker Player
“The CEOs sit and listen to my talks and smile,” writes Scott Galloway, the author who explores this neglect in The Four, a marvelous book on the titans of Amazon, Apple, Facebook, and Google. “It’s the smile of poker players holding aces. And every one of those aces is data. In the last decade, the world’s most important companies have become experts in data: its capture, its analytics, and its use.”
But ironically, those “aces” they hold — in the data.world lexicon, a form of Agile Data Governance — are also in the hands of enterprises large and small. Or could be.
Every product team could be refining its creations by the day, or tracking customers’ usage moment by moment. HR departments could be building their recruitment plans and strategies with insight from the R&D team’s work in progress. In the public sphere, the Centers for Disease Control and Prevention, the CDC, should not be flying blind — as it often is — in our fractured system of public health. A revolution in learning awaits schools that could turbocharge teaching with individualized skills diagnostics to replace obsolete testing like the SAT exam, not fundamentally changed since being introduced in 1926.
In tech-savvy Austin, we could match our peers in most data-laced “smart city” categories, better managing some of the nation’s worst traffic congestion. As America readies for a $1-trillion-plus infrastructure rebuild, we could be rebuilding our precise knowledge of our roads, rails, and bridges, data that is as decrepit as the century-old tunnels funneling 200,000 commuters each day beneath New York’s Hudson River.
That we are not prompts me to recall that famous quip by the polymath Stewart Brand: “On the one hand information wants to be expensive, because it’s so valuable… On the other hand, information wants to be free, because the cost of getting it out is getting lower and lower all the time. So you have these two fighting against each other.”
Today, on the one hand data wants to be hoarded because of its immense value, which is why we so often hide it. But on the other hand, data wants to be shared across teams, between institutions, and among individuals because of its transformational power. Now, these are the two imperatives at war.
Consider that globally, companies spent $2.1 billion last year on data governance, a number forecast to grow to $6 billion by 2025, according to MarketsandMarkets. That’s healthy. Until you consider what we will spend next year scrambling after the failure of data governance — more than $170 billion, according to Gartner.
I rest my case.
I’m proud that our cloud-native SaaS data catalog and platform makes peace among the clashing factions. I’m proud that Gartner, Forrester Research, and others continually rank data.world among the best and most comprehensive solutions in the data catalog and governance sectors. I’m also proud that this was foreseen by technology journalist John Battelle, the founding editor of the Industry Standard, who wrote about our launch back in 2016.
“… data.world sets out to solve a huge problem — one most of us haven’t considered very deeply. The world is awash in data, but nearly all of it is confined by policy, storage constraints, or lack of discoverability,” Battelle wrote of us in our infancy. “In short, data.world makes data discoverable, interoperable, and social. And that could mean an explosion of data-driven insights is at hand.”
No longer in our infancy, we’re not just taking steps toward the “last mile” of data evolution. We’re sprinting.
Join our movement, either on our team or as a partner or customer, and let’s make history together! The world needs us to solve some of the most challenging problems, both inside and outside of companies and organizations, with data. It’s our destiny to do so — it is what we are meant to do as the human race.