The 4 Billion-Year History of AI’s Large Language Models
The aggregate of all of the information in all of our brains and across all our books, and everything publicly found on the internet, is now the genome of a new creature: ‘Agora’
By Byron Reese, author and entrepreneur, and Brett Hurt, CEO and co-founder of data.world
ChatGPT burst into the popular consciousness just one year ago: today is its anniversary. In that short time, more than 180 million people have become ChatGPT users, and it’s already on track to generate well over a billion dollars in revenue. It and its cousins, such as Bard, Inflection, and Claude, are impressive technologies that deserve the accolades being heaped upon them. But it would be a mistake to think of this current wave of innovation as something entirely new. Rather, this is just the next chapter in a long story about the relationship between life and information that began eons ago, shortly after the formation of the Earth.
These two topics — life and information — at first glance don’t seem to have anything in common. But upon closer inspection, they are revealed to be profoundly connected. Life requires information in order to form, and likewise, information requires life in order to propagate. It is only by understanding this connection that we can see clearly what these large language models (LLMs) really are and how they will ultimately impact our lives. Against this more expansive backdrop, the resolutions to the many debates about the role of AI in our society suddenly become clearer.
But we are getting ahead of ourselves. Let’s take the advice of the King of Hearts in Lewis Carroll’s Alice’s Adventures in Wonderland and begin at the beginning. Four billion years ago, give or take, life appeared on this planet. It seems to have only done so once or, at least, persisted only once. How do we know this? Because all life today — from mildew to millipedes — stores its genetic information exactly the same way, using the exact same alphabet and encoding it on exactly the same molecule, DNA.
DNA isn’t alive. It’s just information. Think of it as a book full of words made up of just four letters: G, T, C, and A. In humans, that book is a few billion letters long, which, in modern parlance, works out to about 700 megabytes of data. That’s such a small amount that it would easily fit on a five-dollar flash drive.
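The arithmetic is easy to check. Here’s a back-of-the-envelope sketch in Python (our own illustration, not an exact figure, assuming roughly 3.1 billion base pairs and two bits per letter, since a four-letter alphabet needs exactly two bits per symbol):

    # Back-of-the-envelope estimate of the human genome as digital data.
    # Assumptions (illustrative, not exact): ~3.1 billion base pairs,
    # each of the four letters (G, T, C, A) encoded in 2 bits.
    base_pairs = 3.1e9        # approximate length of the human genome
    bits_per_base = 2         # four symbols -> log2(4) = 2 bits each

    total_bits = base_pairs * bits_per_base
    megabytes = total_bits / 8 / 1e6   # 8 bits per byte, 1e6 bytes per MB
    print(f"{megabytes:.0f} MB")       # ~775 MB, in the same ballpark as 700

Depending on the exact genome length you assume, the answer lands right around the 700-megabyte figure above.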
Thinking of DNA as a data storage device such as a flash drive isn’t just a metaphor — that really is what it is. And for four billion years, DNA was the only data storage medium in existence. All information on the planet was stored there. While it’s pretty reliable, it really doesn’t hold all that much data, and adding a new piece to it takes millennia.
Because all of life is based on DNA, and that was the only data storage method in town, all forms of life were on roughly equal footing. That is, until brains evolved. They stored data too, but could hold vastly more than DNA, and even more importantly, new information could be added quickly via a process called learning. It no longer took ten millennia to add a piece of knowledge; instead, it took just a few seconds.
At that point, about 500 million years ago, life had two places it could store information: DNA and brains. Creatures with brains evolved much more quickly, while ones without them still endured but seldom changed all that much. That’s where things stood until about 50,000 years ago, when humans acquired language. This was a huge leap forward in how information could be propagated. With DNA, you passed down information to your offspring once a generation, but with language, you could pass information to others instantly. While it might take a species 10,000 years to evolve an aversion to a certain poisonous berry, one human could simply say to another, “Don’t eat those little purple berries.” With language, information could spread through the species like wildfire.
Brains combined with complex language are the secret to our success as a species. Other species, from dolphins to prairie dogs, can communicate basic information with sound — as in “shark ahead” or “there’s a fox in the neighborhood.” But we’re unique in our language abilities, which give us a collective memory of the past and the means to imagine the future. We could pass favorable mutations — that is, new knowledge — around instantly, while other creatures had to hope they eventually evolved them through chance and dumb luck. Our genome was no longer written simply on DNA using four letters; it became written in our brains using thousands of words that could be combined in a near-infinite number of ways. Our mental evolution could thus happen millions of times faster than old-fashioned DNA evolution ever could. In the blink of an eye, we became the undisputed masters of our planet. And this was just the start.
About 5,000 years ago we learned a new trick: writing. We learned how to externalize information. It was no longer purely biological. It could exist independently of life, meaning that when someone died, no longer did everything they knew die with them. Up until this point, knowledge had to be passed orally, which wasn’t all that reliable and required all parties involved to be in close proximity. With writing, information could be preserved in perfect fidelity for centuries. This meant that, for the first time, information could accumulate. The human genome now consisted of what was in our DNA, plus what was in our brains, plus whatever writing we had access to. It’s no coincidence that this is the point when there was an explosion of human accomplishments. We entered the Bronze Age, invented the wheel, and created nation-states with complex and nuanced legal codes.
But writing was an expensive and difficult way to store information, so very little could be saved. Sure, we kept what Plato knew and wrote it down on papyrus scrolls, but his Aunt Martha’s cure for lumbago — and a billion other useful facts — was lost to the ages. That is, until Gutenberg came along almost 600 years ago with his legendary innovation.
Inexpensive printing allowed us to store vastly more information. No longer did knowledge have to be carefully copied by hand. Now, it could be endlessly reproduced. Books were cranked out by the millions, and eventually by the billions, and civilization surged forward. This gave us the scientific revolution and supercharged the advancement of humanity even more.
Our genome was now DNA, plus brains, plus our libraries. But there was a problem with this last one. While libraries could store vast amounts of information, finding that information was quite difficult. Sure, there were improvements at the margins, such as when a young fellow named Melvil Dewey came along in the late 19th century to give us the now-familiar “Dewey Decimal System,” making libraries vastly more accessible and efficient. But it still seemed like we had hit a wall in what information a person could have in their virtual genome — that body of collective knowledge one could actually access.
However, in the 20th century, we took another leap forward by developing a new place to store information: magnetic media. To do this, we developed a new language with neither the four letters of DNA nor the 26 we use in English, but just two: zero and one. We built machines to access and display this information. We connected those machines into a vast network we call the internet. And we built ways to search for and find that information with ever more granularity.
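To make the alphabets concrete, here is a toy sketch in Python (our own illustration; the message and the particular bit codes are arbitrary examples):

    # The same information, written in alphabets of different sizes.
    message = "berry"
    print(message)  # English: an alphabet of 26 letters

    # Binary: each character becomes 8 zeros and ones (its ASCII code).
    bits = " ".join(f"{ord(ch):08b}" for ch in message)
    print(bits)  # 01100010 01100101 01110010 01110010 01111001

    # DNA's four-letter alphabet needs just 2 bits per letter;
    # the particular 2-bit codes below are an arbitrary choice.
    dna_bits = {"G": "00", "T": "01", "C": "10", "A": "11"}
    print("".join(dna_bits[b] for b in "GATTACA"))  # 00110101111011

Fewer symbols make for longer messages, but two-symbol messages are exactly what electronic machines can store, copy, and transmit with near-perfect fidelity.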
It’s a remarkable fact that the DNA of any two humans is almost identical. In that sense, there really is a single human genome. However, each of us has vastly different information stored in our brains. So the aggregate of all the information in all our brains, across all our books, and everywhere on the internet has become the genome of a new species, which Byron has named Agora in his upcoming book, after the noisy marketplaces of ancient Greece that went by the same name. No one alive knows how to build a smartphone, yet smartphones get made, because Agora knows how to make one. No human could have put a person on the moon, but Agora was able to do it by virtue of its immense distributed genome.
Now, the limit to our knowledge is no longer the availability of information. Rather, it is assessing its accuracy and worth. Someone trying to figure out whether they have the flu or a cold might go to a search engine and find literally millions of pages that purport to have the answer. While search engines brag about the astronomical number of pages they serve up in a quarter of a second, they miss the point that sheer volume is useless. You just want to know the answer, and instead they leave you on your own to separate the wheat from the chaff.
That brings us to LLMs such as OpenAI’s ChatGPT and Google’s Bard. What they attempt to do is finally synthesize all human knowledge into a single source. In a sense, these new tools are not so much another layer of information technology as the aggregation of all the zeros and ones created since that first bit of magnetic tape. And not just from the Web — as profound as that is — but soon from the emerging data cloud and new sensors tracking virtually everything.
The promise — not yet the reality — is that posing a question like “What’s the difference between a cold and the flu?” will still draw on all those millions of pages but will also ascertain which ones are accurate. As Ethan Mollick, a pioneer in AI in education at the Wharton School, has pointed out, the tendency of ChatGPT to “hallucinate” and confidently share inaccuracies may be frustrating, but we are now using the worst versions of LLMs we will ever experience. Think about how fast the mobile web experience progressed, for example (the first versions were, in hindsight, quite terrible).
Mollick says AIs allow him to ask students to “literally do the impossible”: projects of a scope they could never have managed before are now doable, and learning is accelerating by leaps and bounds. Meanwhile, what is cutting edge today — models like GPT-4 (OpenAI’s latest LLM), performing calculations on the scale of millions of billions — will be open source within a few years at most, while the models grow by roughly 10X in computational power with each generation: GPT-5, 6, 7, and onward.
Another pioneer, Sal Khan, the founder and CEO of the online learning platform Khan Academy, has a vision to revolutionize education. As he described at this year’s TED conference, he sees a future where students from elementary school onward will have an interactive personal tutor, and teachers will be empowered to hand off the tedium of lesson planning and test grading to AI and spend more time interacting directly with pupils.
To be sure, we live in a moment of daunting challenges: from climate change and the need to transition to a post-carbon economy, to the threat of new zoonotic diseases, to inadequate health care for many, to the crisis of the unhoused. It’s a long list, frankly. But as with every other challenge humanity has faced, the answers will be found both in access to existing knowledge and in the creation of new knowledge. Right now, those answers are spread across countless billions of separate pages on the internet. But not for long. This is the promise of LLMs: access to the combined wisdom of billions of people is the superpower — the synthesized single intellect — that will propel us forward as a species. Our path to solving our biggest challenges lies in harnessing all the knowledge and wisdom present in the Agora.
Of course, we must be prudent in how we proceed. Each advance we’ve discussed — DNA, brains, speech, writing, printing, the digital age, and all the rest — has brought disruption, and LLMs will as well. But our species is on the cusp of a great leap forward, the likes of which we can scarcely imagine. It’s amazing to contemplate how far we’ve come in four billion years. It’s even more amazing to contemplate how far we have to go. Our best is yet to come.