Strategic Analysis
AI Has A Content Gap. Who Will Fill It?
LLMs will never tell you they don't have an answer — they'll approximate. But approximation fails when the information simply isn't there. The companies building these systems know it, and they're willing to pay for what they're missing.
By its nature, an LLM will never tell you it doesn’t have an answer to your question. Based on its training, it will provide an approximate answer, but some prompts need a precise one. Cloudflare’s co-founder and CEO, Matthew Prince, rightly described the models as a block of Swiss cheese, full of holes. The point isn’t theoretical. These systems don’t know everything, and the companies building them are actively trying to fill in those gaps. They’re willing to pay for access to the information they’re missing.
The Cost of Missing Facts
If you’ve ever tried to rely on an AI system for something local, the limitation becomes obvious. If you ask for a restaurant recommendation in a small town, it might suggest a place that closed months ago, or miss a new spot all the locals are talking about. Planning a weekend outing? It won’t know about temporary street closures or a local event that changes everything.
In these cases, the issue isn’t interpretation. It’s absence. The information simply isn’t there. And when this happens, the output stops being useful, no matter how well the system is designed.
If something isn’t documented somewhere the model can access, it doesn’t exist for the model. That’s why local journalism, niche coverage, and small-scale reporting matter more than they might seem. They don’t just inform their readers; they define what AI systems can know at all.
Critically, scarcity increases value. It might sound obvious, but this basic principle of free markets applies just as much to niche data. At a global level, the internet isn’t short on information. Major media outlets often cover the same events, each presenting the same facts in a tone tailored to its audience. For AI, that is redundant. Once the underlying facts are available, a model can repackage them in countless ways. It doesn’t need ten versions of the same story; it needs one reliable account.
And this is where small but unique data sources can find a market in an evolving environment. If the NYT won’t license its data to an AI company, that company can go to the WSJ and have its model repackage the story to match each individual’s preferences. But if the sole newspaper covering a small tourist town won’t license its data, it leaves an unfillable gap.
Are Facts Enough?
Imagine a world where every factual gap is filled, every event documented, and every update recorded. We would still have a problem, because sometimes, no matter how you interpret the facts, they can’t answer real questions.
Think of a restaurant. You can have all the data about menu items, pricing, and nutritional facts, and still not know whether it is worth visiting. Subjective reviews might not be precise, but they’re exactly what people use to decide.
This information doesn’t come from spec sheets; it comes from people. From Reddit, from Google reviews, from niche blogs, from YouTube videos. These sources are messy, subjective, and sometimes contradictory, but they capture the human experience. And in many cases, they are the only place that information exists.
Fortunately, LLMs and AI systems are excellent at processing information. They can identify patterns across thousands of opinions and find the details relevant to you. As long as they have a constant supply of human experience, documented in language, they can simulate the understanding that comes from visiting a place, trying a product, or living through a situation.
Can You Fill the Gaps?
As our decisions on what to buy, where to go, and what to choose are increasingly mediated by AI, the quality of those decisions depends on what the system has access to. And that, in turn, depends on what people have taken the time to document.
AI companies are committing billions of dollars per year to accessing and licensing unique data for their LLMs. These are not one-time expenses; the world is constantly changing, and the models need to keep up with precisely how.
The internet used to reward whoever could produce and distribute the most content. The next version may reward something else: the people who fill in the gaps in both what we know and what we experience.
This article was inspired by the SXSW session “The Internet After Search” with Matthew Prince, and some sections may paraphrase his words. The session was recorded and is publicly available, and I highly recommend watching it for insights into the future of agents and the internet.