Date: 2025-09-05

Cloudflare’s CEO, Matthew Prince, has been outspoken about the way artificial intelligence (AI) companies are harvesting content. His argument is simple but important: AI vendors are scraping websites, including those of small publishers like Petri, and using the content to train their models.

That same content is then re-served to users in the form of AI-generated answers, often without the user ever visiting the original site. As a result, publishers can’t monetize their work even though it is being used.

Many publishers serve content for free because their business model depends on advertising, subscriptions, or partnerships, all of which rely on people visiting the site. So what happens if users never click through because the search engine or AI assistant gives them the answer on the spot?

A new Internet model powered by natural language, vector databases, and semantic search?

Cloudflare is now working with Microsoft on a possible solution. The initiative revolves around two technologies: Microsoft’s new NLWeb standard and Cloudflare’s AutoRAG feature.

What is NLWeb?

NLWeb, short for Natural Language Web, is a standard developed by Microsoft that allows websites to provide natural language answers directly on their own pages. Think of it as a built-in “answer engine.” Instead of relying on Google or Bing to summarize your content for a user, NLWeb could allow the site itself to handle that request.

For example, if you typed into a site’s search box: “What’s the difference between Windows 11 Pro and Home editions?”, NLWeb would allow the site to generate a concise, AI-style answer from its own content. In other words, the traffic and engagement stay on the site that created the information, rather than being siphoned off by an external AI model.

What is AutoRAG?

The second piece of this puzzle is Cloudflare’s AutoRAG (short for automatic retrieval-augmented generation). Now in beta, it automatically indexes a website’s content and stores it in a vector database.

Embeddings and vector databases

Instead of creating a simple index of keywords, AutoRAG converts unstructured content such as text, images, or even video into mathematical representations called embeddings. These are then stored in a vector database. Because content with related meaning ends up with similar numbers, the system can identify relationships between concepts.

Semantic search in Azure AI Search (Image credit: Microsoft.com)

This is the foundation of semantic search. For example, if you searched “holiday pictures” in a traditional keyword search, you’d only find pages with those exact words. But semantic search understands meaning: it might also surface “vacation photos” or “travel albums” because it recognizes they are related ideas.
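The contrast between keyword search and semantic search can be sketched in a few lines of Python. This is only an illustration: the three-dimensional vectors below are hand-written stand-ins, not the output of any real embedding model, and the function names are hypothetical. Real embeddings typically have hundreds or thousands of dimensions.

```python
import math

# Hand-written toy "embeddings" (an assumption for illustration only).
# Phrases with related meaning were deliberately given similar vectors.
TOY_EMBEDDINGS = {
    "holiday pictures": [0.90, 0.80, 0.10],
    "vacation photos": [0.85, 0.75, 0.15],
    "travel albums": [0.70, 0.60, 0.20],
    "tax return deadlines": [0.05, 0.10, 0.95],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def keyword_search(query, documents):
    """Return only the documents that contain every word of the query."""
    words = query.lower().split()
    return [d for d in documents if all(w in d.lower().split() for w in words)]


def semantic_search(query, embeddings, top_k=2):
    """Rank every other phrase by cosine similarity to the query's vector."""
    query_vec = embeddings[query]
    scored = [
        (cosine_similarity(query_vec, vec), title)
        for title, vec in embeddings.items()
        if title != query
    ]
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_k]]


pages = ["vacation photos", "travel albums", "tax return deadlines"]
print(keyword_search("holiday pictures", pages))            # []
print(semantic_search("holiday pictures", TOY_EMBEDDINGS))  # ['vacation photos', 'travel albums']
```

The keyword search finds nothing because no page contains the exact words, while the similarity ranking surfaces the two related phrases and leaves the unrelated one out.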

Cloudflare’s AutoRAG takes care of maintaining a vector database for your site. That’s significant because building such a system yourself requires technical expertise and infrastructure. By offering this service, Cloudflare lowers the barrier to entry.
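To give a sense of what such a system involves, here is a minimal sketch of the retrieval half of retrieval-augmented generation: index pages, find the one closest to a question, and assemble a prompt for a language model. It does not use Cloudflare’s API. The class and function names, the example.com URLs, and the placeholder page text are all hypothetical, and the word-count “embedding” is a crude stand-in for a real embedding model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in embedding: a sparse word-count vector (NOT a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorIndex:
    """Hypothetical in-memory index; a real vector database persists and scales this."""

    def __init__(self):
        self.entries = []  # list of (url, text, vector) tuples

    def add_page(self, url, text):
        self.entries.append((url, text, embed(text)))

    def search(self, query, top_k=1):
        query_vec = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(query_vec, e[2]), reverse=True)
        return ranked[:top_k]


def build_prompt(question, index, top_k=1):
    """Retrieve the closest pages and place them in a prompt for a language model."""
    hits = index.search(question, top_k)
    context = "\n".join(f"[{url}] {text}" for url, text, _ in hits)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


index = TinyVectorIndex()
index.add_page(
    "https://example.com/windows-11-editions",
    "Placeholder article comparing Windows 11 Pro and Home editions.",
)
index.add_page(
    "https://example.com/dns-records",
    "Placeholder article about configuring DNS records.",
)
print(build_prompt("What's the difference between Windows 11 Pro and Home editions?", index))
```

The generation step, where a model turns the prompt into an answer, is left out on purpose. Even this toy version hints at the moving parts a site owner would otherwise have to build and keep in sync with their content.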

Microsoft’s NLWeb standard in action (Image credit: Microsoft.com)

But do SaaS answer engines fix the monetization problem?

The bigger question is whether these technologies truly solve the monetization challenge. Blocking AI bots from scraping content is difficult, and while Cloudflare offers tools to do this, it’s not clear how effective they are. AI vendors often don’t play by traditional copyright rules, and smaller sites may not have the resources to fight back.
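One long-standing way to ask crawlers to stay away is a robots.txt file, although compliance is voluntary, which is part of why blocking is hard. The sketch below uses Python’s standard urllib.robotparser module to show how a well-behaved crawler would interpret such a file. The rules, the helper name, and the example.com URL are illustrative assumptions, not a description of Cloudflare’s tools. GPTBot is the user agent OpenAI publishes for its crawler.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules (an assumption): turn away one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Report whether a crawler that honours robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/article"))       # False
print(is_allowed(ROBOTS_TXT, "SomeBrowser", "https://example.com/article"))  # True
```

Nothing in this file stops a crawler that simply ignores it, which is why enforcement at the network level matters.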

Even with NLWeb and AutoRAG, users still begin their search journeys in Google or Bing, or increasingly in tools like ChatGPT. If the answer they want is already served up there, will they ever make it to the original site?

The Internet is shifting from a model of click-through discovery to one of answer delivery. For many websites, that threatens the business model that funds free, independent content.

What does the future hold for content creators?

Over the last year alone, we’ve seen shifts in how people access information. The winners in this new landscape will likely be the sites that create experiences compelling enough to bring users directly to them, rather than relying on Google search for discovery.

NLWeb and AutoRAG are interesting experiments in reshaping that future. But the larger challenge is ensuring that creators can still be compensated in an era where AI models “borrow” their work.

Could we imagine an Internet without traditional websites, replaced entirely by answer engines like ChatGPT? It’s possible. But for now, websites still serve as the shopfronts of the digital world: business portals, information hubs, and communities. Whether technologies like NLWeb and AutoRAG can keep that shopfront relevant in the age of AI remains to be seen.