What if your website talked back to agents?

Most personal sites are barely parseable by agents. Here's the three-layer discovery stack I added to mine — the foundational layer, the well-known layer, and the active WebMCP layer — and why I think it's the more interesting alternative to llms.txt.

June 16, 2026 · Updated July 8, 2026 · 4 min read · #generative ai #geo #agent discovery #webmcp

Credit where it's due: I didn't invent any of this. The AI Catalog spec is a Linux Foundation effort by the A2A and MCP protocol communities. I'm just an early adopter with a personal site and too much spare time.

Most personal sites are optimized for speed, SEO, and accessibility, among other things. They render fine for a human, but an agent trying to summarize who you are and what you've written has to parse the HTML, summarize the content, and fill in the blanks and the structure on its own.

I wanted mine to be different. Not because I expect a flood of agent traffic, but because I am optimizing for both human and non-human organic traffic. I can't remember the last time I ran a web search myself for technical documentation. I sometimes visit a website directly to confirm documentation, but only after my agent has hydrated its context with it.

The foundational layer first

Before any of the agent-specific stuff, the site ships the obvious things:

Sitemap XML files, robots.txt, etc.
Annotations like JSON-LD for entity schemas
Proper HTML tags like titles, descriptions, etc.

None of this is new. All of it still matters. The foundational stuff is non-negotiable because agents and LLM orchestrators still use web search, and they can only take in so much context, so you still need proper SEO hygiene. That is, unless someone creates an agent search engine (idea).
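
As a rough illustration of the JSON-LD annotation mentioned above, here is a minimal sketch that builds a Person and BlogPosting graph and serializes it into a script tag. The site URL, the @id values, the author name, and the helper names (buildJsonLd, renderJsonLdTag) are placeholders invented for this example.

```javascript
// Hypothetical site data; replace with your own source of truth.
const site = {
  url: "https://example.com",
  author: "Jane Doe",
};

// Build a small schema.org graph where the post points at the author by @id.
function buildJsonLd(site, post) {
  const personId = `${site.url}/#person`;
  return {
    "@context": "https://schema.org",
    "@graph": [
      { "@type": "Person", "@id": personId, name: site.author, url: site.url },
      {
        "@type": "BlogPosting",
        "@id": `${site.url}/blog/${post.slug}#post`,
        headline: post.title,
        datePublished: post.date,
        author: { "@id": personId },
      },
    ],
  };
}

// Wrap the graph in the script tag that goes into the page head.
function renderJsonLdTag(graph) {
  // Escape "<" so a post title cannot close the script tag early.
  const json = JSON.stringify(graph).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}

const graph = buildJsonLd(site, { slug: "hello", title: "Hello", date: "2026-06-16" });
console.log(renderJsonLdTag(graph));
```

Referencing the author by @id rather than repeating the whole Person object is what lets other surfaces point at the same entity later.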

The LLM/Agent layer

On top of the foundational stuff, I added two well-known endpoints:

/.well-known/ai.txt is a human-readable policy: who I am, and what is and isn't allowed with the content
/.well-known/ai-catalog.json is a machine-readable catalog of AI-related resources for the site. It can advertise AI policies, content feeds, licensing information, MCP servers, agents, APIs, models, and other machine-consumable resources so that AI systems can discover them from a single well-known endpoint. The catalog is the part I find most interesting because it gives me one well-known URL, many discoverable resources, and no need to redefine the format every time I add something new.
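
To make the "one URL, many resources" idea concrete, here is a sketch of generating a catalog document. It does not reproduce the AI Catalog schema: every field name below (version, resources, type, url, description), the /atom.xml path, and the buildCatalog helper are assumptions made up for illustration, so check the spec for the real shape.

```javascript
// Hypothetical catalog builder. Field names are illustrative, NOT from the spec.
function buildCatalog(baseUrl, resources) {
  return {
    version: "0", // placeholder version marker (assumption)
    resources: resources.map((r) => ({
      type: r.type,
      // Resolve relative paths against the site origin so entries are absolute.
      url: new URL(r.path, baseUrl).href,
      description: r.description,
    })),
  };
}

const catalog = buildCatalog("https://example.com", [
  { type: "policy", path: "/.well-known/ai.txt", description: "Human-readable AI policy" },
  { type: "feed", path: "/atom.xml", description: "Atom feed of posts" },
]);

// Adding a resource later is one more array entry; the entry point stays the same.
console.log(JSON.stringify(catalog, null, 2));
```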

The active layer: WebMCP

The well-known layer tells an agent what exists. The active layer lets it call what exists, from inside the page, without a backend integration.

WebMCP is a proposed web standard (Chrome origin trial in 2026, with a W3C explainer) that lets a page register client-side "tools" on document.modelContext. Any browser-side agent, whether built into the browser, running in an extension, or embedded in an iframe, can discover and invoke them. It uses the same vocabulary as backend Model Context Protocol tools, but runs in the tab instead of on a separate server.

My site registers six read-only tools:

get_author_info — canonical Person profile with the same @id as the JSON-LD graph
list_topics — the topic clusters I write in, each with canonical sameAs URLs
get_recent_posts — newest-first list with each post's BlogPosting @id
search_posts_by_topic — filter to a cluster; accepts slug or display name
get_post — full markdown body of one post, pulled from the Atom feed
get_ai_policy — license, training, and attribution rules as structured data
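
For a sense of what registering one of these tools might look like, here is a minimal sketch. The registerTool method and the descriptor shape (name, description, inputSchema, execute) are assumptions modeled on the WebMCP proposal and may not match what a given browser build ships. The registerReadOnlyTools helper and the profile data are hypothetical.

```javascript
// Hypothetical author profile; the @id would match the page's JSON-LD Person.
const author = { "@id": "https://example.com/#person", name: "Jane Doe" };

// Tool descriptors. The shape is an assumption modeled on the WebMCP proposal.
const tools = [
  {
    name: "get_author_info",
    description: "Return the canonical Person profile for this site.",
    inputSchema: { type: "object", properties: {} },
    // MCP-style result: a content array holding one JSON text item.
    execute: async () => ({
      content: [{ type: "text", text: JSON.stringify(author) }],
    }),
  },
];

// Register tools only when the context exists, so other browsers are unaffected.
function registerReadOnlyTools(modelContext, toolList) {
  if (!modelContext || typeof modelContext.registerTool !== "function") {
    return 0; // WebMCP not available: do nothing
  }
  toolList.forEach((tool) => modelContext.registerTool(tool));
  return toolList.length;
}

// In a page this receives document.modelContext; outside a browser it is undefined.
const context = typeof document !== "undefined" ? document.modelContext : undefined;
registerReadOnlyTools(context, tools);
```

Guarding on the presence of the context keeps the page working unchanged for visitors whose browser has no WebMCP support.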

If you have Chrome 149+ with the WebMCP flag on (chrome://flags/#enable-webmcp-testing), you can paste this into DevTools on any page of the site:

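// List the tools registered on this page.
const tools = await document.modelContext.getTools();
// Pick the author tool by name.
const author = tools.find(t => t.name === "get_author_info");
// Call it with empty JSON arguments; the result comes back as a JSON string.
const raw = await document.modelContext.executeTool(author, "{}");
// The text payload inside the result is itself JSON, hence the double parse.
console.log(JSON.parse(JSON.parse(raw).content[0].text));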
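
The double JSON.parse in that snippet is easy to get wrong, so a small helper can make it safer. This is a sketch that assumes the result shape used above (a JSON string whose content[0].text is itself JSON). parseToolResult is a hypothetical name, not part of WebMCP.

```javascript
// Unwrap a tool result: outer JSON envelope first, then the JSON text payload inside it.
function parseToolResult(raw) {
  const envelope = typeof raw === "string" ? JSON.parse(raw) : raw;
  const first = envelope && Array.isArray(envelope.content) ? envelope.content[0] : undefined;
  if (!first || typeof first.text !== "string") {
    throw new Error("Tool result has no text content");
  }
  return JSON.parse(first.text);
}

// Example with a hand-written result in the assumed shape.
const sample = JSON.stringify({
  content: [{ type: "text", text: JSON.stringify({ name: "Jane Doe" }) }],
});
console.log(parseToolResult(sample).name);
```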

What I find interesting isn't the API surface — it's what happens when the three layers agree. Every tool response includes the JSON-LD @ids from the page's application/ld+json graph, so an agent reading both surfaces sees one connected entity graph, not two disjoint views. get_author_info returns the same Person @id the BlogPosting nodes reference. get_post returns the same BlogPosting @id the ItemList on /blog links to. Same identities, three delivery mechanisms.
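
One way to keep the layers honest is to check that every @id a tool returns also appears in the page's JSON-LD graph. The sketch below assumes a graph with an @graph array and tool payloads that are plain JSON. The helper names and the sample data are hypothetical.

```javascript
// Collect every "@id" value found anywhere inside a JSON value.
function collectIds(value, ids = new Set()) {
  if (Array.isArray(value)) {
    value.forEach((item) => collectIds(item, ids));
  } else if (value && typeof value === "object") {
    if (typeof value["@id"] === "string") ids.add(value["@id"]);
    Object.values(value).forEach((child) => collectIds(child, ids));
  }
  return ids;
}

// Return the ids a tool payload mentions that do not appear in the page graph.
function findUnknownIds(pageGraph, toolPayload) {
  const known = collectIds(pageGraph);
  return [...collectIds(toolPayload)].filter((id) => !known.has(id));
}

const pageGraph = {
  "@graph": [
    { "@type": "Person", "@id": "https://example.com/#person" },
    {
      "@type": "BlogPosting",
      "@id": "https://example.com/blog/hello#post",
      author: { "@id": "https://example.com/#person" },
    },
  ],
};
const toolPayload = { "@id": "https://example.com/#person", name: "Jane Doe" };

console.log(findUnknownIds(pageGraph, toolPayload)); // empty when the layers agree
```

A check like this could run at build time, so a renamed slug or a changed @id scheme shows up as a failure instead of as two quietly diverging graphs.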

The catch: WebMCP is browsing-context-only. Headless crawlers won't see it. That's why it complements the well-known layer instead of replacing it.

Why not just llms.txt

llms.txt is fine too, but it bakes everything into one flat document and assumes the only thing an agent wants is a curated reading list.

The catalog approach is more setup up front, but it scales the way a sitemap does: one entry point, a list of resources behind it, and adding a new one later (an MCP server, a model card, an API) doesn't break the contract.

Neither is a standard yet. Both are bets. I'm betting on the catalog because it doesn't lock me in: if llms.txt ends up winning later, I can just advertise it from the catalog and move on.

What I'd tell someone copying this

Ship the foundational layer first. The well-known layer is a bonus on top of it, not a replacement for it. The active layer (WebMCP) is a bonus on top of that — only worth it once your site actually has something an agent might want to call. And don't hand-write any of these files; generate them from the same source of truth that feeds your sitemap and your feed.
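
As a sketch of the "one source of truth" advice, the snippet below derives sitemap entries and a get_recent_posts-style payload from the same post list. The post data, the URL layout, and the function names are hypothetical, and the sitemap is reduced to the loc and lastmod fields.

```javascript
// Hypothetical post list; in practice this comes from your content pipeline.
const posts = [
  { slug: "older", title: "Older post", date: "2026-05-01" },
  { slug: "newer", title: "Newer post", date: "2026-06-16" },
];
const baseUrl = "https://example.com";

// Assumes slugs are already URL-safe, so no XML escaping is done here.
const postUrl = (post) => `${baseUrl}/blog/${post.slug}`;

// Sitemap XML generated from the post list.
function renderSitemap(list) {
  const urls = list
    .map((p) => `  <url><loc>${postUrl(p)}</loc><lastmod>${p.date}</lastmod></url>`)
    .join("\n");
  return `<?xml version="1.0" encoding="UTF-8"?>\n` +
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${urls}\n</urlset>`;
}

// Newest-first payload for a recent-posts tool, built from the same list.
function recentPosts(list, limit = 10) {
  return [...list] // copy so the shared source list is not reordered
    .sort((a, b) => b.date.localeCompare(a.date))
    .slice(0, limit)
    .map((p) => ({ "@id": `${postUrl(p)}#post`, title: p.title, date: p.date }));
}

console.log(renderSitemap(posts));
console.log(recentPosts(posts, 1));
```

Because both outputs read from the same list, a new post cannot show up in the sitemap while going missing from the tool response.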

The takeaway

We'll see which approach wins. Shipping the foundational layer is table stakes, shipping the well-known layer is cheap, and shipping the active layer is a bet on where browsers are headed. Not shipping anything means an agent gets to decide who you are, and it may decide wrong.
