Methodology

Generative engine optimization for healthcare

The technical framework we use to rebuild clinic websites for ChatGPT, Perplexity, Gemini, and Google AI. Every step, every artifact, every measurement.

By Kailesk · 12 min read

Generative engine optimization is the discipline of making a website easy for a language model to read, cite, and present as an answer. It is to AI search what classical SEO was to the ten blue links. The term is new. The underlying practice has three years of field data behind it.

This essay documents the framework we use at KailxLabs to rebuild clinic websites for citation across ChatGPT, Perplexity, Gemini, and Google AI. It is opinionated. It is specific. It is the same framework we walk every engagement through, from the initial audit to day 45 proof.

The three layers

Every decision in the framework sits in one of three layers. We work top down.

Layer 1. Retrieval readiness
Can a language model crawler fetch and parse the full page in a single HTTP request?
Layer 2. Semantic clarity
Once parsed, can the model identify what the clinic does, who it serves, and what each page means?
Layer 3. Citation optimization
Are individual sentences written in a form the model will quote when a patient asks?

A clinic that fails at Layer 1 is invisible. No amount of Layer 2 or 3 work will rescue it. A clinic that passes Layer 1 but fails Layer 2 may appear in a few answers by accident but cannot be optimized. A clinic that passes Layers 1 and 2 but fails Layer 3 will be cited occasionally but not preferentially. All three layers must hold.

Layer 1. Retrieval readiness

The work at this layer is entirely technical. It is invisible to the clinic owner when done. It is devastating when absent.

Server rendered HTML, not client rendered

Every page must return its complete content in the first HTTP response. This rules out single page React or Vue apps that mount empty divs and load content through XHR. It rules out Wix and Squarespace sites that render critical content through JavaScript widgets. It rules out any CMS theme that defers the hero, the body copy, or the provider names to post paint rendering.

The test is binary. Run curl against the URL. If the response contains the headline, the treatment copy, the provider names, and the pricing in plain text, the site is retrieval ready. If the response contains <div id="root"></div> and a bundle of script tags, the site is not.
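A sketch of the check, with a placeholder URL and search string; any named entity from the page body works:

    # Fetch the raw HTML exactly as a crawler does, with no JavaScript execution.
    # Zero matches means the content only exists after client side rendering.
    curl -s https://clinic.example.com/semaglutide | grep -ci "semaglutide"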

The stack we ship on is Astro in server side rendering mode on Vercel. The same outcome is reachable with Next.js static export, with plain HTML, with Hugo, with server rendered WordPress using a fast theme, and with several other tools. The tool does not matter. The output does.

Crawl budget respect

Every page must load fast enough that a crawler under rate limits will complete the fetch. Language model crawlers rate limit aggressively and abandon slow responses. A page that takes four seconds to return the first byte will often not be read.

Practical target. Time to first byte under 400 milliseconds. Full HTML transferred under one second. Total page weight under 400 kilobytes before images. These are not user experience numbers. They are crawler tolerance numbers.
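A spot check against those targets, using curl's built in timers (the URL is again a placeholder):

    # time_starttransfer approximates time to first byte; size_download is HTML bytes.
    curl -s -o /dev/null \
      -w "TTFB: %{time_starttransfer}s  total: %{time_total}s  bytes: %{size_download}\n" \
      https://clinic.example.com/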

robots.txt that explicitly invites AI crawlers

The major AI crawlers identify themselves by user agent. GPTBot (OpenAI). ClaudeBot (Anthropic). PerplexityBot (Perplexity). anthropic-ai (Anthropic, older agent). Google-Extended (Google AI training). Many clinic websites inherit a default robots.txt that does not name these agents. Some sites inadvertently block them.

The robots.txt must name each AI crawler explicitly with an Allow: / directive. This is documentation. It is not required for crawling. It is required as a trust signal and as a public record that the clinic welcomes citation.
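A minimal version naming each agent:

    # AI crawlers are explicitly welcome.
    User-agent: GPTBot
    Allow: /

    User-agent: ClaudeBot
    Allow: /

    User-agent: PerplexityBot
    Allow: /

    User-agent: anthropic-ai
    Allow: /

    User-agent: Google-Extended
    Allow: /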

llms.txt at the root

Place a markdown file at /llms.txt that summarizes the site for language models. The content is a concise description of the clinic, the services, the geography, and a list of the canonical pages a model should quote. This is the closest thing AI search has to a sitemap written in its own voice.

The Answer.AI proposal published in September 2024 defines the structure. We ship a populated llms.txt on every clinic site we build. Adoption across engines is uneven but growing. The file costs nothing to ship and changes the conversation when a model reads it.
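A sketch with a hypothetical clinic, following the proposal's structure of an H1 title, a one line summary blockquote, and sections of annotated links:

    # Example Clinic
    > Physician supervised weight loss clinic serving Austin, Round Rock, and Cedar Park. Core programs: semaglutide and tirzepatide.

    ## Treatments
    - [Semaglutide program](https://clinic.example.com/semaglutide): eligibility, schedule, monthly pricing
    - [Tirzepatide program](https://clinic.example.com/tirzepatide): eligibility, schedule, monthly pricing

    ## Locations
    - [Austin](https://clinic.example.com/locations/austin): address, hours, booking path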

Layer 2. Semantic clarity

The work at this layer is structural. It transforms a readable page into a legible page.

Schema.org @graph, comprehensive

Every page ships with a JSON LD block that declares every relevant entity. On the homepage that means a MedicalClinic, a LocalBusiness, a WebSite, and a Person for the founding provider. On a treatment page that adds a MedicalProcedure or Drug entity with eligibility, dosage, and pricing. On an FAQ page, a FAQPage. On a how to page, a HowTo. On a review page, an AggregateRating if reviews exist.

Each entity is linked using Schema.org @id references so the graph is connected. The MedicalClinic node references its PostalAddress and GeoCoordinates by @id, and the WebSite and Person nodes reference the clinic the same way. An engine reading this graph can reconstruct the clinic as a complete object without guessing.
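A trimmed sketch with hypothetical values. Each node carries an @id, and other nodes point at it instead of repeating the data:

    {
      "@context": "https://schema.org",
      "@graph": [
        {
          "@type": "MedicalClinic",
          "@id": "https://clinic.example.com/#clinic",
          "name": "Example Clinic",
          "address": { "@id": "https://clinic.example.com/#address" },
          "geo": { "@id": "https://clinic.example.com/#geo" }
        },
        {
          "@type": "PostalAddress",
          "@id": "https://clinic.example.com/#address",
          "streetAddress": "100 Main St",
          "addressLocality": "Austin",
          "addressRegion": "TX"
        },
        {
          "@type": "GeoCoordinates",
          "@id": "https://clinic.example.com/#geo",
          "latitude": 30.2672,
          "longitude": -97.7431
        }
      ]
    }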

Semantic HTML with correct element use

Heading tags describe hierarchy. One h1 per page, declaring the page topic. h2 for major sections. h3 for subsections. Lists are ul or ol. Pricing is a table or a dl. FAQ is a series of h3 questions followed by p answers. Provider bios are wrapped in article elements.
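A skeleton of the pattern, with hypothetical content:

    <h1>Semaglutide weight loss program</h1>

    <h2>Pricing</h2>
    <table>
      <tr><th>Plan</th><th>Monthly cost</th></tr>
      <tr><td>Twelve week program</td><td>$299</td></tr>
    </table>

    <h2>Frequently asked questions</h2>
    <h3>How long does the program run?</h3>
    <p>The program runs twelve weeks, with weekly follow ups.</p>

    <article>
      <h2>Dr. Jane Smith, MD</h2>
      <p>Board certified family physician and program director.</p>
    </article>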

The purpose is not validator compliance. The purpose is to give a language model the same structural cues that a human reader takes from visual design.

Named entities on every page

The clinic name appears as text in the first 100 words of every page. Not only in the logo image. Not only in the page title. In the body copy. The same applies to provider names, drug names, procedure names, and city names. Named entities that appear only in images or as CSS background text are invisible to language models.

Canonical URLs, absent duplication

Every page has a single canonical URL. Marketing preview versions, A/B test variants, and tracking parameter duplicates must canonicalize to the clean URL. An AI engine that finds three versions of the same page picks one and discards the others. You want to control which one it picks.
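The declaration itself is one line in the head, with a placeholder URL:

    <link rel="canonical" href="https://clinic.example.com/semaglutide" />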

Layer 3. Citation optimization

The work at this layer is editorial. It transforms a legible page into a citable one.

The answer paragraph pattern

Every page that targets a patient query opens with a direct, standalone answer paragraph. Two to four sentences. Plain language. Named entities present. Facts stated, not implied. This paragraph is the primary citation candidate.

A semaglutide treatment page opens with a paragraph that reads as a complete answer to the query "what is a semaglutide program at this clinic." It states what the program is, who it is for, how long it runs, what it costs, and what the patient receives. If that paragraph is missing, the engine is forced to piece the answer together from scattered sentences, which it will do reluctantly and imperfectly.

Fact density, not word count

Long pages do not win citations. Dense pages do. A 200 word treatment page with twelve stated facts (pricing, frequency, dosage range, provider credentials, eligibility, safety screening, follow up cadence, expected outcomes, common side effects, exclusions, alternatives, booking path) will outperform a 2400 word page that restates the same twelve facts across padded prose.

The measurement we use is facts per hundred words. Twelve facts in 200 words is six per hundred; the same twelve facts spread across 2400 words is half a fact per hundred. Above six is citation grade. Below three is filler. Most clinic sites sit at one or two.

Direct quotation friendly phrasing

Sentences that stand alone get quoted. Sentences that require context do not. The editorial discipline is to rewrite every sentence on a key page so that a reader who encountered it in isolation would still understand it.

Before. Our program takes a personalized approach based on your individual needs.

After. Our semaglutide program runs twelve weeks, includes weekly injections under physician supervision, and costs $299 per month including all follow ups.

The after version is quotable. The before version is filler. Both look fine to a human reader. Only one survives retrieval.

Programmatic city and treatment pages

Patient queries are local and specific. Semaglutide provider in Austin. Tirzepatide clinic in San Diego. Weight loss clinic near me. A clinic with one page for semaglutide and one page mentioning three cities cannot win these queries.

The structural fix is programmatic. Generate one page per city, per treatment, with unique content. A clinic serving three cities and offering two treatments ships six city by treatment pages. Each page has a dedicated URL, a unique title, city specific content, local schema, and a path to the main treatment page.
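A minimal sketch of the generation step, assuming a statically built Astro dynamic route at src/pages/[city]/[treatment].astro; the city and treatment lists are hypothetical:

    ---
    export function getStaticPaths() {
      // Hypothetical service area and treatment catalog.
      const cities = ["austin", "round-rock", "cedar-park"];
      const treatments = ["semaglutide", "tirzepatide"];
      // Three cities times two treatments yields the six pages described above.
      return cities.flatMap((city) =>
        treatments.map((treatment) => ({ params: { city, treatment } }))
      );
    }

    const { city, treatment } = Astro.params;
    ---
    <!-- Unique title, city specific copy, and local schema hang off these params. -->
    <h1>{treatment} in {city}</h1>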

We seed fifty programmatic pages on every build. The seed is rarely the final set. It is enough to signal local relevance to every engine from day one.

What the framework delivers, by the numbers

100% of delivered sites pass curl readability on day one.
6+ Schema.org entity types in the @graph on every page.
50 programmatic city by treatment pages seeded at launch.

Measurement, across forty five days

The only valid test is live. Citation does not show up in Search Console. It shows up in real prompts on real engines. The measurement loop we run on every engagement:

  1. Baseline. Before build, run five patient queries on four engines. Record whether the clinic appears. For active clinics, the baseline is usually zero across all twenty query and engine pairs.
  2. Week one after launch. Same query set, same engines. Look for early ChatGPT and Perplexity appearances. These move fastest because they retrieve live.
  3. Week four. Expect citation breadth across ChatGPT and Perplexity. Gemini and Google AI typically lag by two to four weeks because Google reindexing is slower.
  4. Day forty five. The guarantee milestone. The clinic is cited on at least two of the four major engines for the agreed query set, or the fee is refunded in full.

What this framework does not do

It is as important to be specific about what the framework does not promise as about what it delivers. Honest limits reduce wasted engagements.

It does not generate patient volume by itself. Citation is the top of the funnel. The clinic still needs a working intake flow, a provider who answers consult calls promptly, and a price that matches the local market. A brilliantly cited clinic with a dead phone line will not grow.

It does not replace paid advertising for immediate results. A clinic that needs patients this week should still run Meta ads or Google ads. The AI foundation is a four to eight week ramp. It compounds forever after. It does not deliver overnight.

It does not survive a neglected domain. A clinic that ships a perfect AI native site and then never updates it will see citations plateau. The foundation accelerates content. It does not replace content.

It does not bypass medical accuracy. Every clinical claim on the site must be defensible. AI engines retrieve what is written. If a treatment page states an incorrect dosage, that incorrect dosage will be quoted. The editorial review by the clinic medical director is a requirement, not a nice to have.

Why this matters for clinics, specifically

The shift in patient behavior is real and measured. Organic Google click through fell sixty one percent on queries that triggered AI Overviews, according to Seer Interactive data published in September 2025. Forty eight percent of Google searches now trigger an AI answer. Two billion people use ChatGPT every month.

Patients research before they book. A patient deciding between three semaglutide clinics in Austin reads, on average, seven pages across four different tools. The clinic that appears on all four tools wins the default position. The clinic that appears on one tool competes on price.

The economics of AI visibility are better than the economics of paid ads. A single well built clinic site, cited across four engines, produces compounding inbound for years. A single month of equivalent Meta ad spend produces a pulse and then silence.

The work is specific, bounded, and measurable. The framework above is how we do it.

Next in this series. "How ChatGPT decides which clinic to cite" goes deeper on retrieval mechanics. "Why most clinic websites are invisible to AI in 2026" documents the failure patterns we see across audited clinics.