AI & Search · 8 min read · February 2026

How AI Search Decides Which Websites to Cite

TL;DR

AI search tools like ChatGPT and Perplexity choose which websites to cite based on clear structure, factual claims, and authority signals — not just backlinks and keywords. This article explains what AI systems actually look for and how to make your site more likely to be referenced.

Something fundamental has shifted in how people find information online. ChatGPT has over 200 million weekly active users. Perplexity processes millions of searches daily. Google's AI Overviews now appear on a significant share of search results, and that share keeps rising. For an increasing proportion of your potential customers, the first encounter with your business won't be a blue link on a search engine results page. It will be a sentence generated by an AI model that either cites your website or doesn't.

The critical question isn't whether AI search matters. It's whether your website is structured in a way that AI systems can understand, verify, and choose to reference.

How AI Search Actually Works

Traditional search engines like Google work by crawling the web, building an index, and ranking pages against a query using hundreds of signals — backlinks, keyword relevance, page speed, user engagement. The result is a ranked list of pages.

AI search engines work differently. Most use a technique called Retrieval-Augmented Generation (RAG). The process has two stages:

  1. Retrieval — The system searches its index (or the live web) for documents that are relevant to the user's query. This step is similar to traditional search but often prioritises structured, well-annotated content.
  2. Generation — A large language model synthesises the retrieved documents into a coherent answer. It decides which sources to cite, which claims to include, and how to attribute information.

This two-stage process means that your website needs to pass two filters: it needs to be found during retrieval, and it needs to be chosen during generation. Failing either stage means your content doesn't appear in the AI's response.
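
To make the two filters concrete, here is a minimal sketch of the retrieve-then-generate flow. It is a toy under stated assumptions, not how any particular AI search product works: the corpus, the example.com URLs, and the word-overlap scoring are all invented for illustration, and the "generation" step simply quotes its sources where a real system would call a language model.

```python
from collections import Counter
import math
import re

# Hypothetical mini-corpus (URL -> page text), invented for this sketch.
DOCUMENTS = {
    "https://example.com/schema-guide": "Schema markup gives machines explicit metadata about a page.",
    "https://example.com/recipes": "A simple recipe for sourdough bread with a long fermentation.",
    "https://example.com/rag-explained": "Retrieval-augmented generation retrieves documents and then generates an answer.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, k=2):
    """Stage 1: keep the top-k documents that share vocabulary with the query."""
    q = Counter(tokenize(query))
    scored = [(url, cosine(q, Counter(tokenize(text)))) for url, text in DOCUMENTS.items()]
    scored = [(url, score) for url, score in scored if score > 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def generate(retrieved):
    """Stage 2: stand-in for the language model; it only quotes and cites."""
    if not retrieved:
        return "No sources found."
    return "\n".join(
        f"[{i}] {DOCUMENTS[url]} ({url})" for i, (url, _) in enumerate(retrieved, start=1)
    )


hits = retrieve("how does schema markup help machines")
print(generate(hits))
```

A page that never makes it into `hits` cannot be cited, however good it is, and a page that is retrieved can still be left out by the generation step.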

What AI Systems Look At (That Traditional SEO Doesn't Cover)

Traditional SEO optimises for Google's ranking algorithm. AI search optimisation — sometimes called Generative Engine Optimisation (GEO) — requires a different set of signals. Here's what the research and practical evidence suggest AI models prioritise:

1. Structured Data and Schema Markup

AI systems are better at understanding structured data than unstructured prose. Schema markup (JSON-LD) provides explicit, machine-readable descriptions of what your page is about, who wrote it, when it was published, and what entities it discusses.

A page with proper Article schema, author markup, datePublished, and BreadcrumbList is significantly easier for an AI system to parse than a page with the same content but no structured data. The AI doesn't have to guess — the metadata tells it directly.
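
As a rough illustration, the sketch below builds that kind of markup with Python's standard library. The property names (`headline`, `author`, `datePublished`, `dateModified`, `itemListElement`) come from schema.org; the function names, the author, the dates, and the example.com URLs are placeholders rather than values from a real site.

```python
import json


def article_jsonld(headline, author_name, date_published, date_modified, url, breadcrumbs):
    """Return a JSON-LD graph containing an Article and a BreadcrumbList."""
    article = {
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,
        "dateModified": date_modified,
        "mainEntityOfPage": url,
    }
    breadcrumb_list = {
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name, "item": item_url}
            for i, (name, item_url) in enumerate(breadcrumbs, start=1)
        ],
    }
    return {"@context": "https://schema.org", "@graph": [article, breadcrumb_list]}


def as_script_tag(data):
    """Serialise the graph into the <script> element that goes in the page head."""
    # Escape "</" so a value can never close the script element early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


markup = article_jsonld(
    headline="How AI Search Decides Which Websites to Cite",
    author_name="Jane Doe",                      # placeholder author
    date_published="2026-02-01",                 # placeholder dates
    date_modified="2026-02-10",
    url="https://example.com/insights/ai-search",
    breadcrumbs=[
        ("Insights", "https://example.com/insights"),
        ("AI Search", "https://example.com/insights/ai-search"),
    ],
)
print(as_script_tag(markup))
```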

2. Semantic HTML and Clear Content Structure

AI systems parse HTML. Pages with clear heading hierarchies (h1 through h3), semantic landmarks (header, main, nav, footer), and logically structured content are easier to extract information from than pages built with generic div elements and CSS-driven visual hierarchy.

When an AI model encounters a well-structured page, it can identify the main topic (from the h1), the key subtopics (from h2 headings), and the supporting evidence (from the content under each heading). This hierarchical clarity makes your content more likely to be retrieved and cited.
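
One way to see your page the way a parser does is to extract its heading outline. The sketch below uses only Python's built-in `html.parser`; the two checks it applies (exactly one h1, no skipped levels) are this sketch's own simplification, not a published rule set, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Collect (level, text) pairs for h1-h3 headings in document order."""

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self._level = int(tag[1])
            self._buffer = []

    def handle_data(self, data):
        if self._level is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.outline.append((self._level, "".join(self._buffer).strip()))
            self._level = None


def heading_problems(html):
    """Return the heading outline plus a list of structural problems."""
    parser = HeadingOutline()
    parser.feed(html)
    levels = [level for level, _ in parser.outline]
    problems = []
    if levels.count(1) != 1:
        problems.append(f"expected exactly one h1, found {levels.count(1)}")
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            problems.append(f"heading level jumps from h{prev} to h{cur}")
    return parser.outline, problems


page = "<main><h1>Guide</h1><h2>Setup</h2><h3>Install</h3></main>"
print(heading_problems(page))
```

A page whose outline reads sensibly on its own, with no problems reported, is giving machines the same hierarchy it gives human readers.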

3. Authorship and Entity Signals

AI models increasingly weight content by who created it, not just what it says. This mirrors the E-E-A-T criteria (Experience, Expertise, Authoritativeness, Trustworthiness) from Google's search quality rater guidelines, applied to AI retrieval.

Practical signals include:

  • Person schema linking the author to a profile page
  • Author bylines with credentials and role descriptions
  • Consistent author entities across multiple pages
  • Organization schema linking to the publishing entity
  • An /about page and /team page that establish who you are
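
The first, third, and fourth signals can be expressed together by giving each entity a stable `@id` and pointing to it from every page. The sketch below is one possible arrangement, assuming a hypothetical example.com site with profile pages under /team/; `worksFor`, `jobTitle`, and `url` are schema.org properties, while the URL pattern and names are made up.

```python
import json

SITE = "https://example.com"  # hypothetical domain


def entity_graph(author_name, job_title, org_name, author_slug):
    """Describe an author and their organisation as linked JSON-LD nodes."""
    org_id = f"{SITE}/#organization"
    person_id = f"{SITE}/team/{author_slug}#person"
    organization = {"@type": "Organization", "@id": org_id, "name": org_name, "url": SITE}
    person = {
        "@type": "Person",
        "@id": person_id,
        "name": author_name,
        "jobTitle": job_title,
        "url": f"{SITE}/team/{author_slug}",   # the author's profile page
        "worksFor": {"@id": org_id},           # link to the publishing entity
    }
    return {"@context": "https://schema.org", "@graph": [organization, person]}


def author_reference(graph):
    """Value to reuse as `author` on every article written by this person."""
    person = next(node for node in graph["@graph"] if node["@type"] == "Person")
    return {"@id": person["@id"]}


graph = entity_graph("Jane Doe", "Head of Research", "Example Ltd", "jane-doe")
print(json.dumps(graph, indent=2))
print(author_reference(graph))
```

Reusing the same `@id` across articles is what keeps the author a single consistent entity instead of a new name string on each page.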

4. Evidence and Citations

A study on GEO led by researchers at Princeton University found that adding citations and statistics increased a page's visibility in AI-generated responses by up to 40%. AI models treat referenced claims as more reliable than unsupported assertions.

This means linking to primary sources, including specific data points, and referencing published research. If you claim that "the average UK website weighs 4MB," linking to the HTTP Archive data that supports that claim makes it more likely to be cited.

5. llms.txt and AI Context Files

A new generation of machine-readable files is emerging to help AI systems understand websites. llms.txt is a plain-text file, written in Markdown under the current proposal, that sits at your domain root and tells AI models how to interpret and represent your organisation. Think of it as robots.txt for AI comprehension rather than crawling.

While still an emerging standard, early adopters gain an advantage by providing explicit guidance to AI systems about their brand, expertise, and positioning — rather than leaving it to the model's interpretation.
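
For a sense of what such a file contains, the sketch below generates one. It assumes the Markdown layout described in the public llms.txt proposal (a title line, a one-line summary in a blockquote, then sections of annotated links); since the standard is still settling, treat the layout as provisional. The organisation, URLs, and function name are placeholders.

```python
def build_llms_txt(name, summary, sections):
    """Build llms.txt content.

    `sections` maps a section heading to a list of (title, url, note) tuples.
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


content = build_llms_txt(
    name="Example Ltd",
    summary="Independent web design studio specialising in accessible, low-carbon sites.",
    sections={
        "Key pages": [
            ("About", "https://example.com/about", "Who we are and what we do"),
            ("Services", "https://example.com/services", "What we offer"),
        ],
    },
)
print(content)  # save this as llms.txt at the domain root
```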

6. Content Freshness and Date Signals

AI models have a strong preference for recent content. Pages with clear datePublished and dateModified signals in their schema markup, along with visible publication dates, are more likely to be cited for current topics. Undated content is treated as potentially stale.
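
A quick way to check whether a page exposes these signals is to read its JSON-LD and look for the two date properties. The sketch below does that with the standard library. The 365-day staleness threshold is an arbitrary assumption for illustration, not a figure any AI system is known to use, and the class and function names are hypothetical.

```python
import json
from datetime import date
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the parsed contents of every JSON-LD script block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def date_signals(html, today, max_age_days=365):
    """Report whether the page is dated and how old its latest date is."""
    collector = JsonLdCollector()
    collector.feed(html)
    for block in collector.blocks:
        nodes = block.get("@graph", [block]) if isinstance(block, dict) else block
        for node in nodes:
            stamp = node.get("dateModified") or node.get("datePublished")
            if stamp:
                # Only the YYYY-MM-DD prefix is needed to compute the age.
                age = (today - date.fromisoformat(stamp[:10])).days
                return {"dated": True, "age_days": age, "stale": age > max_age_days}
    return {"dated": False, "age_days": None, "stale": None}


page = (
    '<html><head><script type="application/ld+json">'
    '{"@type": "Article", "datePublished": "2026-02-01", "dateModified": "2026-02-10"}'
    "</script></head></html>"
)
print(date_signals(page, today=date(2026, 3, 12)))
```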

The CLEAR Framework: A Systematic Approach

At OYNK, we developed the CLEAR framework specifically to address the gap between traditional SEO and AI search readiness. CLEAR stands for:

  • Clarity — Can AI understand what your page is about? This covers semantic HTML, heading hierarchy, and content structure.
  • Legibility — Can AI parse your structured data? This covers schema markup, internal linking, and semantic landmarks.
  • Evidence — Can AI verify your claims? This covers citations, author signals, date stamps, and external references.
  • Authority — Does AI trust your entity? This covers author schema, organisation schema, and reputation signals.
  • Resilience — Will your content survive algorithm changes? This covers technical foundations like accessibility, performance, and standards compliance.

Each pillar is scored, and the aggregate produces a CLEAR score from 0 to 100. We used this framework to audit our own site and took it from a D grade (68/100) to an A+ (100/100) in under 24 hours — a live case study in what structured optimisation can achieve.
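
The aggregation step can be pictured as a weighted average of the five pillar scores. The sketch below is illustrative only: the equal default weights, the 0 to 100 range per pillar, and the rounding are assumptions made for the example, and the pillar values are invented rather than taken from any real audit.

```python
PILLARS = ("clarity", "legibility", "evidence", "authority", "resilience")


def clear_score(pillar_scores, weights=None):
    """Combine five pillar scores (each 0-100) into one 0-100 aggregate."""
    weights = weights or {pillar: 1.0 for pillar in PILLARS}  # assumed equal weights
    missing = [pillar for pillar in PILLARS if pillar not in pillar_scores]
    if missing:
        raise ValueError(f"missing pillar scores: {missing}")
    for pillar in PILLARS:
        if not 0 <= pillar_scores[pillar] <= 100:
            raise ValueError(f"{pillar} score must be between 0 and 100")
    total_weight = sum(weights[pillar] for pillar in PILLARS)
    weighted_sum = sum(pillar_scores[pillar] * weights[pillar] for pillar in PILLARS)
    return round(weighted_sum / total_weight)


# Invented example values, not results from a real audit.
example = {"clarity": 90, "legibility": 70, "evidence": 60, "authority": 50, "resilience": 80}
print(clear_score(example))
```

Under any weighting of this kind, an aggregate of 100 requires every pillar to score 100, which is why a single weak pillar can pull down an otherwise strong site.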

Case Study: OYNK's Own D-to-A+ Journey

When we first ran a CLEAR audit on oynk.co.uk, the result was sobering. Despite having strong content and genuine expertise, our site scored 68/100 — a D grade. The reason was instructive.

Our site is a React single-page application (SPA). When the CLEAR audit crawler visited the site, it saw only <div id="root"></div>, an empty shell. All our carefully crafted semantic HTML, schema markup, and heading structure were invisible because they required JavaScript execution to render.
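
You can test for the same problem by inspecting the HTML your server returns before any JavaScript runs. The sketch below counts visible words and headings in that raw markup. It is a rough heuristic, not the CLEAR crawler's logic, and the 50-word threshold and the names are assumptions made for the example.

```python
from html.parser import HTMLParser


class StaticTextCounter(HTMLParser):
    """Count words and h1-h3 headings present without running JavaScript."""

    def __init__(self):
        super().__init__()
        self.words = 0
        self.headings = 0
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in ("h1", "h2", "h3"):
            self.headings += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words += len(data.split())


def looks_like_empty_shell(html, min_words=50):
    """True when the raw HTML has almost no text and no headings."""
    counter = StaticTextCounter()
    counter.feed(html)
    return counter.words < min_words and counter.headings == 0


shell = (
    "<html><head><title>App</title></head>"
    '<body><div id="root"></div><script src="/app.js"></script></body></html>'
)
print(looks_like_empty_shell(shell))
```

To run it against a live site, pass in the response body from a plain HTTP request (for example `curl` output), since that is roughly what a crawler without JavaScript support receives.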

The fix involved three categories of work:

  1. Server-side meta injection — We added a static HTML layer (via <noscript> blocks) containing semantic landmarks, headings, and structured content that any crawler can parse without JavaScript.
  2. Schema enrichment — We added WebPage, Article, Person (author), and Organization schemas to every route, with proper datePublished and dateModified values.
  3. AI context files — We created an llms.txt file, an /ai-context page, and ensured every page had the evidence signals that AI systems look for.

The result: 100/100, A+ grade. Every CLEAR pillar at maximum. The content didn't change — only the way we presented it to machines.

What This Means for Your Business

If your website was built in the last five years, it was almost certainly optimised for traditional search — keywords, backlinks, page speed, mobile responsiveness. Those signals still matter. But they're no longer sufficient.

AI search is an addition, not a replacement. You need both traditional SEO foundations and AI readiness signals. The businesses that structure their content for both will appear in traditional search results and in AI-generated answers. The businesses that don't will gradually lose visibility as AI search captures more of the discovery funnel.

Five Steps You Can Take Today

  1. Add schema markup to every page — At minimum, add WebPage schema with datePublished, dateModified, author, and publisher. For articles, add Article schema. For services, add Service schema.
  2. Create an llms.txt file — A simple text file at /llms.txt that describes your organisation, your expertise, and your key pages. Our guide explains how.
  3. Structure content with clear headings — Use h1 for the main topic, h2 for major sections, h3 for subsections. Don't skip heading levels. Make each heading descriptive enough that it could stand alone as a summary.
  4. Include citations and data — Reference specific studies, link to primary sources, and include quantified claims where possible. AI models weight evidenced content higher than unsupported assertions.
  5. Add author and organisation information — Make sure your site has identifiable authors with credentials, an about page, and organisation schema that establishes your entity.

The Window Is Closing

AI search readiness is where mobile optimisation was in 2015 — obviously important, clearly the future, but not yet table stakes. The organisations that invest now will establish the structured data foundations, entity authority, and content architecture that AI systems reward. Those that wait will be playing catch-up against competitors who already own the AI citation space.

For a systematic assessment of where your website stands, our AI-Ready Web Design guide covers the full framework, and our CLEAR service provides a scored audit with prioritised recommendations.

AI search isn't replacing traditional search. It's adding a new layer of discovery where different rules apply. The question is whether your website is visible in both.
