Page Discovery Methodology

How AEO Butler selects pages for auto-discovered Snapshot audits — the algorithm, its confidence levels, its known limitations, and what to do when it does not work well for your site.

What Auto-Discovery Does

When you order a Snapshot Codex with Option A (auto-discover), our pipeline visits your homepage, extracts every navigable link, scores each link for commercial agent relevance, and selects the five highest-scoring pages. Your homepage is always included as the first page.

The goal is to select the pages an AI agent is most likely to navigate when evaluating or transacting with your business — product or service pages, pricing pages, checkout flows, and contact pages. The algorithm deliberately avoids blog posts, legal pages, press releases, and support documentation, because these are rarely the pages where agent-readiness failures have commercial consequences.

Auto-discovery runs from outside your site, the same way an agent would encounter it. It does not use your sitemap, your analytics data, or any privileged knowledge of your site structure. It reads what is publicly visible on your homepage.
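The link-extraction step can be sketched with Python's standard-library html.parser. This is a hypothetical illustration, not the code in page_discovery.py. The real pipeline runs in a browser and waits for network idle, while this sketch parses a static HTML string. The names LinkExtractor, CandidateLink and extract_links are invented for the example. Skipping fragment-only, javascript:, mailto: and tel: links is also an assumption. The sketch records whether each link sat inside a nav, header, or role=navigation element, because the scoring step uses that.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from urllib.parse import urljoin

NAV_TAGS = {"nav", "header"}             # plus any element with role="navigation"
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}   # these never get end tags
SKIP_PREFIXES = ("#", "javascript:", "mailto:", "tel:")  # ASSUMPTION: not navigable pages


@dataclass
class CandidateLink:
    url: str       # absolute URL, resolved against the homepage
    in_nav: bool   # True if the <a> sat inside a navigation element


class LinkExtractor(HTMLParser):
    """Hypothetical sketch: collect <a href> links and note navigation context."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.open_elements: list[tuple[str, bool]] = []  # (tag, is_navigation)
        self.links: list[CandidateLink] = []

    def _in_nav(self) -> bool:
        return any(is_nav for _, is_nav in self.open_elements)

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a":
            href = (attr_map.get("href") or "").strip()
            if href and not href.lower().startswith(SKIP_PREFIXES):
                self.links.append(CandidateLink(urljoin(self.base_url, href), self._in_nav()))
        if tag in VOID_TAGS:
            return
        role = (attr_map.get("role") or "").lower()
        self.open_elements.append((tag, tag in NAV_TAGS or role == "navigation"))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; ignore stray end tags.
        for i in range(len(self.open_elements) - 1, -1, -1):
            if self.open_elements[i][0] == tag:
                del self.open_elements[i:]
                break


def extract_links(html: str, base_url: str) -> list[CandidateLink]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


sample = """
<header><nav><a href="/pricing">Pricing</a><a href="/products/">Products</a></nav></header>
<main><a href="/blog/launch">Launch post</a><a href="#top">Back to top</a></main>
"""
for link in extract_links(sample, "https://example.com/"):
    print(link.url, link.in_nav)
```

On the sample, the sketch reports /pricing and /products/ as navigation links and /blog/launch as a body link. It skips the #top fragment.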

The Homepage Is Always Page One

Regardless of discovery method, your homepage is always the first page audited. It is the entry point for most agent navigation sessions and the most likely place for foundational structural issues — missing landmark regions, absent structured data, broken heading hierarchy — that propagate across the entire site.

The Scoring Algorithm

Every candidate link found on your homepage is scored before selection. The score is the sum of all matching pattern weights. Higher score means higher priority for selection.

Positive Score Patterns

Patterns that increase a URL's selection priority
Pattern | Score boost | Reason
URL contains product, service, shop, store, pricing, plan, buy, order, catalog | +3 | Directly transactional; highest agent relevance
URL contains about, contact, checkout, cart, solutions, features | +2 | High navigational value for agent commerce tasks
URL contains how-it, get-started | +1 | Moderate relevance for agent evaluation tasks
Link found inside a nav, header, or role=navigation element | +1 bonus | Site owner explicitly marked this as navigation
Path depth of 0 (root-level page) | +2 | Direct children of root are usually primary pages
Path depth of 1 | +1 | One level deep is usually a category or section page

Negative Score Patterns

Patterns that decrease a URL's selection priority
Pattern | Score penalty | Reason
URL contains blog, news, press, article, post/, or has a date-based path | -3 | Editorial content; low agent commerce relevance
URL contains legal, privacy, terms, cookie, sitemap, careers, jobs | -3 | Compliance and administrative content
URL contains login, signin, register, account | -2 | Authentication-gated; agent cannot access without credentials
URL contains support, help | -2 | Support content rarely affects the purchase path
URL contains faq | -1 | Low priority unless no transactional pages exist
Path depth of 3 or more | -1 | Deep paths are usually detail pages, not primary audit targets
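The two tables can be read as a small additive scoring function. The sketch below is a hypothetical transcription, not the code in page_discovery.py. It fixes several details the tables leave open, and each is an assumption:

- Keywords are matched as substrings of the lowercased URL path only, not the hostname.
- Each table row adds its weight at most once.
- A "date-based path" means a four-digit year segment such as /2024/.
- Path depth is the number of path segments minus one, so /pricing has depth 0 and /products/widgets has depth 1.

The faq row's "unless no transactional pages exist" condition is not modelled.

```python
import re
from urllib.parse import urlparse

EDITORIAL_KEYWORDS = ("blog", "news", "press", "article", "post/")

# (keywords, weight) rows transcribed from the tables above.
# ASSUMPTION: a row adds its weight once, however many of its keywords match.
KEYWORD_RULES = [
    (("product", "service", "shop", "store", "pricing", "plan", "buy", "order", "catalog"), 3),
    (("about", "contact", "checkout", "cart", "solutions", "features"), 2),
    (("how-it", "get-started"), 1),
    (EDITORIAL_KEYWORDS, -3),
    (("legal", "privacy", "terms", "cookie", "sitemap", "careers", "jobs"), -3),
    (("login", "signin", "register", "account"), -2),
    (("support", "help"), -2),
    (("faq",), -1),
]

# ASSUMPTION: a "date-based path" has a four-digit year segment, e.g. /2024/05/.
DATE_SEGMENT = re.compile(r"/(?:19|20)\d{2}(?:/|$)")


def path_depth(path: str) -> int:
    """ASSUMPTION: segments minus one, so /pricing -> 0 and /products/widgets -> 1."""
    return len([s for s in path.split("/") if s]) - 1


def score_url(url: str, in_nav: bool) -> int:
    path = urlparse(url).path.lower()   # ASSUMPTION: match the path, not the hostname
    score = 0
    for keywords, weight in KEYWORD_RULES:
        if any(k in path for k in keywords):
            score += weight
    # Date-based paths share the editorial row, so penalise them only if that row did not fire.
    if DATE_SEGMENT.search(path) and not any(k in path for k in EDITORIAL_KEYWORDS):
        score -= 3
    if in_nav:
        score += 1                       # link sat inside nav, header, or role=navigation
    depth = path_depth(path)
    if depth == 0:
        score += 2
    elif depth == 1:
        score += 1
    elif depth >= 3:
        score -= 1
    return score
```

Under these assumptions, /pricing found in a header nav scores 3 + 1 + 2 = 6. A body link to /blog/2024/05/launch-post scores -3 - 1 = -4.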

Final Selection

After scoring, candidates are ranked highest to lowest. The homepage is placed first. The next four highest-scoring pages fill the remaining slots. If fewer than four scoreable candidates are found, the audit proceeds with however many pages were discovered — which is reported transparently in the audit output.

When discovery falls back to links in the page body (body-link fallback), links with a net score of zero or below are excluded from consideration entirely. Links found in navigation elements are still considered at a net score of zero, because the site owner explicitly designated them as navigation.
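The selection rules can be expressed as a filter followed by a sort. The following is a hypothetical sketch, not the pipeline's code. The names ScoredLink and select_pages, the body_fallback flag, and letting the first eligible occurrence of a duplicate URL win are all assumptions. Ties keep their order of appearance on the homepage because Python's sort is stable.

```python
from dataclasses import dataclass


@dataclass
class ScoredLink:
    url: str
    score: int
    in_nav: bool


MAX_PAGES = 5   # the homepage plus the four highest-scoring candidates


def select_pages(homepage: str, candidates: list[ScoredLink], body_fallback: bool) -> list[str]:
    seen = {homepage}             # the homepage is never selected twice
    eligible = []
    for link in candidates:
        if link.url in seen:      # ASSUMPTION: the first eligible occurrence of a URL wins
            continue
        if body_fallback and link.score <= 0 and not (link.in_nav and link.score == 0):
            continue              # drop non-positive links, but keep zero-score nav links
        seen.add(link.url)
        eligible.append(link)
    # sorted() stays stable even with reverse=True, so ties keep homepage order.
    ranked = sorted(eligible, key=lambda link: link.score, reverse=True)
    return [homepage] + [link.url for link in ranked[: MAX_PAGES - 1]]
```

Suppose only three candidates survive the filter. The function then returns four pages rather than padding the list, which matches the rule that the audit proceeds with however many pages were discovered.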

Confidence Levels

Every auto-discovered audit includes a confidence level in the report — High, Medium, or Low. This tells you how reliable the page selection is likely to be.

High Confidence

Three or more links found in semantic navigation elements. No navigation warnings triggered. The site uses standard HTML navigation with real href attributes. The selected pages are very likely to be the most commercially relevant pages on the site.

Most standard e-commerce sites, service business sites, and SaaS product pages receive High confidence.

Medium Confidence

At least one semantic navigation link found but with one minor warning. The page selection is likely reasonable but may not perfectly represent your most important pages. The audit findings are still valid — they reflect real issues on the pages that were audited.

Low Confidence

No semantic navigation links found, very few total links, or significant JavaScript navigation detected. The selected pages may not be the most commercially important pages on your site. The audit findings are real and valid, but you may want to request a re-audit using Option B to specify better pages.

Sites with Low confidence receive a specific warning in their report and are eligible for a free page re-selection and re-audit within 14 days of delivery. See Customer Recourse below.
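The three confidence levels can be summarised as a decision function. This is a hypothetical sketch, not the pipeline's logic. The methodology does not give a number for "very few total links", so MIN_TOTAL_LINKS is invented. The text also leaves two cases open, and the sketch decides them by assumption:

- One or two nav links with no warnings is classed as Medium.
- More than one warning is classed as Low.

```python
from enum import Enum


class Confidence(Enum):
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


MIN_TOTAL_LINKS = 5   # HYPOTHETICAL cut-off for "very few total links"


def classify_confidence(nav_links: int, total_links: int,
                        warnings: int, heavy_js_nav: bool) -> Confidence:
    # Low: no semantic nav links, very few links, or significant JS navigation.
    if nav_links == 0 or total_links < MIN_TOTAL_LINKS or heavy_js_nav:
        return Confidence.LOW
    # High: three or more nav links and no warnings.
    if nav_links >= 3 and warnings == 0:
        return Confidence.HIGH
    # Medium: at least one nav link and at most one minor warning.
    if warnings <= 1:
        return Confidence.MEDIUM
    # ASSUMPTION: the text does not cover two or more warnings; treat as Low.
    return Confidence.LOW
```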

Known Limitations of Auto-Discovery

Auto-discovery works well for the majority of commercial websites. It does not work well — and we say this explicitly — for the following site types. If your site falls into any of these categories, Option B (specify your pages) will produce a better audit.

Single-Page Applications and JavaScript-Rendered Navigation

Sites built with React, Vue, Angular, or Next.js where navigation links are rendered by JavaScript after page load may not expose their full link structure during discovery. Our browser does wait for network idle before extracting links, which captures most JavaScript-rendered content, but some frameworks delay navigation rendering until user interaction. If your site's navigation only appears after clicking a menu button or scrolling, auto-discovery may miss key pages.

Examples: Complex SPAs, sites with mega-menus that open on hover, sites with hamburger menus on desktop.
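One way to spot navigation that depends on JavaScript is to check how many anchors lack a real href. The sketch below is a hypothetical heuristic, not the check the pipeline uses. It counts an anchor as a placeholder when its href is missing, empty, a bare "#", or a javascript: URL. The 0.5 threshold and the treatment of a page with no anchors at all are invented assumptions. It would be applied to the HTML captured after the page settles.

```python
from html.parser import HTMLParser

# HYPOTHETICAL threshold: the methodology does not quantify "significant".
JS_NAV_RATIO = 0.5


class HrefAudit(HTMLParser):
    """Count <a> elements with a real href versus placeholder or script-only hrefs."""

    def __init__(self) -> None:
        super().__init__()
        self.real = 0
        self.placeholder = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = (dict(attrs).get("href") or "").strip().lower()
        if href in ("", "#") or href.startswith("javascript:"):
            self.placeholder += 1   # probably relies on a script click handler
        else:
            self.real += 1


def looks_js_driven(rendered_html: str) -> bool:
    audit = HrefAudit()
    audit.feed(rendered_html)
    audit.close()
    total = audit.real + audit.placeholder
    if total == 0:
        return True   # ASSUMPTION: no anchors at all suggests script-built navigation
    return audit.placeholder / total >= JS_NAV_RATIO
```

The heuristic only sees markup that already exists. Navigation that appears only after a click or hover will not be counted at all, which is why such sites are listed here as a known limitation.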

Large Enterprise Sites with Non-Standard Navigation

Large enterprise sites — telecommunications, banking, healthcare, insurance — often use navigation architectures that do not follow standard semantic HTML patterns. Navigation may be built from custom components, rendered through a CMS with non-standard markup, or structured around internal tools rather than public-facing pages.

Examples: AT&T, Verizon, large bank websites, insurance provider portals.

Authentication-Gated Sites

If the pages most important to your business require a logged-in session to access, auto-discovery cannot reach them. The pipeline runs as an unauthenticated visitor. Pages behind login, account-specific pricing pages, and B2B portal pages are inaccessible to discovery.

Note: The audit itself also runs as an unauthenticated visitor. If your target pages require login, the audit cannot evaluate them regardless of which discovery option you choose. For sites where the most important agent interactions happen behind authentication, contact us to discuss options.

Sites Split Across Subdomains

Discovery only follows links whose hostname exactly matches your homepage's. If your homepage links to shop.yourdomain.com, docs.yourdomain.com, or app.yourdomain.com, those pages are on different hosts. They will not be discovered automatically and will not be included in the audit. Subdomain pages can be specified manually using Option B.

Sites with Redirect-Heavy Navigation

Some sites use marketing tracking redirects on navigation links — clicking a nav link goes through a redirect service before landing on the destination. These redirect URLs will be excluded from discovery because their domain does not match your site's domain. The underlying destination pages can be specified manually using Option B.
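An exact hostname comparison is what excludes both subdomains and tracking-redirect hosts. The sketch below assumes the comparison uses the hostname reported by Python's urllib.parse, which lowercases it. The tracker hostname is a made-up example. The methodology does not say whether www.example.com and example.com count as the same site; this sketch treats them as different.

```python
from urllib.parse import urlparse


def same_host(homepage: str, link: str) -> bool:
    """Keep a link only if its hostname exactly matches the homepage's."""
    return urlparse(link).hostname == urlparse(homepage).hostname


HOME = "https://example.com/"
links = [
    "https://example.com/pricing",
    "https://shop.example.com/cart",                                          # subdomain
    "https://click.tracker-service.net/r?u=https%3A%2F%2Fexample.com%2Fpricing",  # redirect
    "https://EXAMPLE.com/contact",
]
kept = [u for u in links if same_host(HOME, u)]
```

Only the /pricing and /contact links survive. The subdomain and the redirect wrapper are dropped even though the redirect ultimately points back to your site.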

Scope Limitations — What the Audit Does Not Cover

This section states clearly what the Snapshot Codex audit does not measure. These are not deficiencies — they are the defined scope of the product. Understanding what is out of scope helps you interpret the report accurately.

Pages Not in Scope

The Snapshot Codex audits five pages. Your site may have hundreds or thousands. Findings on the five audited pages do not imply anything about pages that were not audited. A score of 80/100 on the audited pages does not mean your entire site scores 80/100. The Full Codex allows you to specify up to twenty pages if broader coverage is needed.

Dynamic States Not Captured

The free Try It preview of the Snapshot Codex audits only the initial render state. The full Snapshot Codex audits four rendering states: initial, post-scroll, post-interaction, and mobile viewport. Some states are not captured at all:

- pages after adding items to a cart
- pages after form submission
- pages after account creation
- pages after any multi-step workflow that requires persistent state

If critical agent tasks on your site happen in these post-workflow states, the audit will not measure them.

Specific Agent Framework Behavior

The audit measures the structural properties of your site — Accessibility Tree quality, structured data completeness, semantic HTML correctness. It does not measure how any specific agent framework — OpenAI Operator, Anthropic computer use, LangChain, or others — would actually behave on your site. Different agent frameworks interpret the same page differently. The audit findings represent issues that would impair agent navigation across most frameworks, but we do not claim to predict the behavior of any specific framework with certainty.

Revenue Impact

AEO Butler does not claim that implementing audit recommendations will produce any specific revenue increase, increase in agent-originated traffic, or improvement in any business metric. We identify structural issues that impair agent navigation. The commercial consequence of those issues depends on your site's traffic, your customers' use of AI agents, and many other factors outside our knowledge or control. The audit is an engineering diagnostic, not a business performance guarantee.

Point-in-Time Snapshot

The audit reflects the state of your site at the moment it was run. Findings may not apply if your site is updated after the audit. Fixes implemented after delivery are not reflected in the original report score — that is what the Quarterly Rescan measures.

Customer Recourse

We stand behind our methodology and are transparent about its limitations. The following remedies are available to every customer.

Free Page Re-Selection for Low Confidence Results

If your Snapshot Codex report includes a Low confidence discovery warning, you are eligible for a free page re-selection and re-audit within 14 days of delivery. Reply to your delivery email with up to five URLs you would like audited instead. We will re-run the pipeline on your specified pages and deliver a new report at no charge.

This remedy applies specifically to Low confidence auto-discovery results. It is not available for Medium or High confidence results where the discovery worked as designed but you would prefer different pages.

Disputing Individual Findings

If you believe a specific finding is incorrect — that the reported issue does not exist on your site — reply to the delivery email with the finding type, the page URL, and a brief description of why you believe it is wrong. We will review the raw audit data for that finding and respond within two business days. If the finding is a confirmed false positive, we will issue a corrected report with the finding removed and the score adjusted.

Refunds

Refunds are available within 24 hours of delivery if the report could not be generated due to a technical failure on our end — the pipeline failed, the report is empty, or the delivery was not received. Refunds are not available after a completed report has been delivered, including cases where you disagree with the page selection or the findings. The Low confidence free re-audit remedy is the appropriate path for page selection disputes.

Contact

For any issue not covered by the above, email us directly. We are a small operation and every email is read and responded to personally, usually within one business day.

When to Use Option B Instead

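Use Option B (specify your pages) in any of these situations:

- Your navigation is rendered by JavaScript or only appears after user interaction.
- Your site is a large enterprise site with non-standard navigation markup.
- Your most important pages live on subdomains.
- Your navigation links pass through tracking redirects.
- An auto-discovered audit returned a Low confidence result.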

Option B is available on all Snapshot Codex and Full Codex orders at the same price. There is no premium for specifying pages.

Methodology Version

Document version: 1.0
Effective date:
Pipeline version: AEO Butler v1.0 (page_discovery.py)
Last reviewed:
Changes from previous version: Initial publication.

This document is updated when the discovery algorithm changes. The version number in your audit report corresponds to the version of this document in effect at the time the audit ran.