How it works

Our AI visibility methodology.

Manual where it matters, automated where it scales. Every audit follows the same six steps so findings are comparable across categories and over time.

AI visibility — whether AI answer engines mention, recommend, omit, or misrepresent your brand for buyer-intent prompts, and the public evidence that causes them to do so.

Classic search optimization asks “where do we rank for this keyword?” AI visibility asks a different question: when a buyer asks an engine to recommend tools for a job, what does it say, and why? Our methodology is built to answer that with evidence, not vibes. We do not optimize for a hidden ranking we can’t see. We measure what engines actually return, trace it to the sources and pages behind it, and hand you a roadmap to improve that evidence.

The method is deliberately bounded and repeatable. A locked prompt set, named competitors, named engines, human labelling, and a fixed retest cadence mean two runs of the same audit can be compared honestly. That’s what makes a delta meaningful.
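To make the comparability point concrete, here is a minimal Python sketch of a retest comparison. It assumes each run is stored as a mapping from prompt ID to the reviewer's label; the function name `compare_runs`, the ID format, and the dictionary layout are illustrative assumptions, not part of the audit deliverable.

```python
def compare_runs(baseline: dict[str, str], retest: dict[str, str]) -> dict:
    """Compare two labelled runs of the same locked prompt set.

    Both arguments map a prompt ID to its human-assigned label
    (hypothetical storage format, assumed for this sketch).
    """
    # A delta is only meaningful if both runs used the identical prompt set.
    if baseline.keys() != retest.keys():
        differing = sorted(baseline.keys() ^ retest.keys())
        raise ValueError(f"prompt sets differ: {differing}")

    total = len(baseline)
    before = sum(1 for label in baseline.values() if label == "recommended")
    after = sum(1 for label in retest.values() if label == "recommended")

    # Record every prompt whose label changed, as (old label, new label).
    moved = {
        pid: (baseline[pid], retest[pid])
        for pid in sorted(baseline)
        if baseline[pid] != retest[pid]
    }
    return {
        "before": f"{before} / {total}",
        "after": f"{after} / {total}",
        "delta": after - before,
        "moved": moved,
    }
```

Refusing to compare runs whose prompt IDs differ is the design choice that matters here: a changed prompt set would make any reported gain uncheckable.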

The method

Six steps, every audit.

Same sequence, same artifacts, every time. Below the overview, each step is broken down in detail.

  1. Define buyer-intent prompts
    10–15 prompts for a Snapshot, or 25–50 prompts for Monitoring, mapped to persona, funnel stage, and category, not vanity queries.
  2. Select competitors & surfaces
    3–5 named competitors across 3–5 AI/search surfaces (ChatGPT, Perplexity, Gemini, Claude, Google AI Mode).
  3. Capture answers & citations
    Timestamped captures, screenshots, source URLs, and engine metadata preserved for every prompt.
  4. Score mentions & misrepresentation
    Recommendation, mention, omission, misrepresentation: each labelled by a human reviewer, not a heuristic.
  5. Map source & content gaps
    Which third-party and owned pages drive citations, and which gaps your competitors are filling.
  6. Roadmap & retest plan
    A sequenced fix backlog with impact/effort, plus a retest plan to measure visibility gained.

Step by step

What actually happens in each step.

Step 01
Define buyer-intent prompts

We interview your team and study the category to write prompts the way buyers ask them — mapped to persona and a fixed awareness/consideration/decision mix. Vanity queries are excluded.
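One way to make a prompt set verifiably "locked" is to fingerprint it before testing. The sketch below is an illustration under stated assumptions: the `Prompt` fields, the tier keys, and the use of a SHA-256 hash over canonical JSON are our choices for the example. Only the Snapshot (10–15) and Monitoring (25–50) size ranges come from this methodology.

```python
import hashlib
import json
from collections import Counter
from dataclasses import asdict, dataclass

# Size ranges are the ones stated in this methodology; the keys are illustrative.
TIER_SIZES = {"snapshot": (10, 15), "monitoring": (25, 50)}


@dataclass(frozen=True)
class Prompt:
    prompt_id: str
    text: str
    persona: str
    stage: str  # "awareness", "consideration", or "decision"


def lock_prompt_set(prompts: list[Prompt], tier: str) -> dict:
    """Validate a prompt set and return a summary with a stable fingerprint."""
    low, high = TIER_SIZES[tier]
    if not low <= len(prompts) <= high:
        raise ValueError(f"{tier} needs {low}-{high} prompts, got {len(prompts)}")

    ids = [p.prompt_id for p in prompts]
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate prompt IDs")

    # Sort by ID so the fingerprint does not depend on list order.
    ordered = sorted(prompts, key=lambda p: p.prompt_id)
    canonical = json.dumps([asdict(p) for p in ordered], sort_keys=True)
    return {
        "size": len(prompts),
        "stage_mix": dict(Counter(p.stage for p in prompts)),
        "fingerprint": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
    }
```

If the fingerprint recorded at baseline matches the one computed at retest, the two runs used the same prompts, personas, and funnel mix.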

Step 02
Select competitors & surfaces

Competitors are named explicitly (no “the market” hand-waving), and engines are chosen by where your category's buyers actually ask. Both are recorded in the report.

Step 03
Capture answers & citations

Every prompt run is captured with a screenshot, source URLs, engine metadata, and a timestamp. Because AI answers are non-deterministic, we rely on patterns across the full set, not a single response.
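As an illustration of what a single capture record might hold, here is a minimal Python sketch. The class name `Capture`, its field names, and the JSON Lines export are assumptions made for the example; the methodology only specifies that a screenshot, source URLs, engine metadata, and a timestamp are preserved.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Capture:
    prompt_id: str
    engine: str
    captured_at: str  # ISO 8601 timestamp in UTC
    answer_text: str
    source_urls: tuple[str, ...] = ()
    screenshot_path: str = ""
    engine_metadata: dict = field(default_factory=dict)


def new_capture(prompt_id: str, engine: str, answer_text: str,
                source_urls: list[str], screenshot_path: str,
                engine_metadata: Optional[dict] = None,
                now: Optional[datetime] = None) -> Capture:
    """Build a capture record, stamping it with the current UTC time by default."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    return Capture(
        prompt_id=prompt_id,
        engine=engine,
        captured_at=stamp,
        answer_text=answer_text,
        source_urls=tuple(source_urls),
        screenshot_path=screenshot_path,
        engine_metadata=dict(engine_metadata or {}),
    )


def to_jsonl(captures: list[Capture]) -> str:
    """Serialize captures as JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(c), sort_keys=True) for c in captures)
```

Keeping one immutable record per prompt run means repeated runs of the same prompt sit side by side, which is what lets a reviewer look for patterns rather than trusting a single response.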

Step 04
Score mentions & misrepresentation

A human reviewer labels each capture: recommended, mentioned, omitted, or misrepresented. Edge cases are documented, not silently scored.

Step 05
Map source & content gaps

We map which third-party and owned pages drive citations, and which sources competitors own that you're absent from — the gaps that explain most omissions.
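The gap mapping can be pictured as a set difference over cited domains. The sketch below assumes citations have already been grouped by the brand that appeared in the cited answer; that grouping, the function names, and the `.example` domains in the usage note are hypothetical and exist only for illustration.

```python
from urllib.parse import urlparse


def cited_domains(urls: list[str]) -> set[str]:
    """Return the lowercased hostnames found in a list of cited URLs."""
    domains = set()
    for url in urls:
        host = urlparse(url).hostname  # None when the URL has no host part
        if host:
            domains.add(host)
    return domains


def source_gaps(citations_by_brand: dict[str, list[str]], brand: str) -> dict[str, list[str]]:
    """Map each domain cited for competitors but not for `brand`
    to the competitors it was cited for."""
    own = cited_domains(citations_by_brand.get(brand, []))
    gaps: dict[str, list[str]] = {}
    for competitor, urls in citations_by_brand.items():
        if competitor == brand:
            continue
        for domain in cited_domains(urls) - own:
            gaps.setdefault(domain, []).append(competitor)
    # Sort for stable, reviewable output.
    return {domain: sorted(names) for domain, names in sorted(gaps.items())}
```

For example, if a review site is cited for both you and a rival, it is not a gap; a forum cited only for rivals is. A real audit still needs a reviewer to judge which of those gaps are worth closing.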

Step 06
Roadmap & retest plan

Findings become a backlog sequenced by impact and effort, each fix tied to the prompts it should move, plus a 30/90-day retest plan to measure visibility gained.
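A rough sketch of how such a backlog could be sequenced in Python follows. The 1–5 impact and effort scales, the tie-breaking rule (highest impact first, then lowest effort), and counting the 30 and 90 days from a baseline date are all assumptions made for this example rather than a fixed part of the method.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Fix:
    title: str
    impact: int  # 1 (low) to 5 (high); hypothetical scale
    effort: int  # 1 (low) to 5 (high); hypothetical scale
    prompt_ids: tuple[str, ...]  # prompts this fix is expected to move


def sequence_backlog(fixes: list[Fix]) -> list[Fix]:
    """Order fixes by highest impact first, then lowest effort, then title."""
    return sorted(fixes, key=lambda f: (-f.impact, f.effort, f.title))


def retest_dates(baseline: date) -> list[date]:
    """Return the 30-day and 90-day retest dates counted from the baseline."""
    return [baseline + timedelta(days=30), baseline + timedelta(days=90)]
```

Carrying `prompt_ids` on every fix is what lets the retest check whether the prompts a fix targeted actually changed label.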

How scoring works

Four labels. Numerator over denominator.

Every captured answer gets exactly one label. Metrics are always reported as a count over the prompt-set size (e.g. 12 / 40), never as a bare percentage.

Recommended

The engine actively suggests your brand as a choice for the prompt (“you should look at …”, or it appears in a ranked shortlist). The strongest, most defensible signal.

Mentioned

Your brand is named without a recommendation — listed in passing, used as an example, or referenced in a citation. Visible, but not yet a buying signal.

Omitted

Your brand is absent from a relevant answer where direct competitors appear. The default failure mode, and the largest single category in most first audits.

Misrepresented

Your brand is present but the claim is inaccurate or stale (wrong price, missing integration, wrong audience). The highest-cost error because the buyer trusts a confident, wrong answer.

The headline metrics

  • Recommendation share — prompts where you’re recommended ÷ total prompts (e.g. 12 / 40).
  • Mention share — prompts where you’re recommended or mentioned ÷ total prompts.
  • Citation share — prompts where an answer cites one of your URLs ÷ total prompts.
  • Omission rate — relevant prompts where you’re absent while competitors appear ÷ total prompts.
  • Misrepresentation count — raw count of inaccurate appearances, always called out separately because one is too many at the decision stage.

Because a single answer can recommend several vendors, recommendation counts across vendors need not sum to the prompt count. We state the denominator on every chart so the math is always checkable.
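The arithmetic behind the headline metrics can be sketched in a few lines of Python. This assumes one final label per prompt for a single brand and engine, plus a separate set of prompt IDs whose answers cited one of your URLs; the function name and input shapes are illustrative assumptions.

```python
from collections import Counter

LABELS = ("recommended", "mentioned", "omitted", "misrepresented")


def headline_metrics(labels: dict[str, str], cited_prompt_ids: set[str]) -> dict[str, str]:
    """Report each metric as 'count / total', never as a bare percentage.

    `labels` maps prompt ID to its human-assigned label;
    `cited_prompt_ids` holds prompts whose answer cited one of your URLs.
    """
    unknown = set(labels.values()) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")

    total = len(labels)
    counts = Counter(labels.values())
    cited = sum(1 for pid in labels if pid in cited_prompt_ids)

    def over_total(count: int) -> str:
        return f"{count} / {total}"

    return {
        "recommendation_share": over_total(counts["recommended"]),
        # Mention share counts prompts that are recommended or mentioned.
        "mention_share": over_total(counts["recommended"] + counts["mentioned"]),
        "citation_share": over_total(cited),
        "omission_rate": over_total(counts["omitted"]),
        # Reported as a raw count, called out separately.
        "misrepresentation_count": str(counts["misrepresented"]),
    }
```

With 40 prompts labelled 12 recommended, 9 mentioned, 17 omitted, and 2 misrepresented, this yields a recommendation share of 12 / 40 and a mention share of 21 / 40. Running the same function once per competitor shows why the recommendation counts can add up to more than 40.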

Why this matters

AI answer engines are now part of the buyer’s shortlist.
If the evidence is wrong or missing, the recommendation goes to someone else.

  • 58% of B2B buyers now consult an AI answer engine before shortlisting vendors [1]
  • 3.4× more comparison queries handled by AI vs. traditional search, year over year [2]
  • 71% of category-defining prompts return the same 2–3 vendors across engines [3]

[1–3] Aggregated industry research, 2025–2026. Full sources listed below.

Sources

References & further reading.

The stat band above aggregates industry research; figures are directional and cited so you can check them. The fundamentals we build on come from Google’s own guidance for AI search experiences.

  [1] Buyer reliance on AI answer engines before vendor shortlisting: aggregated from 2025–2026 B2B buying-behavior research (directional). Figures are illustrative of a category-wide shift, not a single study.
  [2] Growth in AI-handled comparison queries vs. traditional search, year over year: aggregated industry analyses, 2025–2026 (directional).
  [3] Concentration of category-defining prompts on a small set of vendors across engines: aggregated AI-search visibility research, 2025–2026 (directional).
  [4] Google Search Central: guidance on AI features in Search and the fundamentals that still apply (useful, reliable, people-first content; technical accessibility and crawlability; accurate structured data). See developers.google.com/search/docs/appearance/ai-features and the Search Essentials.

We do not present aggregated figures as precise measurements of your category. Your audit’s numbers come only from your own captured runs — see a full sample report.

FAQ

Methodology questions.

  • It's AI visibility — measuring whether AI answer engines recommend, mention, omit, or misrepresent you for buyer-intent prompts. SEO fundamentals (crawlable pages, useful content, accurate structured data) still apply; we layer AI-specific evidence on top.

  • The prompt set is locked before testing and mapped to persona and funnel stage, with a fixed awareness/consideration/decision mix. We reuse the identical set on every retest so results are comparable over time, not curated.

  • A human reviewer, not a heuristic. Each captured answer is read in context and labelled. Edge cases (a brand named only to dismiss it, for example) are documented in the report rather than silently scored.

  • Every finding records the engine, the prompt ID, and a capture date, with a screenshot. AI answers are non-deterministic, so we note that explicitly and rely on patterns across the full set rather than any single response.

  • No, and you should be skeptical of anyone who does. We measure visibility, identify gaps, and help you improve the public evidence engines can understand. That's the only honest service.

See the methodology applied.

The sample report walks through the full output of these six steps on a fictional category, end to end.