The method

How we measure

The register has one ruler, applied to every agency including our own. The gate decides who belongs in the segment; seven axes score them; measured axes are reproducible by any auditor and editorial axes are labelled as judgment. The point is not a single number — it is the per-axis profile, with the spread on view.

The gate — who enters the segment

An agency is listed only if all four hold. The gate cuts by tier, not by strength: a strong player that genuinely serves small business ranks where it earns, above us when it is better.

  • SMB-primary — serves private clients / small business as its primary segment, not enterprise-only with SMB as a token line.
  • Self-serve offer — has a transparent self-serve offer a buyer can purchase themselves — a published price or fixed package, no mandatory 'contact sales'.
  • GEO substance — names GEO / AI-visibility as an actual service (names answer engines, citation work, structured data, measures AI answers) — relabeled generic SEO is listed but flagged.
  • Live business — is a live, reachable business — real site, real contact.
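The all-four rule above can be sketched in a few lines. This is a minimal illustration, not the register's actual code: the `GateCheck` record and its field names are hypothetical, and the sketch does not model the "listed but flagged" handling of relabeled SEO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateCheck:
    """Hypothetical record of the four gate criteria for one agency."""
    smb_primary: bool        # serves private clients / small business first
    self_serve_offer: bool   # published price or fixed package
    geo_substance: bool      # GEO / AI-visibility named as an actual service
    live_business: bool      # real site, real contact


def passes_gate(check: GateCheck) -> bool:
    """An agency enters the segment only if all four criteria hold."""
    return all((
        check.smb_primary,
        check.self_serve_offer,
        check.geo_substance,
        check.live_business,
    ))
```

Strength plays no part here: the gate only decides membership, and ranking is left to the axes.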

The seven axes

Every axis must pass the neutral-buyer test: would a small-business buyer agree it matters, not knowing who scores well on it? An agency's own raw AI-visibility is mostly domain age plus anchoring — a confound — so it is weighted lightly. The heavy weight goes to age-independent, checkable signals.

| Axis | What it rewards | Weight |
|---|---|---|
| M1 · Own AI-visibility | the agency is itself findable, and described correctly, in AI answers | 9% |
| M2 · Method transparency | publishes how it works and what it measures, not a black box | 16% |
| M3 · Evidence verifiability | named, specific, checkable proof over anonymous hype | 23% |
| M4 · Pricing openness | published price / range, not 'request a quote' | 12% |
| M5 · AI reachability | the agency's own site is reachable to AI crawlers: not blocked in robots.txt, server-rendered, ideally with an llms.txt | 10% |
| E1 · Segment fit | genuinely built for the private / small-business buyer | 12% |
| E2 · Promise cleanliness | no snake-oil: no guaranteed rankings, no fakery | 18% |
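The robots.txt part of the M5 check is one an auditor can reproduce with the Python standard library. The sketch below is an assumption-laden illustration: the crawler names in `AI_CRAWLERS` are an assumed list (the method does not say which user agents are probed), the function name is hypothetical, and server rendering and llms.txt are not covered.

```python
from urllib.robotparser import RobotFileParser

# Assumed list of AI crawler user agents; the method does not specify one.
AI_CRAWLERS = ("GPTBot", "ClaudeBot")


def blocked_crawlers(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the assumed AI crawlers that robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    # Parse the text directly so the check runs without a network call.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, url)]


sample = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
print(blocked_crawlers(sample))
```

With the sample file, only the crawler that is explicitly disallowed is reported; an empty robots.txt blocks nobody.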

Measured axes (M1–M5) carry 70%; editorial axes (E1–E2) carry 30% — the reproducible core dominates by design, and judgment stays a labelled minority. M1 is the lightest (9%) because raw visibility is confounded by domain age; M3 is the heaviest (23%) as the most buyer-predictive, most anti-hype axis. M5 (10%) checks the agency's own site is reachable to AI crawlers — an AI-visibility shop that blocks them fails its own craft. Weights are v2 priors and move under calibration.
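The arithmetic behind the weights can be checked in a short sketch. The weights are those of the table; the 0 to 10 axis scale, the `weighted_composite` name, and the choice to reject incomplete score sets are assumptions made for illustration only.

```python
# Axis weights in percent, as published in the table (v2 priors).
WEIGHTS_PCT = {
    "M1": 9, "M2": 16, "M3": 23, "M4": 12, "M5": 10,  # measured axes
    "E1": 12, "E2": 18,                                # editorial axes
}

measured_share = sum(w for axis, w in WEIGHTS_PCT.items() if axis.startswith("M"))
editorial_share = sum(w for axis, w in WEIGHTS_PCT.items() if axis.startswith("E"))


def weighted_composite(scores: dict[str, float]) -> float:
    """Weighted mean of per-axis scores (assumed to share one scale, e.g. 0-10)."""
    missing = WEIGHTS_PCT.keys() - scores.keys()
    if missing:
        # Assumption: a missing axis is an error here, never a guessed number.
        raise ValueError(f"no score for axes: {sorted(missing)}")
    return sum(WEIGHTS_PCT[axis] * scores[axis] for axis in WEIGHTS_PCT) / 100


print(measured_share, editorial_share)  # 70 30
```

The composite is only a summary; the per-axis scores that feed it remain the thing to read.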

The red-flag cap

An egregious claim — guaranteed rankings or placements, fake reviews offered, 'we manipulate AI' — caps the composite regardless of other scores. Snake-oil cannot be out-weighted by a slick site; it ceilings the agency.
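As a sketch of how such a ceiling behaves: the flag labels and the cap value of 4.0 below are hypothetical, since the method does not publish the cap's numeric value, and a 0 to 10 composite scale is assumed.

```python
# Hypothetical labels for the egregious claims named above.
RED_FLAGS = frozenset({"guaranteed_rankings", "fake_reviews", "ai_manipulation"})

# Assumption: the real cap value is not stated in the method.
HYPOTHETICAL_CAP = 4.0


def apply_red_flag_cap(composite: float, findings: set[str],
                       cap: float = HYPOTHETICAL_CAP) -> float:
    """Ceiling the composite when any red flag was found; otherwise leave it alone."""
    if findings & RED_FLAGS:
        return min(composite, cap)
    return composite
```

Because the cap is a `min`, a flagged agency already scoring below the ceiling is not lifted to it.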

The focus penalty — a specialist who does everything isn't one

Some agencies that clear the gate are not GEO specialists but generalist combines — an SEO shop welded to a website factory, selling the whole stack at once with AI-visibility as one more item on the menu. We read that breadth as a trust signal pointing the wrong way: a 'specialist' who takes on everything is rarely deep in any one part, and GEO is the part most often bolted on as an afterthought. So the composite carries a fixed focus penalty, applied after the red-flag cap and shown on every entry.

  • Generalist combine — SEO + website production + everything bundled: −0.5
  • SEO-led hybrid — SEO-first, but with real, named GEO work: −0.2
  • Focused GEO / AI-visibility specialist: no penalty

This is a focus modifier, not a change to the axis weights — the ruler above is unchanged. The classification is observable from the agency's own offer and applied uniformly; UserSignals is a focused specialist and carries no penalty, by the same rule.
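The modifier and its ordering can be sketched as follows. The three penalty values are those listed above; the `Focus` enum, the function name, and the assumption that the input has already been through the red-flag cap are illustrative. The method does not say whether the result is floored, so the sketch applies no floor.

```python
from enum import Enum


class Focus(Enum):
    """Hypothetical labels for the three focus classes."""
    GENERALIST_COMBINE = "generalist_combine"
    SEO_LED_HYBRID = "seo_led_hybrid"
    FOCUSED_SPECIALIST = "focused_specialist"


# Fixed penalties as published; they do not touch the axis weights.
FOCUS_PENALTY = {
    Focus.GENERALIST_COMBINE: -0.5,
    Focus.SEO_LED_HYBRID: -0.2,
    Focus.FOCUSED_SPECIALIST: 0.0,
}


def apply_focus_penalty(capped_composite: float, focus: Focus) -> float:
    """Add the fixed penalty to a composite that has already been red-flag capped."""
    return capped_composite + FOCUS_PENALTY[focus]
```

Keeping the penalty as a separate, final step means the same table applies to every entry, whatever its axis scores.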

Reproducible by design

Measured axes carry the evidence snippet and source they came from; a missing finding is recorded as an explicit 'not found', never a guessed number. Every score is stamped 'data as of' a date, and the probe set (ChatGPT + Claude, search-grounded, clean session) is fixed so any external auditor can re-run it. Editorial axes and the top of the list go through human review before publication.
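One way to picture such a record, as a minimal sketch: the `Finding` class and its field names are hypothetical, and the date and URL in the usage lines are placeholders rather than real data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Finding:
    """Hypothetical record for one measured axis of one agency."""
    axis: str
    data_as_of: date                 # every score is stamped with a date
    score: Optional[float] = None    # None means 'not found', never a guess
    snippet: Optional[str] = None    # evidence text the score rests on
    source: Optional[str] = None     # where the snippet came from

    def __post_init__(self) -> None:
        # A score without its evidence would not be reproducible.
        if self.score is not None and not (self.snippet and self.source):
            raise ValueError("a score needs its evidence snippet and source")

    @property
    def status(self) -> str:
        return "not found" if self.score is None else "found"


missing = Finding(axis="M4", data_as_of=date(2025, 1, 1))  # placeholder date
print(missing.status)  # not found
```

Rejecting a score that arrives without evidence keeps the 'not found' state explicit instead of letting an unsupported number through.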

What this ruler does NOT measure

Track-record depth and the volume of named cases favour older incumbents we cannot fully see; NDA-bound outcomes are unscored. We weight reproducible, buyer-checkable signal on purpose, and we list ourselves on the same ruler, shown exactly where we stand.