How we measure
The register has one ruler, applied to every agency including our own. The gate decides who belongs in the segment; seven axes score them; measured axes are reproducible by any auditor and editorial axes are labelled as judgment. The point is not a single number — it is the per-axis profile, with the spread on view.
The gate — who enters the segment
An agency is listed only if all four hold. The gate cuts by tier, not by strength: a strong player that genuinely serves small business ranks where it earns, above us when it is better.
- SMB-primary — serves private clients / small business as its primary segment, not enterprise-only with SMB as a token line.
- Self-serve offer — has a transparent self-serve offer a buyer can purchase themselves — a published price or fixed package, no mandatory 'contact sales'.
- GEO substance — names GEO / AI-visibility as an actual service (names answer engines, citation work, structured data, measures AI answers) — relabeled generic SEO is listed but flagged.
- Live business — is a live, reachable business — real site, real contact.
The seven axes
Every axis must pass the neutral-buyer test: would a small-business buyer agree it matters, not knowing who scores well on it? An agency's own raw AI-visibility is mostly domain age plus anchoring — a confound — so it is weighted lightly. The heavy weight goes to age-independent, checkable signals.
| Axis | What it rewards | Weight |
|---|---|---|
| M1 · Own AI-visibility | the agency is itself findable, and described correctly, in AI answers | 9% |
| M2 · Method transparency | publishes how it works and what it measures, not a black box | 16% |
| M3 · Evidence verifiability | named, specific, checkable proof over anonymous hype | 23% |
| M4 · Pricing openness | published price / range, not 'request a quote' | 12% |
| M5 · AI reachability | the agency's own site is reachable to AI crawlers — not blocked in robots.txt, server-rendered, ideally with an llms.txt | 10% |
| E1 · Segment fit | genuinely built for the private / small-business buyer | 12% |
| E2 · Promise cleanliness | no snake-oil — no guaranteed rankings, no fakery | 18% |
Measured axes (M1–M5) carry 70%; editorial axes (E1–E2) carry 30% — the reproducible core dominates by design, and judgment stays a labelled minority. M1 is the lightest (9%) because raw visibility is confounded by domain age; M3 is the heaviest (23%) as the most buyer-predictive, most anti-hype axis. M5 (10%) checks the agency's own site is reachable to AI crawlers — an AI-visibility shop that blocks them fails its own craft. Weights are v2 priors and move under calibration.
The red-flag cap
An egregious claim — guaranteed rankings or placements, fake reviews offered, 'we manipulate AI' — caps the composite regardless of other scores. Snake-oil cannot be out-weighted by a slick site; it ceilings the agency.
The focus penalty — a specialist who does everything isn't one
Some agencies that clear the gate are not GEO specialists but generalist combines — an SEO shop welded to a website factory, selling the whole stack at once with AI-visibility as one more item on the menu. We read that breadth as a trust signal pointing the wrong way: a 'specialist' who takes on everything is rarely deep in any one part, and GEO is the part most often bolted on as an afterthought. So the composite carries a fixed focus penalty, applied after the red-flag cap and shown on every entry.
- Generalist combine — SEO + website production + everything bundled: −0.5
- SEO-led hybrid — SEO-first, but with real, named GEO work: −0.2
- Focused GEO / AI-visibility specialist: no penalty
This is a focus modifier, not a change to the axis weights — the ruler above is unchanged. The classification is observable from the agency's own offer and applied uniformly; UserSignals is a focused specialist and carries no penalty, by the same rule.
Reproducible by design
Measured axes carry the evidence snippet and source they came from; a missing finding is recorded as an explicit 'not found', never a guessed number. Every score is stamped 'data as of' a date, and the probe set (ChatGPT + Claude, search-grounded, clean session) is fixed so any external auditor can re-run it. Editorial axes and the top of the list go through human review before publish.