
Methodology.
Every number traces back.

The pipeline that produces the frontier: five ingestion and scoring stages, a five-tier verification ladder, and a per-axis manifesto that defines what counts.

Verification tiers: 5
Pipeline
Five stages from source poll → published frontier band.

01 Ingest

200+ sources are polled on a schedule: HF leaderboards, vendor system cards, audited harnesses, and prose papers.

Sources register on first detection; no manual onboarding.

02 Canonicalize

Raw rows fold into model_benchmarks. Model aliases collapse onto a single canonical_entity; benchmark aliases collapse onto a canonical benchmark.

ON CONFLICT DO UPDATE: most-recent-wins at the DB layer.
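
The canonicalize step can be sketched as an alias lookup followed by an upsert. This is a minimal illustration, not the production schema: the alias tables, the column set, the primary key, and the model and benchmark names are all assumptions made for the example. Only the table name `model_benchmarks`, the `canonical_entity` concept, and the `ON CONFLICT DO UPDATE` behavior come from the text above. "Most-recent-wins" is read here as "the last row written overwrites the earlier one".

```python
import sqlite3

# Hypothetical alias tables; the real mappings are not given in the source.
MODEL_ALIASES = {"Acme 1 (preview)": "acme-1", "acme-1-0612": "acme-1"}
BENCHMARK_ALIASES = {"BenchX": "bench-x", "bench_x": "bench-x"}


def canonicalize(name: str, aliases: dict) -> str:
    """Map an alias to its canonical name; unknown names pass through unchanged."""
    return aliases.get(name, name)


def make_db() -> sqlite3.Connection:
    """Create an in-memory table with an assumed (entity, benchmark, source) key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE model_benchmarks (
            canonical_entity TEXT NOT NULL,
            benchmark        TEXT NOT NULL,
            source           TEXT NOT NULL,
            score            REAL NOT NULL,
            PRIMARY KEY (canonical_entity, benchmark, source)
        )
        """
    )
    return conn


def upsert(conn: sqlite3.Connection, model: str, benchmark: str,
           source: str, score: float) -> None:
    """Insert a row, or overwrite the score if the key already exists."""
    conn.execute(
        """
        INSERT INTO model_benchmarks (canonical_entity, benchmark, source, score)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (canonical_entity, benchmark, source)
        DO UPDATE SET score = excluded.score
        """,
        (
            canonicalize(model, MODEL_ALIASES),
            canonicalize(benchmark, BENCHMARK_ALIASES),
            source,
            score,
        ),
    )


conn = make_db()
# Two raw rows that differ only by alias collapse onto one canonical row.
upsert(conn, "Acme 1 (preview)", "BenchX", "vendor-blog", 33.2)
upsert(conn, "acme-1-0612", "bench_x", "vendor-blog", 38.8)
rows = conn.execute(
    "SELECT canonical_entity, benchmark, source, score FROM model_benchmarks"
).fetchall()
```

After both calls the table holds a single row for the canonical pair, carrying the score from the later write.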

03 Score

Per-axis manifestos define the composite basket. axis_score is a reliability-weighted percentile across the basket — each benchmark scored by within-benchmark rank, shrunk toward the field median for thin samples. Two bases, toggleable everywhere: Authority (provenance-ranked pick, default) and Max (highest reported).

Eligibility: min benchmarks measured, recency window, harness filter.
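
One way to read the scoring description is sketched below. The exact formula is not given in the text, so everything specific here is an assumption: the mid-rank percentile definition, the shrinkage weight `n / (n + k)` with a hypothetical prior strength `k`, the use of 0.5 as the field median in percentile space, and the per-benchmark reliability weights.

```python
def percentile_rank(value: float, field: list) -> float:
    """Mid-rank percentile of value within field, in [0, 1]."""
    below = sum(1 for v in field if v < value)
    ties = sum(1 for v in field if v == value)
    return (below + 0.5 * ties) / len(field)


def shrink(p: float, n: int, k: float = 5.0) -> float:
    """Pull percentile p toward the median (0.5); thin samples (small n) shrink more.

    k is a hypothetical prior strength, not a value taken from the source.
    """
    w = n / (n + k)
    return w * p + (1.0 - w) * 0.5


def axis_score(model: str, basket: dict, reliability: dict, k: float = 5.0):
    """Reliability-weighted mean of shrunk percentiles over the benchmarks measured.

    basket:      benchmark -> {model: score}
    reliability: benchmark -> non-negative weight (assumed input)
    Returns None when the model has no measured benchmark in the basket.
    """
    num = 0.0
    den = 0.0
    for bench, scores in basket.items():
        if model not in scores:
            continue
        field = list(scores.values())
        p = shrink(percentile_rank(scores[model], field), len(field), k)
        num += reliability[bench] * p
        den += reliability[bench]
    return num / den if den else None


# Illustrative data only: model and benchmark names are placeholders.
basket = {
    "bench-a": {"m1": 90.0, "m2": 80.0, "m3": 70.0, "m4": 60.0, "m5": 50.0},
    "bench-b": {"m1": 10.0, "m2": 20.0},
}
reliability = {"bench-a": 1.0, "bench-b": 0.5}
score_m1 = axis_score("m1", basket, reliability)
```

In this toy basket, `m1` leads the five-model benchmark but trails on the two-model one; the two-model result is both down-weighted and pulled strongly toward 0.5, so it moves the composite less than the raw rank would suggest.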

04 Verify

Each score carries a verification_level (independently verified › aggregator › vendor first-party › vendor cross-reference › unverified). A vendor re-citing a number it didn't produce ranks below first-party; agreement_stddev surfaces within-model disagreement.

Below-threshold confidence → review queue, not publication.
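
The disagreement metric and the review gate can be sketched as follows. The source does not say how `agreement_stddev` or confidence are computed, so this sketch assumes a population standard deviation over the scores reported for one model on one benchmark, and a hypothetical confidence threshold of 0.6.

```python
from statistics import pstdev


def agreement_stddev(reported_scores: list) -> float:
    """Spread of the scores reported for one model on one benchmark.

    Assumes population standard deviation; zero or one report counts as no disagreement.
    """
    if len(reported_scores) < 2:
        return 0.0
    return pstdev(reported_scores)


def route(confidence: float, threshold: float = 0.6) -> str:
    """Send below-threshold rows to review instead of publishing them.

    The threshold value is a placeholder, not taken from the source.
    """
    return "publish" if confidence >= threshold else "review_queue"


spread = agreement_stddev([80.0, 82.0, 84.0])
decision = route(0.4)
```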

05 Publish

Frontier band = top of the per-axis distribution. Tier counts, sibling axes, and the front-page hero render directly from the materialized view — no editorial overlay.

Methodology label + reference URL ride on every visible row.
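
The text defines the frontier band only as the top of the per-axis distribution, without stating a cutoff. The sketch below assumes a hypothetical `top_fraction` parameter and keeps ties at the cutoff inside the band; both choices are illustrative.

```python
import math


def frontier_band(axis_scores: dict, top_fraction: float = 0.1) -> list:
    """Return the models in the top slice of one axis, best first.

    axis_scores:  model -> axis_score
    top_fraction: assumed share of the field treated as the frontier
    Models tied with the last model inside the cut are included.
    """
    if not axis_scores:
        return []
    ranked = sorted(axis_scores, key=axis_scores.get, reverse=True)
    keep = max(1, math.ceil(top_fraction * len(ranked)))
    cutoff = axis_scores[ranked[keep - 1]]
    return [m for m in ranked if axis_scores[m] >= cutoff]


# Placeholder scores for four hypothetical models.
scores = {"m1": 0.9, "m2": 0.8, "m3": 0.8, "m4": 0.5}
band = frontier_band(scores, top_fraction=0.5)
```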

Trust Ladder
How every score carries its provenance.

IV · Independently Verified

Third-party harness ran the model. Replicable from the leaderboard's published prompts + scoring code.

e.g. HELM · LiveBench · SWE-Bench Verified · LMArena evals

AA · Aggregator Attributed

Aggregator surfaced the score — model card, leaderboard table, Papers with Code entry. Re-attests a primary source.

e.g. HuggingFace · Papers with Code · model registries

VA · Vendor Attributed

Vendor's own system card / blog / model card. Authoritative on the model but not independently replicated.

e.g. OpenAI system cards · Anthropic model cards · vendor blogs

CR · Vendor Cross-Reference

A vendor restates a benchmark number it did not produce, for example by re-citing a rival or an aggregator. This tier is demoted below first-party attribution when a headline score is selected (mig 156).

e.g. launch-post comparison tables · a vendor blog quoting a competitor's eval

SA · Source Attributed

Unverified — surfaced through prose extraction (research paper, conference talk, technical brief) without re-run by an aggregator. The lowest tier.

e.g. arXiv abstracts · conference papers · community evals
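
The ladder above, combined with the Authority and Max bases from the Score stage, can be sketched as an ordered enum plus a headline picker. The tier codes and their order follow the text. The numeric values, the tuple layout, and the tie-break inside a tier (higher score wins) are assumptions for illustration.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Verification tiers; a larger value means stronger provenance."""
    IV = 5  # Independently Verified
    AA = 4  # Aggregator Attributed
    VA = 3  # Vendor Attributed (first-party)
    CR = 2  # Vendor Cross-Reference, ranked below first-party
    SA = 1  # Source Attributed (unverified)


def pick_headline(reports: list, basis: str = "authority") -> tuple:
    """Choose the headline (tier, score) pair from the reports for one model and benchmark.

    basis="authority": best-provenance report wins; ties within a tier go to the
                       higher score (assumed tie-break).
    basis="max":       highest reported score wins regardless of tier.
    """
    if basis == "max":
        return max(reports, key=lambda r: r[1])
    return max(reports, key=lambda r: (r[0], r[1]))


# Placeholder reports: a cross-reference, a first-party number, and a prose extraction.
reports = [(Tier.CR, 91.0), (Tier.VA, 88.5), (Tier.SA, 93.0)]
authority_pick = pick_headline(reports)
max_pick = pick_headline(reports, basis="max")
```

With these placeholder reports, the Authority basis selects the first-party vendor number even though two lower-tier reports quote higher scores, while the Max basis selects the highest figure.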

Axis Baskets
The benchmarks each axis composites; weights are ratified per quarter.
Manifesto: founder-approved, agent-drafted; revisited per quarter.