Briefings Dashboard Deal Flow Model Tracker AI Tools Pricing Policy Monitor Talent Tracker Research Robotics Data API

SEED

│Markets: OPEN│Refresh: │Models tracked: …│Active deals: …│Regulatory actions: …│Sources: …

Methodology.
Every number traces back.

The pipeline that produces the frontier — five ingestion + scoring stages, a five-tier verification ladder, and a per-axis manifesto that defines what counts.

Active manifestos

—

Composite benchmarks

—

Verification tiers

5

Deprecated benchmarks

—

PipelineFive stages from source poll → published frontier band.

01Ingest

200+ sources poll on schedule — HF leaderboards, vendor system cards, audited harnesses, prose papers.

Sources register on first detection; no manual onboarding.

02Canonicalize

Raw rows fold into model_benchmarks. Model aliases collapse onto a single canonical_entity; benchmark aliases collapse onto a canonical benchmark.

ON CONFLICT DO UPDATE: most-recent-wins at the DB layer.

03Score

Per-axis manifestos define the composite basket. axis_score is a reliability-weighted percentile across the basket — each benchmark scored by within-benchmark rank, shrunk toward the field median for thin samples. Two bases, toggleable everywhere: Authority (provenance-ranked pick, default) and Max (highest reported).

Eligibility: min benchmarks measured, recency window, harness filter.

04Verify

Each score carries a verification_level (independently verified › aggregator › vendor first-party › vendor cross-reference › unverified). A vendor re-citing a number it didn't produce ranks below first-party; agreement_stddev surfaces within-model disagreement.

Below-threshold confidence → review queue, not publication.

05Publish

Frontier band = top of the per-axis distribution. Tier counts, sibling axes, and the front-page hero render directly from the materialized view — no editorial overlay.

Methodology label + reference URL ride on every visible row.

Trust LadderHow every score carries its provenance.

IV

Tier

Independently Verified

Third-party harness ran the model. Replicable from the leaderboard's published prompts + scoring code.

e.g. HELM · LiveBench · SWE-Bench Verified · LMArena evals

AA

Tier

Aggregator Attributed

Aggregator surfaced the score — model card, leaderboard table, Papers with Code entry. Re-attests a primary source.

e.g. HuggingFace · Papers with Code · model registries

VA

Tier

Vendor Attributed

Vendor's own system card / blog / model card. Authoritative on the model but not independently replicated.

e.g. OpenAI system cards · Anthropic model cards · vendor blogs

CR

Tier

Vendor Cross-Reference

A vendor re-states a benchmark number it did not produce — re-citing a rival or an aggregator. DEMOTED below first-party attribution when a headline score is selected (mig 156).

e.g. launch-post comparison tables · a vendor blog quoting a competitor's eval

SA

Tier

Source Attributed

Unverified — surfaced through prose extraction (research paper, conference talk, technical brief) without re-run by an aggregator. The lowest tier.

e.g. arXiv abstracts · conference papers · community evals

Axis BasketsThe benchmarks each axis composites — weights ratified per quarter.

Loading manifestos…

Manifesto: founder-approved, agent-drafted; revisited per quarter.← State of the Frontier