Groq’s $650M round anchors a busy AI cycle

Research (5)

Learning When to Translate for Multilingual Reasoning

Reasoning language models (RLMs) achieve strong performance on complex reasoning tasks, but still exhibit substantial multilingual reasoning gaps, largely due to language-understanding failures in non-English inputs. English translation can mitigate these failures by expressing non-English inputs in a form that RLMs can more reliably interpret, yet translating every input is unnecessary when the m

WUSH: Near-Optimal Adaptive Transforms for LLM Quantization

Quantizing LLM weights and activations is a standard approach for efficient deployment, but a few extreme outliers can stretch the dynamic range and amplify low-bit quantization errors. Prior transform-based mitigations (e.g., Hadamard rotations) are fixed and data-agnostic, and their optimality for quantization has remained unclear. We derive closed-form optimal linear blockwise transforms for jo

Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation

The NVFP4 lower-precision format, supported in hardware by NVIDIA Blackwell GPUs, promises to allow, for the first time, end-to-end fully-quantized pre-training of massive models such as LLMs. Yet, existing quantized training methods still sacrifice some of the representation capacity of this format in favor of more accurate unbiased quantized gradient estimation by stochastic rounding (SR), losin

From Layers to Submodules: Rethinking Granularity in Replacement-Based LLM Compression

Post-training compression of Large Language Models (LLMs) removes entire architectural components, either deleting them or replacing them with fitted modules. Existing replacement-based methods share two design constraints: full-layer granularity and contiguous selection. We argue that this is overly restrictive: in fact, redundancy in pretrained transformers is not confined to contiguous regions,

Iterated Population Based Training with Task-Agnostic Restarts

Hyperparameter Optimization (HPO) can lift the burden of tuning hyperparameters (HPs) of neural networks. HPO algorithms from the Population Based Training (PBT) family are efficient thanks to dynamically adjusting HPs every few steps of the weight optimization. Recent results indicate that the number of steps between HP updates is an important meta-HP of all PBT variants that can substantially af

Funding (1)

Groq

Groq's $650M round is the only deal on the funding desk, reinforcing investor appetite for compute and deployment layers. With no broader spread of deals, capital appears to be concentrating where hardware and inference economics are clearest.

Talent Moves (3)

head

Named moves remain concentrated around senior AI leadership and academic-to-industry shifts. The desk matters when a single departure or hire changes who controls research direction, not when teams merely reshuffle.

AI chief

associate professor of computer science at UCLA

Benchmarks (3)

Llama 3 (70B): GSM8K

Three benchmark entries for the same model point to a narrow but useful snapshot of performance across core reasoning and knowledge tests. The signal here is surface coverage: which evaluations are being tracked, and where a single system sits across them.

Llama 3 (70B): MATH

Llama 3 (70B): MMLU
