Daily Briefing · Monday, June 1, 2026

Reasoning evals and agentic RL dominate a quiet cycle

11 items · 3 desks · 6 min read

Research (5)

RewardFlow: Topology-Aware Reward Propagation on State Graphs for Agentic RL with Large Language Models

Reinforcement learning (RL) shows promise for enhancing LLM agentic reasoning, yet sparse terminal rewards hinder fine-grained optimization. Process reward modeling offers an alternative but incurs high computational costs, reward hacking risks, and annotation bottlenecks. We introduce RewardFlow, a lightweight method for estimating state-level rewards in agentic reasoning. By constructing state g…

RESEARCH

Turning Bias into Bugs: Bandit-Guided Style Manipulation Attacks on LLM Judges

The known stylistic biases in LLM judges, such as a preference for verbosity or specific sentence structures, present an underexplored security vulnerability. In this work, we introduce BITE (BIas exploraTion and Exploitation), a black-box adversarial framework that learns semantics-preserving edits to mislead an LLM judge and artificially inflate the scores it assigns. We cast the selection of st…

RESEARCH

TRACE: Toulmin-based Reasoning Assessment through Constructive Elements for LLM CoT Evaluation

Evaluating open-ended outputs from large language models (LLMs) remains challenging due to the absence of ground truth. Existing metrics rely on final-answer accuracy or surface-level statistics, leaving the reasoning process itself unexamined. We introduce TRACE (Toulmin-based Reasoning Assessment through Constructive Elements), a metric that analyzes Chain-of-Thought (CoT) reasoning processes. R…

RESEARCH

HyperGuide: Hyperbolic Guidance for Efficient Multi-Step Reasoning in Large Language Models

Multi-step reasoning remains a central challenge for large language models: single-pass generation is efficient but lacks accuracy; tree-search methods explore multiple paths but are computation-heavy. We address this gap by distilling reasoning progress into a hyperbolic geometric signal that guides step-by-step generation. Our approach is motivated by a structural observation: in combinatorial r…

RESEARCH

When Models Learn to Ask Why: Adaptive Causal Reasoning for Trustworthy Medical Vision-Language Models

Vision-Language Models (VLMs) have enabled interpretable medical diagnosis by integrating visual perception with linguistic reasoning. Yet, existing medical chain-of-thought (CoT) models lack explicit mechanisms to represent and enforce causal reasoning, leaving them vulnerable to spurious correlations and limiting their clinical reliability. We pinpoint three core challenges in medical CoT reason…

RESEARCH

Talent Moves (3)

founder

Named moves are sparse, but the desk still signals team formation and expansion rather than churn. The items point to new structures taking shape around founders and a fresh lead hire.

TALENT MOVES

founder

TALENT MOVES

lead

TALENT MOVES

Benchmarks (3)

Llama 3 (70B) · GSM8K

Llama 3 (70B) · MATH

Llama 3 (70B) · MMLU