Steering is a widely used technique for controlling large language models, yet its effects are often unstable and hard to predict. Existing theoretical accounts are largely based on the Linear Representation Hypothesis (LRH). While LRH assumes that concepts can be orthogonalized for lossless control, this idealized mapping fails in real representations and cannot account for the observed unpredict
Foundational Large Language Models (LLMs) demonstrate proficiency on a wide range of general tasks, and achieve remarkable results on various specialized tasks via domain-expert LLMs. With the ever-growing list of available LLMs, inference routers are being proposed to select the most appropriate LLM for each prompt. However, existing routing methods either optimize cost across weak-to-strong gene
Biomedical abstracts play a critical role in downstream NLP applications, such as information retrieval, biocuration, and biomedical knowledge discovery. However, a non-trivial number of biomedical articles do not have abstracts, diminishing the utility of these articles for downstream tasks. We propose DPR-BAG (Divide, Prompt, and Refine for Biomedical Abstract Generation), a training-free, zero-
Large language models (LLMs) are increasingly deployed through hosted APIs, making model extraction a practical threat to model ownership and service security. However, individual extraction queries often resemble benign requests, and existing evaluations often focus on single-query anomaly scoring or pure benign-versus-attacker user settings. We formulate model extraction monitoring as benign-cal
Policy-gradient methods usually optimize expected return, but many real-world applications care about distributional properties of returns: tail risk, outlier robustness, or best-of-K discovery. We introduce OrderGrad, a family of likelihood-ratio and reparameterization gradient estimators for order-statistic objectives. OrderGrad optimizes finite-sample L-statistics, i.e., weighted averages of so
Late-stage money dominates, with one outsized private round, one public-market debut, and another large growth financing. The signal is continued concentration of capital around infrastructure and frontier model platforms.
Named moves are sparse, but the cycle still shows leadership reshaping at major institutions. The mix of a founder-era role, a top AI executive move, and a central-bank promotion makes this a title-heavy desk.
Three fresh scores for one model create a narrow but useful snapshot rather than a leaderboard shake-up. The value here is the evaluation surface itself: a compact read on reasoning, math, and general knowledge performance.