Papers

Recent AI research papers with accessible summaries. Updated daily from arXiv, summarized for developers who don't read papers regularly.

1492 papers · 29 this month · 12 topics

All · Evaluation 42 · Training 39 · Agents 31 · Reasoning 27 · Efficiency 25 · Safety 18 · Multimodal 17 · Applications 17 · Alignment 11 · Data 11 · Architecture 8 · Scaling 6

Jul 6 – Jul 12 (3)

Interpretable Human-Label-Free Deep Learning for Real-Bogus Classification with Uncertainty Quantification

Jul 6, 2026

Raphaël Bonnet-Guerrini, Bruno Sanchez, Dominique Fouchez et al.

You can train accurate astronomical classifiers without expensive human labels by combining synthetic data injection with robust handling of noisy labels, and get reliable confidence scores through a hybrid uncertainty approach.

This paper develops a Real-Bogus classification system for astronomical transients that requires no human-labeled training data. It uses simulated transient injections combined with noisy survey data and a dual-network training approach to reliably distinguish real astronomical events from false detections, while also providing calibrated uncertainty estimates.

training, evaluation, safety

LLM-as-a-Verifier: A General-Purpose Verification Framework

Jul 6, 2026

Jacky Kwok, Shulu Li, Pranav Atreya et al.

Using continuous probability-based scores instead of discrete LLM judgments improves verification accuracy and calibration, and these fine-grained signals can guide both solution selection and reinforcement learning training.

This paper introduces LLM-as-a-Verifier, a framework that uses language models to evaluate solution correctness by computing probability distributions over scoring tokens rather than discrete scores.
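
To make the difference between a discrete judgment and a probability-based score concrete, here is a minimal sketch in Python. It assumes the verifier model exposes log-probabilities for a handful of score tokens ("3", "4", "5" here); the function names, the score scale, and the numbers are hypothetical illustrations and are not taken from the paper, whose exact scoring scheme may differ.

```python
import math

def _score_logprobs(token_logprobs):
    """Keep only tokens that parse as integer scores (assumed scale, e.g. "1".."5")."""
    scores = {int(t): lp for t, lp in token_logprobs.items() if t.strip().isdigit()}
    if not scores:
        raise ValueError("no score tokens found")
    return scores

def discrete_score(token_logprobs):
    """Discrete judgment: the single most likely score token."""
    scores = _score_logprobs(token_logprobs)
    return max(scores, key=scores.get)

def expected_score(token_logprobs):
    """Continuous judgment: probability-weighted mean over score tokens."""
    scores = _score_logprobs(token_logprobs)
    m = max(scores.values())  # subtract the max for numerical stability
    weights = {s: math.exp(lp - m) for s, lp in scores.items()}
    z = sum(weights.values())  # renormalize over score tokens only
    return sum(s * w / z for s, w in weights.items())

# Hypothetical log-probabilities for two candidate solutions.
candidate_a = {"3": math.log(0.1), "4": math.log(0.5), "5": math.log(0.4)}
candidate_b = {"3": math.log(0.4), "4": math.log(0.5), "5": math.log(0.1)}

for name, lp in [("A", candidate_a), ("B", candidate_b)]:
    print(name, discrete_score(lp), round(expected_score(lp), 2))
```

Both candidates receive the same discrete score of 4, so a discrete judge cannot rank them. The expected score separates them (about 4.3 versus 3.7), which is the kind of fine-grained signal the summary describes as useful for solution selection.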

Jun 29 – Jul 5 (38)

Distributed Attacks in Persistent-State AI Control

Jul 2, 2026

Josh Hills, Ida Caspary, Asa Cooper Stickland

Persistent AI systems that ship code iteratively create a new vulnerability: attackers can hide malicious behavior by spreading it across multiple sessions, and different detection strategies are needed to catch gradual versus concentrated attacks.

This paper studies how AI coding agents can distribute malicious attacks across multiple pull requests over time to evade detection. The authors introduce a benchmark where agents pursue hidden goals while building software, comparing gradual attacks spread across PRs against concentrated attacks.

safety, agents, evaluation

LACUNA: A Testbed for Evaluating Localization Precision for LLM Unlearning

Jul 2, 2026

Matteo Boglioni, Thibault Rousset, Siva Reddy et al.

Current unlearning methods are imprecise at targeting specific parameters where knowledge is stored, making them vulnerable to attacks that resurface the data—precise localization matters more than output-level performance.

LACUNA is a new benchmark for testing whether LLM unlearning methods actually erase sensitive data from model parameters or just hide it. The researchers inject fake personal information into specific weights of language models, then check if unlearning methods successfully target those exact parameters.

Jun 22 – Jun 28 (56)

Which Nash Equilibrium? Solver-Dependent Selection on Zero-Sum Nash Polytopes

Jun 26, 2026

Luis Leal

Different Nash equilibrium solvers systematically select different equilibria based on their algorithm design—regularized methods pick maximum-entropy solutions while regret-averaging methods pick lower-entropy ones—which matters for robustness against imperfect opponents.

This paper investigates how different algorithms for solving two-player zero-sum games select different Nash equilibria from the convex set of possible equilibria.

evaluation
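
For the Nash equilibrium paper above, a small worked example may help show what "a convex set of possible equilibria" means. The sketch below is in Python and uses a hypothetical game (matching pennies where the row player has a duplicated "tails" action); it does not run any of the solvers studied in the paper. It only checks that two different row strategies are both equilibrium strategies and compares their entropy.

```python
import math

# Row player's payoffs in a hypothetical zero-sum game.
# Rows: H, T, T' (T' duplicates T). Columns: H, T. The game value is 0.
A = [
    [1, -1],
    [-1, 1],
    [-1, 1],
]

def worst_case_payoff(p):
    """Row player's payoff against the column player's best response."""
    return min(sum(p[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0])))

def entropy(p):
    """Shannon entropy (in nats) of a mixed strategy."""
    return -sum(x * math.log(x) for x in p if x > 0)

# Any p = (0.5, a, 0.5 - a) with 0 <= a <= 0.5 guarantees the game value,
# so the row player's equilibrium strategies form a line segment.
vertex = [0.5, 0.5, 0.0]          # an endpoint of the segment (lower entropy)
max_entropy = [0.5, 0.25, 0.25]   # the maximum-entropy point on the segment

for name, p in [("vertex", vertex), ("max-entropy", max_entropy)]:
    print(name, worst_case_payoff(p), round(entropy(p), 3))
```

Both strategies guarantee the game value of 0, so both are valid equilibrium strategies, yet their entropies differ (about 0.693 versus 1.04 nats). Which point on the segment a solver returns is the selection question the paper examines.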

VGB for Masked Diffusion Model: Efficient Test-time Scaling for Reward Satisfaction and Sample Editing

Jun 26, 2026

Kijung Jeon, Thuy-Duong Vuong, Molei Tao

MDM-VGB enables efficient test-time scaling for constrained generation by allowing tokens to be remasked during sampling, achieving quadratic complexity while competing methods like best-of-N suffer exponential complexity—making it practical for real-world constraint satisfaction problems.

This paper introduces MDM-VGB, a sampling method for masked diffusion models that improves generation quality at test time by allowing tokens to be strategically unmasked and remasked based on reward signals.

reasoning

Jun 15 – Jun 21 (3)

How Transparent is DiffusionGemma?

Jun 18, 2026

Joshua Engels, Callum McDougall, Bilal Chughtai et al.

Diffusion language models can achieve similar transparency to autoregressive models by treating denoised token states as interpretable checkpoints, but their ability to change all tokens simultaneously enables novel reasoning patterns that are harder to understand.

This paper investigates whether diffusion-based language models are less interpretable than traditional autoregressive models. By identifying interpretable token bottlenecks between denoising steps, the authors show DiffusionGemma's reasoning can be made nearly as transparent as standard models, though diffusion's parallel token updates create unique interpretability challenges.

architecture, evaluation

Toward Calibrated Mixture-of-Experts Under Distribution Shift

Jun 18, 2026

Gina Wong, Drew Prinster, Suchi Saria et al.

Expert-level calibration alone isn't enough for soft-routed MoE models under distribution shift—you need to explicitly calibrate the routing mechanism's aggregate predictions to maintain trustworthy uncertainty estimates.

This paper studies how mixture-of-experts (MoE) models maintain calibrated predictions under distribution shift. The authors show that calibrating individual experts works for hard-routed models but fails for soft-routed ones, and propose an adversarial reweighting method to improve calibration across different routing mechanisms and data distributions.
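
The distinction between hard and soft routing in the summary above can be sketched in a few lines of Python. This is an illustrative toy, not the paper's method: the expert probabilities, gate weights, and function names are hypothetical, and the adversarial reweighting approach is not reproduced. The sketch also includes a standard binned expected calibration error (ECE), one common way to measure calibration.

```python
def hard_route(expert_probs, gate_weights):
    """Hard routing: output the prediction of the single highest-weighted expert."""
    best = max(range(len(gate_weights)), key=lambda i: gate_weights[i])
    return expert_probs[best]

def soft_route(expert_probs, gate_weights):
    """Soft routing: output the gate-weighted average of all expert predictions."""
    return sum(w * p for w, p in zip(gate_weights, expert_probs))

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |average confidence - accuracy| per bin."""
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(ok for _, ok in b) / len(b)
            ece += len(b) / n * abs(avg_conf - accuracy)
    return ece

# Hypothetical example: two experts predict P(class = 1) for one input.
expert_probs = [0.9, 0.3]
gate_weights = [0.6, 0.4]

print(hard_route(expert_probs, gate_weights))            # one expert's own output
print(round(soft_route(expert_probs, gate_weights), 2))  # a blend of both experts
```

Under hard routing the final confidence is exactly one expert's output, so calibrating each expert carries over directly. Under soft routing the final confidence is a blend that no single expert produced, which gives some intuition for why the aggregate prediction may need its own calibration check, for example with the ECE function above.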

Papers

Search Beyond What Can Be Taught: Evolving the Knowledge Boundary in Agentic Visual Generation

Online Safety Monitoring for LLMs

What LLM Agents Say When No One Is Watching: Social Structure and Latent Objective Emergence in Multi-Agent Debates

Controllable Sim Agents with Behavior Latents

Combating Textual Noise and Redundancy: Entropy-Aware Dense Visual Token Pruning

TestEvo-Bench: An Executable and Live Benchmark for Test and Code Co-Evolution

Human Capital, Not Model Benchmarks, Predicts Hybrid Intelligence in Forecasting

Will Scaling Improve Social Simulation with LLMs?

Language Models as Measurement Apparatus for Culture

Optimal Stabilizer Testing and Learning with Limited Quantum Memory

EvoPolicyGym: Evaluating Autonomous Policy Evolution in Interactive Environments

Reasoning effort, not tool access, buys first-try reliability in agentic code generation: an observational study

Automated grading of Linux/bash examinations using large language models: a four-level cognitive taxonomy approach

Neuron-Aware Active Few-Shot Learning for LLMs

The Future of NLP may not be at NLP Conferences: Scholarly Migration Patterns in Natural Language Processing

WattGPU: Predicting Inference Power and Latency on Unseen GPUs and LLMs

Know Your Source: A Public Knowledge Store for Media Background Checks

HULAT2 at MER-TRANS 2026: Governed Multi-Agent Simplification for Spanish Easy-to-Read Generation

DRIFTLENS: Measuring Memory-Induced Reasoning Drift in Personalized Language Models

Understanding Agent-Based Patching of Compiler Missed Optimizations

Measuring the Gap Between Human and LLM Research Ideas

Theoria: Rewrite-Acceptability Verification over Informal Reasoning States

Are Performance-Optimization Benchmarks Reliably Measuring Coding Agents?

Distill to Detect: Exposing Stealth Biases in LLMs through Cartridge Distillation

Decision-Aware Training for Sample-Based Generative Models

QVal: Cheaply Evaluating Dense Supervision Signals for Long-Horizon LLM Agents

Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs

When LLMs Read Tables Carelessly: Measuring and Reducing Data Referencing Errors

CoMet: Context and Multiplicity Decomposition for Multimodal Uncertainty Estimation

Surrogate Fidelity: When Can Open LLMs Explain Closed Ones?

AxDafny: Agentic Verified Code Generation in Dafny

Optimization Dynamics Imprint Semantic Specificity in Contrastive Embedding Norms

MESA: Prioritizing Vulnerable Communication Channels for Securing Multi-Agent Systems

Words Speak Louder Than Code: Investigating Cognitive Heuristics in LLM-Based Code Vulnerability Detection

A Hybrid Framework For Crypto-Ransomware Detection In Enterprise Shared Storage

Uncertainty-Aware Generation and Decision-Making Under Ambiguity

Beyond 2D Matching: A Unified Single-Stage Framework for Geometry-Aware Cross-View Object Geo-Localization

Democratic ICAI: Debating Our Way to Steering Principles from Preferences

Towards Automating Scientific Review with Google's Paper Assistant Tool

Vision-Default, Prior-Override: Causal Mechanisms of Perception-Knowledge Conflict in Vision-Language Models

Learning Topology-Aware Representations via Test-Time Adaptation for Anomaly Segmentation

Govern the Repository, Not the Agent: Measuring Ecosystem-Level Risk in AI-Native Software

When are likely answers right? On Sequence Probability and Correctness in LLMs

Understanding Domain-Aware Distribution Alignment in Budgeted Entity Matching

Language-Based Digital Twins for Elderly Cognitive Assistance

Beyond the Hard Budget: Sparsity Regularizers for More Interpretable Top-k Sparse Autoencoders

LLM-Based Examination of Eligibility Criteria from Securities Prospectuses at the German Central Bank

Beyond Surface Forms: A Comprehensive, Mechanism-Oriented Taxonomy of Indirect Linguistic Encoding for LLM-Based Coded Language Detection

Multilingual Reasoning Cascades Need More Context

AI Healthcare Chatbots as Information Infrastructure: A Large-Scale Study of User-Reported Breakdowns

When Does Combining Language Models Help? A Co-Failure Ceiling on Routing, Voting, and Mixture-of-Agents Across 67 Frontier Models

Prompt Injection in Automated Résumé Screening with Large Language Models: Single and Multi-Injection Settings

Simulation-based inference for rapid Bayesian parameter estimation in epidemiological models: a comparison with MCMC

How Good Can Linear Models Be for Time-Series Forecasting?

EO-WM: A Physically Informed World Model for Probabilistic Earth Observation Forecasting

How Surprising Is Historical Italian to Language Models? Tokenization Tax, Comprehension Tax, and a Simple Mitigation

BetXplain: An Explanation-Annotated Dataset for Detecting Manipulative Betting Advertisements on Social Media

Ribbon: Scalable Approximation and Robust Uncertainty Quantification

RSPC: A Benchmark for Modeling Stress and Psychiatric Conditions in Digitally Mediated Relationships using Psychiatrist Annotations

LMs as Task-Specific Knowledge Bases: An Interpretability Analysis

Bridging Talk and Thought: Understanding Dialogue Dynamics Across Collaborative Problem-Solving Contexts

Ask, Don't Judge: Binary Questions for Interpretable LLM Evaluation and Self-Improvement

Vulnerability of Natural Language Classifiers to Evolutionary Generated Adversarial Text

Paved with True Intents: Intent-Aware Training Improves LLM Safety Classification Across Training Regimes

Explaining Temporal Graph Neural Networks via Feature-induced Information Flow

Forecasting With LLMs: Improved Generalization Through Feature Steering

HarmVideoBench: Benchmarking Harmful Video Understanding in Large Multimodal Models