Papers

Recent AI research papers with accessible summaries. Updated daily from arXiv, summarized for developers who don't read papers regularly.

861 papers, 100 this month, 12 topics

Topics: Efficiency (37), Reasoning (36), Training (35), Evaluation (29), Architecture (23), Agents (23), Multimodal (17), Applications (15), Alignment (9), Safety (8), Scaling (8), Data (3)

May 18 – May 24 (45)

Tokenisation via Convex Relaxations

May 21, 2026

Jan Tempus, Philip Whittington, Craig W. Schmidt et al.

ConvexTok uses convex optimization to build tokenizers that are provably near-optimal (within 1% at typical vocabulary sizes) and compress text better than greedy algorithms like BPE, with measurable improvements in language model efficiency.

This paper replaces greedy tokenization algorithms like BPE with a convex optimization approach called ConvexTok. Instead of making locally optimal choices, it formulates tokenizer construction as a linear program, achieving better compression (bits-per-byte) and allowing users to verify how close their tokenizer is to mathematically optimal.

training, efficiency
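ConvexTok's linear program is not reproduced here. As a much smaller illustration of why locally optimal choices fall short, the Python sketch below segments a string with a fixed, hypothetical vocabulary in two ways: a greedy longest-match rule (a stand-in for greedy tokenizers in general, not BPE's merge procedure) and an exact dynamic program that minimizes the token count. The relative difference between the two is the kind of optimality gap the summary describes, computed here for segmentation rather than for vocabulary construction.

```python
def greedy_segment(text, vocab):
    """Greedy longest-match: at each position, take the longest vocab entry."""
    max_len = max(len(t) for t in vocab)
    tokens, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            raise ValueError(f"no vocabulary entry matches at position {i}")
    return tokens


def optimal_segment(text, vocab):
    """Exact minimum-token segmentation via dynamic programming over prefixes."""
    max_len = max(len(t) for t in vocab)
    n = len(text)
    best = [float("inf")] * (n + 1)  # best[k] = fewest tokens covering text[:k]
    back = [-1] * (n + 1)            # back[k] = start index of the last token
    best[0] = 0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            if text[start:end] in vocab and best[start] + 1 < best[end]:
                best[end] = best[start] + 1
                back[end] = start
    if best[n] == float("inf"):
        raise ValueError("text cannot be segmented with this vocabulary")
    tokens, end = [], n
    while end > 0:  # walk the back-pointers to recover the segmentation
        tokens.append(text[back[end]:end])
        end = back[end]
    return tokens[::-1]


def optimality_gap(text, vocab):
    """Relative excess of the greedy token count over the optimum."""
    greedy = len(greedy_segment(text, vocab))
    optimum = len(optimal_segment(text, vocab))
    return (greedy - optimum) / optimum


vocab = {"a", "b", "c", "d", "ab", "bcd"}  # hypothetical toy vocabulary
print(greedy_segment("abcd", vocab))   # ['ab', 'c', 'd']
print(optimal_segment("abcd", vocab))  # ['a', 'bcd']
print(optimality_gap("abcd", vocab))   # 0.5
```

On "abcd", the greedy rule commits to "ab" and then needs two more tokens, while the dynamic program finds the two-token split "a" + "bcd", a 50% gap. The exact solution is easy here only because the vocabulary is fixed; choosing the vocabulary itself is the much harder problem ConvexTok tackles with a relaxation.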

Integrable Elasticity via Neural Demand Potentials

May 21, 2026

Carlos Heredia, Daniel Roncel

Neural demand models can be designed to respect economic constraints (integrability), producing more reliable price-elasticity estimates that are both mathematically consistent and practically useful for retail pricing.

This paper introduces ICDN, a neural network model that learns demand patterns for multiple products based on prices. Unlike traditional approaches, it directly models how demand changes with price (elasticity) in a mathematically consistent way, making the learned relationships more economically realistic and stable.
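The summary does not spell out ICDN's architecture. One standard way to make multi-product demand integrable, assumed here and suggested by the title's "demand potentials", is to define demand as the negative gradient of a scalar potential, q = -∇V(p). Cross-price effects are then second derivatives of V, so ∂q_i/∂p_j = ∂q_j/∂p_i holds automatically. The Python sketch below uses a hand-written toy potential in place of a neural network (all coefficients are hypothetical), estimates derivatives with central finite differences, and computes the elasticity matrix ε_ij = (∂q_i/∂p_j)(p_j/q_i). A free-form demand function is included for contrast: nothing forces its cross effects to match.

```python
import math

A = [1.0, 0.5]  # hypothetical demand intercepts for two products
B = [0.8, 1.2]  # hypothetical own-price sensitivities
G = 0.1         # hypothetical cross-price (substitution) strength


def potential(p):
    """Toy scalar potential V(p); demand is defined as q = -grad V(p)."""
    own = sum(math.exp(a - b * price) / b for a, b, price in zip(A, B, p))
    return own - G * p[0] * p[1]


def grad(f, p, h=1e-5):
    """Central finite-difference gradient of a scalar function f at p."""
    g = []
    for i in range(len(p)):
        up, dn = list(p), list(p)
        up[i] += h
        dn[i] -= h
        g.append((f(up) - f(dn)) / (2 * h))
    return g


def integrable_demand(p):
    return [-gi for gi in grad(potential, p)]


def free_form_demand(p):
    # Not derived from any potential: cross effects are deliberately unequal.
    return [math.exp(A[0] - B[0] * p[0]) + 0.3 * p[1],
            math.exp(A[1] - B[1] * p[1]) + 0.05 * p[0]]


def jacobian(demand, p, h=1e-4):
    """J[i][j] approximates dq_i/dp_j by central differences."""
    n = len(p)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up, dn = list(p), list(p)
        up[j] += h
        dn[j] -= h
        q_up, q_dn = demand(up), demand(dn)
        for i in range(n):
            J[i][j] = (q_up[i] - q_dn[i]) / (2 * h)
    return J


def elasticities(demand, p):
    """Price-elasticity matrix: (dq_i/dp_j) * p_j / q_i."""
    q = demand(p)
    J = jacobian(demand, p)
    return [[J[i][j] * p[j] / q[i] for j in range(len(p))] for i in range(len(p))]


p = [1.0, 2.0]
J = jacobian(integrable_demand, p)
Jf = jacobian(free_form_demand, p)
print(round(J[0][1], 4), round(J[1][0], 4))    # 0.1 0.1
print(round(Jf[0][1], 4), round(Jf[1][0], 4))  # 0.3 0.05
print(round(elasticities(integrable_demand, p)[0][0], 3))  # -0.687
```

In a neural version, V would be a network and the derivatives would come from automatic differentiation rather than finite differences; the symmetry of cross effects would still follow from the gradient construction.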

May 11 – May 17 (30)

ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both

May 14, 2026

Ziyu Guo, Rain Liu, Xinyan Chen et al.

A single discrete token can serve dual purposes—executing visual operations like code while also functioning as a learnable reasoning unit—making visual reasoning more efficient and trainable without architectural changes.

ATLAS introduces a single 'functional token' that acts as both an agentic operation and a latent visual reasoning unit, enabling models to reason about images without generating intermediate visual content. This approach combines the interpretability of code-based reasoning with the efficiency of latent reasoning, while remaining compatible with standard language model training.

reasoning, multimodal, agents

EntityBench: Towards Entity-Consistent Long-Range Multi-Shot Video Generation

May 14, 2026

Ruozhen He, Meng Wei, Ziyan Yang et al.

Maintaining consistent characters and objects across long video sequences is hard; explicit memory of each entity's appearance significantly improves consistency, especially when characters reappear after many shots.

EntityBench is a benchmark for evaluating multi-shot video generation—creating coherent video sequences with multiple scenes. It includes 140 episodes with detailed tracking of characters, objects, and locations across shots, plus an evaluation system that measures both video quality and consistency.
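EntityBench's actual metrics are not described in the summary. As a hypothetical example of what a cross-shot consistency score can look like, the Python sketch below assumes each entity has already been detected in each shot and encoded into an appearance embedding by some image encoder (not shown), then averages pairwise cosine similarity per entity. The `min_gap` parameter, also hypothetical, restricts scoring to appearances at least that many shots apart, which targets the reappearance case the summary highlights.

```python
import math
from itertools import combinations


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def entity_consistency(appearances, min_gap=1):
    """appearances maps entity name -> list of (shot_index, embedding) pairs.

    Returns the mean pairwise cosine similarity per entity, using only
    appearances at least `min_gap` shots apart, plus the mean over entities.
    """
    per_entity = {}
    for entity, shots in appearances.items():
        sims = [
            cosine(e1, e2)
            for (s1, e1), (s2, e2) in combinations(shots, 2)
            if abs(s2 - s1) >= min_gap
        ]
        if sims:  # entities with no qualifying pair are left unscored
            per_entity[entity] = sum(sims) / len(sims)
    overall = sum(per_entity.values()) / len(per_entity) if per_entity else float("nan")
    return per_entity, overall


# Hypothetical embeddings: "alice" looks identical in shots 0 and 5,
# while "bob" changes completely between shots 1 and 2.
appearances = {
    "alice": [(0, [1.0, 0.0, 0.0]), (5, [1.0, 0.0, 0.0])],
    "bob": [(1, [1.0, 0.0]), (2, [0.0, 1.0])],
}
print(entity_consistency(appearances))             # ({'alice': 1.0, 'bob': 0.0}, 0.5)
print(entity_consistency(appearances, min_gap=3))  # ({'alice': 1.0}, 1.0)
```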

May 4 – May 10 (25)

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

May 8, 2026

Tong Zheng, Haolin Liu, Chengsong Huang et al.

You can automatically discover better LLM inference strategies by framing the discovery as a search problem over execution traces instead of hand-designing heuristics, and the search is cheap to run at scale.

This paper presents AutoTTS, a framework that automatically discovers test-time scaling strategies for LLMs instead of relying on hand-crafted heuristics.

reasoning
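The summary does not detail AutoTTS's search space or its agentic search procedure. The Python sketch below only illustrates why searching over recorded traces can be cheap: sample answers from the model once, cache them, then score candidate strategies by replaying the cache with no further model calls. The single knob searched here (the number of samples k used for majority voting) and the cache format are assumptions for illustration.

```python
from collections import Counter


def majority_vote(answers):
    """Most frequent answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def accuracy_at_k(cache, k):
    """Replay cached samples: majority vote over the first k per question."""
    hits = sum(majority_vote(samples[:k]) == gold for samples, gold in cache)
    return hits / len(cache)


def search_k(cache, candidates, max_k):
    """Pick the most accurate k within budget; on ties keep the cheaper k."""
    best = None
    for k in sorted(candidates):
        if k > max_k:
            continue
        acc = accuracy_at_k(cache, k)
        if best is None or acc > best[1]:
            best = (k, acc)
    return best


# Hypothetical cache: five sampled answers per question plus the reference.
cache = [
    (["A", "B", "B", "B", "A"], "B"),
    (["C", "C", "D", "C", "D"], "C"),
    (["E", "F", "F", "E", "E"], "E"),
]
print(search_k(cache, [1, 3, 5], max_k=5))  # (5, 1.0)
print(search_k(cache, [1, 3, 5], max_k=3))  # (1, 0.6666666666666666)
```

Because every candidate is scored against the same cached samples, trying another k costs a few list operations rather than new LLM calls. A richer search would vary more than one parameter, but the replay idea stays the same.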

Normalizing Trajectory Models

May 8, 2026

Jiatao Gu, Tianrong Chen, Ying Shen et al.

NTM enables fast image generation (4 steps) while preserving exact likelihood calculation—something previous fast diffusion methods couldn't do—by using normalizing flows for each denoising step instead of simple Gaussian assumptions.

This paper introduces Normalizing Trajectory Models (NTM), a new approach for fast image generation that compresses diffusion sampling from many steps to just four. Unlike existing fast methods that lose the ability to calculate exact probabilities, NTM maintains a mathematically exact likelihood while generating high-quality images, making it useful for both generation and evaluation.

efficiency, architecture, training
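The property that lets a normalizing flow report an exact likelihood is the change-of-variables formula: if x = f(z) with f invertible, then log p(x) = log p_Z(f⁻¹(x)) - log|det ∂f/∂z|, and composing several steps simply adds their log-determinants. The Python sketch below checks this with four toy one-dimensional affine steps standing in for NTM's learned per-step flows (the scale and shift values are arbitrary assumptions) and compares the result with the closed-form Gaussian density that the composition produces.

```python
import math

# Four hypothetical invertible steps, each mapping h -> scale * h + shift.
STEPS = [(2.0, 0.5), (0.5, -1.0), (1.5, 0.25), (-0.8, 2.0)]


def std_normal_logpdf(z):
    return -0.5 * (z * z + math.log(2 * math.pi))


def sample(z):
    """Push base noise z through all steps to get a data point x."""
    x = z
    for scale, shift in STEPS:
        x = scale * x + shift
    return x


def exact_log_likelihood(x):
    """Invert the steps and apply change of variables; log-dets add up."""
    h, log_det = x, 0.0
    for scale, shift in reversed(STEPS):
        h = (h - shift) / scale
        log_det += math.log(abs(scale))
    return std_normal_logpdf(h) - log_det


x = sample(0.3)
closed_form = -0.5 * ((x - 2.7) / 1.2) ** 2 - math.log(1.2) - 0.5 * math.log(2 * math.pi)
print(exact_log_likelihood(x), closed_form)
```

The four steps compose to x = -1.2z + 2.7, so x is Gaussian with mean 2.7 and standard deviation 1.2, and the two log-densities agree up to floating-point error.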

Vector Policy Optimization: Training for Diversity Improves Test-Time Search

Remember to be Curious: Episodic Context and Persistent Worlds for 3D Exploration

The Matching Principle: A Geometric Theory of Loss Functions for Nuisance-Robust Representation Learning

Finite-Particle Convergence Rates for Conservative and Non-Conservative Drifting Models

MOSS: Self-Evolution through Source-Level Rewriting in Autonomous Agent Systems

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

LCGuard: Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems

Evaluating Commercial AI Chatbots as News Intermediaries

DeltaBox: Scaling Stateful AI Agents with Millisecond-Level Sandbox Checkpoint/Rollback

FAME: Failure-Aware Mixture-of-Experts for Message-Level Log Anomaly Detection

SDPM: Survival Diffusion Probabilistic Model for Continuous-Time Survival Analysis

MambaGaze: Bidirectional Mamba with Explicit Missing Data Modeling for Cognitive Load Assessment from Eye-Gaze Tracking Data

CogAdapt: Transferring Clinical ECG Foundation Models to Wearable Cognitive Load Assessment via Lead Adaptation

Variance Reduction for Expectations with Diffusion Teachers

Equilibrium Reasoners: Learning Attractors Enables Scalable Reasoning

Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate

EvoStruct: Bridging Evolutionary and Structural Priors for Antibody CDR Design via Protein Language Model Adaptation

Velocityformer: Broken-Symmetry-Matched Equivariant Graph Transformers for Cosmological Velocity Reconstruction

DeepWeb-Bench: A Deep Research Benchmark Demanding Massive Cross-Source Evidence and Long-Horizon Derivation

WikiVQABench: A Knowledge-Grounded Visual Question Answering Benchmark from Wikipedia and Wikidata

Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling

You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories

DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards

Leveraging LLMs for Grammar Adaptation: A Study on Metamodel-Grammar Co-Evolution

Mem-π: Adaptive Memory through Learning When and What to Generate

HITL-D: Human In The Loop Diffusion Assisted Shared Control

Mitigating Label Bias with Interpretable Rubric Embeddings

Quality and Security Signals in AI-Generated Python Refactoring Pull Requests

DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention

A Readiness-Driven Runtime for Pipeline-Parallel Training under Runtime Variability

Code as Agent Harness

ESI-Bench: Towards Embodied Spatial Intelligence that Closes the Perception-Action Loop

SURGE: Approximation-free Training Free Particle Filter for Diffusion Surrogate

Actionable World Representation

Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation

What Does the AI Doctor Value? Auditing Pluralism in the Clinical Ethics of Language Models

PIXLRelight: Controllable Relighting via Intrinsic Conditioning

Predictable Confabulations: Factual Recall by LLMs Scales with Model Size and Topic Frequency

DexHoldem: Playing Texas Hold'em with Dexterous Embodied System

General Preference Reinforcement Learning

Semantic Generative Tuning for Unified Multimodal Models

Learned Memory Attenuation in Sage-Husa Kalman Filters for Robust UAV State Estimation

EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL

RefDecoder: Enhancing Visual Generation with Conditional Video Decoding

FutureSim: Replaying World Events to Evaluate Adaptive Agents

Quantitative Video World Model Evaluation for Geometric-Consistency

Is Grep All You Need? How Agent Harnesses Reshape Agentic Search

Eradicating Negative Transfer in Multi-Physics Foundation Models via Sparse Mixture-of-Experts Routing

OpenDeepThink: Parallel Reasoning via Bradley–Terry Aggregation

MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs

Evidential Reasoning Advances Interpretable Real-World Disease Screening

Text Knows What, Tables Know When: Clinical Timeline Reconstruction via Retrieval-Augmented Multimodal Alignment

Position: Behavioural Assurance Cannot Verify the Safety Claims Governance Now Demands

Hand-in-the-Loop: Improving Dexterous VLA via Seamless Interventional Correction

MeMo: Memory as a Model

Self-Distilled Agentic Reinforcement Learning

AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward

LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues

Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation

Elastic Attention Cores for Scalable Vision Transformers

Task-Adaptive Embedding Refinement via Test-time LLM Guidance

Learning, Fast and Slow: Towards LLMs That Adapt Continually

ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents

OmniNFT: Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation

MEME: Multi-entity & Evolving Memory Evaluation

Routers Learn the Geometry of Their Experts: Geometric Coupling in Sparse Mixture-of-Experts

KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference

Solve the Loop: Attractor Models for Language and Reasoning

Search Your Block Floating Point Scales!

A proximal gradient algorithm for composite log-concave sampling