ThinkLLM

Papers

Recent AI research papers with accessible summaries. Updated daily from arXiv, summarized for developers who don't read papers regularly.

326 papers · 16 this month · 12 topics
All · Efficiency 35 · Reasoning 35 · Multimodal 28 · Applications 28 · Evaluation 27 · Training 26 · Architecture 24 · Agents 24 · Safety 13 · Scaling 5 · Data 5 · Alignment 1

Mar 30 – Apr 5 (22)

Grounded Token Initialization for New Vocabulary in LMs for Generative Recommendation

Apr 2, 2026

Daiwei Chen, Zhoutong Fu, Chengming Jiang et al.

Token initialization is a critical bottleneck when extending language models with new vocabulary—grounding new tokens in semantically meaningful positions before fine-tuning substantially improves downstream task performance.

When language models add new vocabulary tokens for tasks like recommendation, the new embeddings are typically initialized as averages of existing ones. This paper shows that this approach fails: the new tokens all collapse into the same subspace and lose their distinctiveness.
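
To make the failure mode concrete, here is a toy numpy sketch. The grounding recipe is a guess at the idea, not the paper's actual procedure: mean initialization gives every new token the identical vector, while a grounded initializer anchors each new token to a distinct set of related existing tokens.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_emb = rng.normal(size=(50_000, 768))   # stand-in for the existing embedding table

# Mean initialization: every new token starts at the same point, so all
# new tokens collapse into a single direction before fine-tuning.
mean_init = vocab_emb.mean(axis=0)

def grounded_init(related_ids, noise=0.02):
    # Hypothetical grounding: average only semantically related tokens
    # (e.g., tokens from an item's title), plus a small perturbation so
    # distinct items start in distinct positions.
    base = vocab_emb[related_ids].mean(axis=0)
    return base + noise * rng.normal(size=base.shape)

item_a = grounded_init([101, 2045, 731])     # ids are placeholders
item_b = grounded_init([87, 9210, 15002])
```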

training · efficiency · applications

Batched Contextual Reinforcement: A Task-Scaling Law for Efficient Reasoning

Apr 2, 2026

Bangji Yang, Hongbo Ma, Jiajun Fan et al.

You can make reasoning models 15-60% more token-efficient while keeping or improving accuracy by simply training them to solve multiple problems simultaneously, creating an implicit efficiency incentive rather than explicit penalties.

This paper introduces Batched Contextual Reinforcement (BCR), a training method that makes language models reason more efficiently by training them to solve multiple problems at once in a shared context.

Mar 23 – Mar 29 (19)

PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference

Mar 26, 2026

Xiaofeng Mao, Shaohao Rui, Kaining Ying et al.

You can train video models on short clips and generate much longer videos by using a three-tier memory strategy that compresses historical context without losing quality.

PackForcing solves the memory problem in video generation by compressing old frames intelligently—keeping early frames for context, heavily compressing middle frames, and preserving recent frames for smooth transitions. This lets models generate 2-minute videos on a single GPU after training only on 5-second clips, producing videos 24x longer than the clips seen in training.
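
A rough sketch of the three-tier idea, with hypothetical tier sizes; the actual method compresses latent features inside the model rather than dropping raw frames, but the tier structure is the point here.

```python
def pack_history(frames, keep_early=8, keep_recent=16, mid_stride=4):
    # Three-tier memory: keep early frames as global context, subsample
    # ("compress") the middle heavily, and keep recent frames intact for
    # smooth continuation.
    if len(frames) <= keep_early + keep_recent:
        return list(frames)
    early = frames[:keep_early]
    middle = frames[keep_early:-keep_recent:mid_stride]  # heavy compression
    recent = frames[-keep_recent:]
    return list(early) + list(middle) + list(recent)

history = pack_history(list(range(300)))
print(len(history))  # far fewer than 300 entries
```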

efficiency · architecture · training

No Hard Negatives Required: Concept Centric Learning Leads to Compositionality without Degrading Zero-shot Capabilities of Contrastive Models

Mar 26, 2026

Hai X. Pham, David T. Hoffmann, Ricardo Guerrero et al.

You can teach vision-language models to understand compositional meaning by focusing on concept-level alignment and preserving fine-grained visual information—without custom data or hurting general performance.

This paper improves how vision-language models learn to understand combinations of concepts (like "red car" vs "blue car") without sacrificing their ability to recognize new objects.

Mar 16 – Mar 22 (36)

VideoSeek: Long-Horizon Video Agent with Tool-Guided Seeking

Mar 20, 2026

Jingyang Lin, Jialian Wu, Jiang Liu et al.

Instead of processing every frame, an agent that reasons about which moments matter can use far fewer frames while achieving better results—a practical approach for building efficient video AI systems.

VideoSeek is a video understanding agent that intelligently seeks out key moments in videos rather than analyzing every frame, reducing computational cost by 93% while improving accuracy. It uses a toolkit to gather multi-scale observations and reasons about video content through a think-act-observe loop, enabling efficient long-horizon video understanding.

agents · efficiency · reasoning

Adaptive Greedy Frame Selection for Long Video Understanding

Mar 20, 2026

Yuning Huang, Fengqing Zhu

By selecting frames that are both relevant to the question and visually diverse, you can cut inference costs significantly while maintaining or improving accuracy on video QA tasks, especially when frame budgets are tight.

This paper tackles a key bottleneck in video understanding: processing long videos with vision-language models requires too many frames and tokens. The authors propose a smart frame selection method that picks the most important frames by balancing two goals—relevance to the question asked and diversity of visual content—using a greedy algorithm with theoretical guarantees.
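
The selection loop can be sketched as a greedy, MMR-style trade-off between question relevance and redundancy with already-picked frames; the paper's exact objective and theoretical guarantees may differ.

```python
import numpy as np

def select_frames(frame_feats, query_feat, budget, lam=0.5):
    # Greedy frame selection: balance relevance to the question against
    # similarity to frames already chosen. Features assumed L2-normalized.
    selected = []
    remaining = list(range(len(frame_feats)))
    relevance = frame_feats @ query_feat
    while remaining and len(selected) < budget:
        best, best_score = None, -np.inf
        for i in remaining:
            redundancy = max((frame_feats[i] @ frame_feats[j]
                              for j in selected), default=0.0)
            score = lam * relevance[i] - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return sorted(selected)
```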

Mar 9 – Mar 15 (14)

Neuron-Aware Data Selection In Instruction Tuning For Large Language Models

Mar 13, 2026

Xin Chen, Junchao Wu, Shu Yang et al.

You can train better LLMs on less data by selecting instruction examples that activate the same neurons as your target task—this beats using all data or relying on external models to score examples.

This paper introduces NAIT, a method for selecting the most useful instruction-tuning data for large language models by analyzing which neurons activate when processing different types of tasks. Instead of using all available training data, NAIT identifies a small subset (10% of data) that produces better results by matching neuron activation patterns to target capabilities.

training · data · efficiency

Learnability and Privacy Vulnerability are Entangled in a Few Critical Weights

Mar 13, 2026

Xingli Fang, Jung-Eun Kim

Privacy vulnerabilities and model performance are concentrated in a small set of weights—you can defend against privacy attacks by carefully fine-tuning just these critical weights instead of retraining the whole model.

This paper identifies that privacy leaks in neural networks come from a tiny fraction of weights, and these same weights are crucial for model performance. Rather than retraining the entire model, the authors propose selectively rewinding only these critical weights during fine-tuning to defend against membership inference attacks while keeping the model accurate.

Feb 23 – Mar 1 (9)

Mode Seeking meets Mean Seeking for Fast Long Video Generation

Feb 27, 2026

Shengqu Cai, Weili Nie, Chao Liu et al.

Decouple learning long-term coherence from local quality to generate minute-scale videos without needing massive amounts of long-form training data.

This paper solves a key problem in video generation: making long videos (minutes) that are both sharp and coherent. The trick is training two separate components—one learns long-term story structure from rare long videos, while another copies local quality from abundant short videos. This lets the model generate minute-long videos that look crisp and stay consistent throughout.

training · efficiency · architecture

Do LLMs Benefit From Their Own Words?

Feb 27, 2026

Jenny Y. Huang, Leshem Choshen, Ramon Astudillo et al.

You can often remove an LLM's previous responses from conversation history without losing quality, saving memory while sometimes improving accuracy.

This paper tests whether LLMs actually need to see their own previous responses in multi-turn conversations. Surprisingly, removing past assistant responses often doesn't hurt quality and can shrink context by 10x. The researchers found that models sometimes get worse when they over-rely on their own prior outputs, introducing errors that compound across turns.
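
The intervention itself is easy to picture: drop earlier assistant turns before re-prompting. A minimal sketch, assuming an OpenAI-style message list; real systems may also need to keep tool results or other roles.

```python
def prune_history(messages, keep_last_assistant=True):
    # Drop earlier assistant turns, keeping user turns and (optionally)
    # the most recent assistant reply.
    last_assistant = max(
        (i for i, m in enumerate(messages) if m["role"] == "assistant"),
        default=None,
    )
    pruned = []
    for i, m in enumerate(messages):
        if m["role"] == "assistant" and not (keep_last_assistant and i == last_assistant):
            continue
        pruned.append(m)
    return pruned
```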

training · efficiency · reasoning

go-$m$HC: Direct Parameterization of Manifold-Constrained Hyper-Connections via Generalized Orthostochastic Matrices

Apr 2, 2026

Torque Dandachi, Sophia Diggs-Galligan

go-mHC enables efficient learned mixing of residual streams in transformers with a single tunable hyperparameter that trades off between speed and expressivity, potentially unlocking a new dimension for scaling model capacity.

This paper solves a mathematical problem in neural network design: how to efficiently mix information across different processing paths (residual streams) in transformers.

architecture · efficiency · scaling

Taming the Exponential: A Fast Softmax Surrogate for Integer-Native Edge Inference

Apr 2, 2026

Dimitrios Danopoulos, Enrico Lupi, Michael Kagan et al.

HCCS replaces softmax's expensive exponential computation with a lightweight linear approximation calibrated per attention head, enabling 8-bit integer inference on edge hardware without sacrificing model accuracy.

This paper proposes Head-Calibrated Clipped-Linear Softmax (HCCS), a fast approximation of softmax designed for edge devices running small quantized AI models.
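
A hedged sketch of what a clipped-linear surrogate could look like; `slope` and `offset` stand in for whatever per-head calibration the paper performs, and the exact parameterization is an assumption.

```python
import numpy as np

def clipped_linear_softmax(scores, slope, offset):
    # Replace exp() with an affine ramp clipped at zero, then normalize.
    # slope/offset would be calibrated per attention head against the
    # true softmax outputs.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    ramp = np.clip(slope * shifted + offset, 0.0, None)
    return ramp / np.maximum(ramp.sum(axis=-1, keepdims=True), 1e-9)

print(clipped_linear_softmax(np.array([[2.0, 1.0, 0.1]]), slope=0.5, offset=1.0))
```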

efficiency · architecture

Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing

Apr 2, 2026

Gengsheng Li, Tianyu Yang, Junfeng Fang et al.

By intelligently routing training samples to different optimization strategies based on correctness, you can get the best of both fast learning and stable training—a practical improvement for post-training large language models.

This paper proposes Sample-Routed Policy Optimization (SRPO), a training method that combines two different approaches for fine-tuning language models: it routes correct outputs through a reward-based method and incorrect outputs through a distillation method.

training · reasoning · efficiency

Novel Memory Forgetting Techniques for Autonomous AI Agents: Balancing Relevance and Efficiency

Apr 2, 2026

Payal Fofadiya, Sunil Tiwari

Conversational agents perform better with selective memory management than unlimited retention; a relevance-guided forgetting framework improves long-horizon reasoning while reducing false memories and context bloat.

This paper tackles a key problem in conversational AI: agents need to remember past interactions to reason coherently, but storing everything causes performance to degrade and creates false memories. The authors propose a smart forgetting system that decides which memories to keep based on relevance, recency, and frequency—like a selective filing system for an agent's brain.

agents · reasoning · efficiency

Crystalite: A Lightweight Transformer for Efficient Crystal Modeling

Apr 2, 2026

Tin Hadži Veljković, Joshua Rosenthal, Ivor Lončarić et al.

By combining efficient tokenization with geometry-aware attention, you can build crystal generation models that are both faster and more accurate than complex graph neural networks, making generative modeling of materials more practical.

Crystalite is a lightweight diffusion Transformer for generating crystal structures that uses two key innovations: a compact atom representation called Subatomic Tokenization and a Geometry Enhancement Module that encodes crystal geometry directly into the model's attention mechanism.

architecture · efficiency · applications

Smoothing the Landscape: Causal Structure Learning via Diffusion Denoising Objectives

Apr 2, 2026

Hao Zhu, Di Zhou, Donna Slonim

Diffusion model denoising objectives can smooth optimization landscapes for causal discovery, enabling faster and more stable learning of causal structures in challenging high-dimensional datasets.

This paper proposes DDCD, a new method for discovering causal relationships in data by adapting diffusion model techniques. Instead of using diffusion to generate data, it uses the denoising process to learn causal structures (DAGs) more stably and efficiently than existing methods like NOTEARS, especially when data is high-dimensional or imbalanced.

reasoning · training · efficiency

VISTA: Visualization of Token Attribution via Efficient Analysis

Apr 2, 2026

Syed Ahmed, Bharathi Vokkaliga Ganesh, Jagadish Babu P et al.

You can now understand what tokens your LLM actually uses without doubling GPU memory or being locked into specific architectures—just remove tokens and measure the impact.

VISTA is a lightweight, model-agnostic technique for visualizing which tokens matter most in LLM predictions.
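
The core idea is occlusion-style attribution: remove one token and measure the score drop. A naive sketch follows; VISTA's efficiency tricks for batching these ablations are not shown, and `score_fn` is a stand-in for any callable that scores a token sequence.

```python
def token_attribution(score_fn, tokens):
    # A token's importance is the drop in the model's score when that
    # token is removed from the input.
    base = score_fn(tokens)
    return [base - score_fn(tokens[:i] + tokens[i + 1:])
            for i in range(len(tokens))]

# Toy scorer: sequence length, so every token gets credit of exactly 1.0.
print(token_attribution(lambda t: float(len(t)), ["The", "cat", "sat"]))
```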

efficiency · evaluation

Universal Hypernetworks for Arbitrary Models

Apr 2, 2026

Xuanfeng Zhou

A single fixed hypernetwork can generate weights for diverse architectures and tasks by using architecture/task descriptors as input, eliminating the need to retrain generators when switching between different model types.

This paper introduces Universal Hypernetworks (UHN), a single neural network that can generate weights for many different model architectures and tasks. Instead of building separate weight generators for each model type, UHN uses descriptors (text descriptions of architecture and task) to produce weights for any compatible model, working across vision, graphs, text, and math tasks.

architecture · training · efficiency

Universal YOCO for Efficient Depth Scaling

Apr 1, 2026

Yutao Sun, Li Dong, Tianzhu Ye et al.

You can scale LLM reasoning at inference time without exploding memory costs by combining efficient attention architectures with parameter sharing—YOCO-U shows this works better than either approach alone.

Universal YOCO combines a specialized decoder architecture with recursive computation to enable efficient test-time scaling in language models. By reusing parameters across multiple iterations in shallow layers while maintaining constant KV cache size, it achieves better reasoning capabilities without the computational overhead that typically comes with scaling inference-time compute.

efficiency · architecture · reasoning

LLM REgression with a Latent Iterative State Head

Apr 1, 2026

Yiheng Su, Matthew Lease

You can make LLMs predict continuous numeric values more efficiently by adding a tiny learned head that works with frozen representations, rather than decoding text or fine-tuning the entire model.

RELISH is a lightweight method for making LLMs predict numeric values directly from their internal representations. Instead of generating numbers as text, it uses a small learned component that iteratively refines a latent state through attention over token representations, then outputs a single number. It outperforms existing approaches while adding minimal parameters (0.01-0.04% overhead).

architecture · efficiency · applications

Embarrassingly Simple Self-Distillation Improves Code Generation

Apr 1, 2026

Ruixiang Zhang, Richard He Bai, Huangjie Zheng et al.

You can improve code generation by sampling from your model's own outputs and fine-tuning on them—no external tools needed. The gains come from balancing precision (removing bad options) with exploration (keeping useful diversity).

A simple technique called self-distillation improves code generation in large language models by having them sample their own outputs and fine-tune on those samples. The method boosts performance significantly (42.4% to 55.3% on benchmarks) without needing external verifiers or teacher models, and works across different model sizes and architectures.

training · efficiency · applications

A ROS 2 Wrapper for Florence-2: Multi-Mode Local Vision-Language Inference for Robotic Systems

Apr 1, 2026

J. E. Domínguez-Vidal

Florence-2 can now be easily integrated into robot software stacks through a standardized ROS 2 wrapper, enabling local vision-language inference on consumer GPUs without cloud dependencies.

This paper presents a ROS 2 software wrapper that integrates Florence-2, a vision-language model, into robotic systems for local inference.

applications · multimodal · efficiency

Screening Is Enough

Apr 1, 2026

Ken M. Nakanishi

Screening attention removes the need for global competition among keys by using absolute relevance thresholds, achieving 40% parameter reduction and 3.2× faster inference compared to Transformers.

This paper introduces Multiscreen, a language model architecture that replaces standard softmax attention with a 'screening' mechanism. Instead of distributing attention weights across all keys, screening evaluates each key against a threshold to decide which ones are relevant, eliminating the need for keys to compete with each other.
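
A minimal sketch of the screening idea, assuming uniform weighting of the keys that pass the threshold; the paper's actual weighting may differ. The point is that each key's decision is independent, with no softmax coupling across keys.

```python
import numpy as np

def screening_attention(q, K, V, threshold=0.0):
    # Each key passes or fails an absolute relevance test instead of
    # competing through softmax. Passing values are averaged uniformly.
    scores = K @ q / np.sqrt(q.shape[-1])
    mask = scores > threshold          # per-key decision, no global coupling
    if not mask.any():
        return np.zeros_like(V[0])
    return V[mask].mean(axis=0)
```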

architecture · efficiency · scaling

Online Reasoning Calibration: Test-Time Training Enables Generalizable Conformal LLM Reasoning

Apr 1, 2026

Cai Zhou, Zekai Wang, Menghua Wu et al.

ORCA calibrates LLM reasoning in real-time by adapting confidence estimates per input, enabling 40-67% compute savings during inference while providing mathematical guarantees on error rates across different reasoning tasks and domains.

This paper introduces ORCA, a framework that makes language models more efficient during reasoning by calibrating their sampling process. Using test-time training and conformal prediction, ORCA learns to estimate confidence in its own reasoning steps, reducing wasted computation while maintaining accuracy—saving up to 47% compute on in-distribution tasks and 67% on out-of-distribution problems.

reasoning · efficiency · evaluation

Adaptive Block-Scaled Data Types

Mar 30, 2026

Jack Cook, Hyemin S. Lee, Kathryn Le et al.

Adaptive block-scaled quantization can significantly reduce errors in 4-bit model compression by intelligently switching between data types per block, achieving better accuracy than fixed formats without extra storage cost.

This paper introduces adaptive quantization formats (IF4, IF3, IF6) that improve upon NVFP4 by dynamically choosing between floating-point and integer representations for each block of values. The approach uses an unused bit in NVFP4 to signal which format to use, reducing quantization errors and improving language model performance with minimal hardware overhead.
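
The mechanism can be sketched as a per-block error comparison between a float-style grid and an integer grid, with one flag per block. The grids below approximate FP4/INT4 value sets, and the selection rule is my assumption about how the spare bit is used.

```python
import numpy as np

FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = np.concatenate([-FP4_GRID[::-1], FP4_GRID])   # signed E2M1-style values
INT4_GRID = np.arange(-7, 8, dtype=float)

def quantize_block(block, grid):
    # Scale the block so its largest value maps onto the grid's range,
    # then round each entry to the nearest grid point.
    scale = float(np.abs(block).max() / np.abs(grid).max()) or 1.0
    idx = np.abs(block[:, None] / scale - grid[None, :]).argmin(axis=1)
    return grid[idx] * scale

def adaptive_quantize(x, block=16):
    # x: 1-D float array. Try both grids per block, keep the one with
    # lower reconstruction error, and record one flag bit per block.
    out, flags = np.empty_like(x), []
    for i in range(0, len(x), block):
        b = x[i:i + block]
        fp, it = quantize_block(b, FP4_GRID), quantize_block(b, INT4_GRID)
        use_int = np.abs(b - it).sum() < np.abs(b - fp).sum()
        out[i:i + block] = it if use_int else fp
        flags.append(bool(use_int))
    return out, flags

xq, flags = adaptive_quantize(np.random.randn(64))
```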

efficiency · training · architecture

On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers

Mar 30, 2026

Omer Dahary, Benaya Koren, Daniel Garibi et al.

You can increase diversity in generated images by applying repulsion forces in the transformer's attention channels during generation, without expensive optimization or visual artifacts.

This paper tackles the problem of text-to-image diffusion models producing visually similar outputs for the same prompt. The authors propose a method that applies 'repulsion' in the attention mechanism during image generation to encourage diverse outputs while maintaining quality and semantic accuracy.

architecture · efficiency · multimodal

Temporal Credit Is Free

Mar 30, 2026

Aur Shalev Merin

Online learning in RNNs doesn't require sophisticated credit assignment algorithms—proper gradient normalization with immediate derivatives is sufficient and dramatically more memory-efficient.

Recurrent networks can learn online using simple immediate derivatives instead of expensive backpropagation-through-time. The key insight: the hidden state naturally carries temporal information forward, so you just need proper gradient normalization and avoid stale memory traces. This approach matches or beats complex algorithms while using 1000x less memory.

training · efficiency

Rethinking Language Model Scaling under Transferable Hypersphere Optimization

Mar 30, 2026

Liliang Ren, Yang Liu, Yelong Shen et al.

Hypersphere-constrained optimization enables predictable scaling of language models with a single transferable learning rate, eliminating expensive hyperparameter retuning when scaling up and improving training stability.

This paper introduces HyperP, a framework for scaling language models more efficiently by constraining weights to a hypersphere during training. The key innovation is showing that a single learning rate tuned at small scale transfers reliably across different model sizes, depths, and training amounts—achieving 1.58× better compute efficiency while maintaining training stability.

training · scaling · efficiency

Stepwise Credit Assignment for GRPO on Flow-Matching Models

Mar 30, 2026

Yash Savani, Branislav Kveton, Yuchen Liu et al.

Stepwise credit assignment—rewarding each diffusion step for its own improvement rather than the final result—makes RL training of image generators more efficient and faster to converge.

This paper improves reinforcement learning for image generation models by assigning credit more intelligently across diffusion steps. Instead of treating all steps equally, it recognizes that early steps handle composition while late steps refine details, then rewards each step based on its specific contribution. This leads to faster learning and better sample efficiency.

training · reasoning · efficiency

GPU-Accelerated Optimization of Transformer-Based Neural Networks for Real-Time Inference

Mar 30, 2026

Soutrik Mukherjee, Sangwhan Cha

Hybrid precision (FP32 for softmax/normalization, FP16 for linear layers) delivers 2x speedup with zero accuracy loss—a practical strategy for deploying transformers in latency-critical applications.

This paper optimizes transformer models (BERT and GPT-2) for fast GPU inference using mixed-precision techniques—keeping sensitive operations in full precision while using lower precision for others. The system achieves 64x speedup over CPU and sub-10ms latency while maintaining numerical accuracy and eliminating instability issues.
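
The precision split is straightforward to sketch, with numpy standing in for the GPU kernels the paper actually benchmarks: run the big matmul in FP16, then upcast before the numerically sensitive softmax and normalization steps.

```python
import numpy as np

def hybrid_block(x, w):
    h = x.astype(np.float16) @ w.astype(np.float16)     # FP16 linear layer
    h32 = h.astype(np.float32)                          # upcast before sensitive ops
    e = np.exp(h32 - h32.max(axis=-1, keepdims=True))   # FP32 softmax, stable
    attn = e / e.sum(axis=-1, keepdims=True)
    mu = attn.mean(axis=-1, keepdims=True)              # FP32 normalization
    sd = attn.std(axis=-1, keepdims=True)
    return (attn - mu) / (sd + 1e-6)

out = hybrid_block(np.random.randn(4, 64), np.random.randn(64, 64))
```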

efficiency · architecture

S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation

Mar 26, 2026

Ligong Han, Hao Wang, Han Gao et al.

You can make diffusion-based language models much faster by intelligently deciding when to verify generated tokens, using the same model in two different modes without retraining.

S2D2 speeds up block-diffusion language models by combining parallel token generation with selective verification steps. The method reuses the same pretrained model in two modes—as a fast parallel generator and as a careful single-token verifier—without requiring additional training, achieving up to 4.7× speedup over standard autoregressive decoding.

efficiency · reasoning

A Unified Memory Perspective for Probabilistic Trustworthy AI

Mar 26, 2026

Xueji Zhao, Likai Pei, Jianbo Liu et al.

Memory access, not computation speed, limits performance in probabilistic AI systems—hardware designers need to optimize for both data delivery and randomness generation together, not separately.

This paper examines how memory systems become the performance bottleneck in AI systems that need probabilistic computation for safety and robustness. It proposes treating deterministic data access as a special case of stochastic sampling, creating a unified framework to analyze memory efficiency.

efficiency · safety · architecture

On Neural Scaling Laws for Weather Emulation through Continual Training

Mar 26, 2026

Shashank Subramanian, Alexander Kiefer, Arnur Nigmetov et al.

Neural scaling laws can predict weather model performance and guide efficient resource allocation—models trained with periodic cooldowns outperform standard approaches and enable longer, more accurate forecasts.

This paper studies how neural networks for weather forecasting improve as you scale up the model size, training data, and compute.

scaling · efficiency · training

Polynomial Speedup in Diffusion Models with the Multilevel Euler-Maruyama Method

Mar 25, 2026

Arthur Jacot

You can sample from diffusion models much faster by combining predictions from small and large networks—the method achieves the same accuracy as running the largest network once, instead of many times.

This paper speeds up diffusion model sampling by using multiple neural networks of different sizes together. Instead of running one large network many times, the method runs a small fast network many times and a large accurate network just a few times, reducing total computation while maintaining quality. Tests show up to 4x speedup on image generation.

efficiency · architecture

DreamerAD: Efficient Reinforcement Learning via Latent World Model for Autonomous Driving

Mar 25, 2026

Pengxuan Yang, Yupeng Zheng, Deheng Qian et al.

Latent world models can dramatically speed up RL training for autonomous driving by replacing expensive multi-step diffusion with single-step latent sampling, making imagination-based policy training practical.

DreamerAD uses a latent world model to train autonomous driving policies 80x faster than previous diffusion-based approaches. Instead of generating full images during training, it compresses the diffusion process to a single step by working with compressed latent features, enabling safe, efficient reinforcement learning on driving tasks without real-world testing.

efficiency · reasoning · agents

Trust Region Constrained Bayesian Optimization with Penalized Constraint Handling

Mar 25, 2026

Raju Chowdhury, Tanmay Sen, Prajamitra Bhuyan et al.

Trust regions combined with penalty-based constraints enable Bayesian optimization to find feasible solutions faster in high-dimensional constrained problems where evaluations are expensive.

This paper presents a Bayesian optimization method for expensive black-box optimization problems with constraints. It combines penalty-based constraint handling, surrogate modeling, and trust regions to efficiently find good solutions in high dimensions with fewer evaluations.

efficiency · training

VISion On Request: Enhanced VLLM efficiency with sparse, dynamically selected, vision-language interactions

Mar 24, 2026

Adrian Bulat, Alberto Baldrati, Ioannis Maniadis Metaxas et al.

You can make vision-language models faster without losing visual detail by being selective about which attention layers process images—use efficient cross-attention for context and add self-attention layers only when the task complexity demands it.

VISOR improves vision-language model efficiency by selectively attending to visual information rather than compressing images. Instead of reducing visual tokens, it uses sparse cross-attention and dynamically chosen self-attention layers to process high-resolution details only when needed, reducing computation while maintaining performance on complex visual reasoning tasks.

efficiency · multimodal · architecture

SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning

Mar 24, 2026

Haoyu Huang, Jinfa Huang, Zhongwei Wan et al.

A smaller speculative model can predict an agentic system's tool-calling trajectory, enabling parallel execution and early termination of expensive operations—delivering significant speedups without accuracy loss.

SpecEyes speeds up agentic multimodal AI systems by using a lightweight model to predict what tools the main model will need, allowing expensive operations to be skipped or run in parallel. This cuts latency by 1.1-3.35x while maintaining accuracy, solving a key bottleneck in systems like OpenAI o3 that repeatedly invoke vision tools.

efficiency · multimodal · agents

Byzantine-Robust and Differentially Private Federated Optimization under Weaker Assumptions

Mar 24, 2026

Rustem Islamov, Grigory Malinovsky, Alexander Gaponov et al.

You can now build federated learning systems that defend against both Byzantine attacks and privacy breaches simultaneously, without needing unrealistic assumptions like bounded gradients or extra server datasets.

This paper tackles two critical security issues in federated learning: protecting against malicious servers (Byzantine attacks) and preventing data leakage (differential privacy).

safety · training · efficiency

InverFill: One-Step Inversion for Enhanced Few-Step Diffusion Inpainting

Mar 24, 2026

Duc Vu, Kien Nguyen, Trong-Tung Nguyen et al.

You can dramatically improve few-step diffusion inpainting by initializing the noise with semantic information from the input image, rather than random noise—no retraining required.

InverFill speeds up image inpainting by using a smart noise initialization technique that preserves semantic information from the original image. Instead of training new models, it works with existing fast text-to-image models to fill in masked regions with better quality and fewer processing steps.

efficiency · architecture

End-to-End Efficient RL for Linear Bellman Complete MDPs with Deterministic Transitions

Mar 24, 2026

Zakaria Mhammedi, Alexander Rakhlin, Nneka Okolo

For a well-structured class of RL problems, you can now learn optimal policies efficiently using linear models without needing special oracles or being limited to tiny action spaces.

This paper solves a key challenge in reinforcement learning: how to efficiently learn good policies when using linear function approximation in a specific class of environments (linear Bellman complete MDPs). The researchers provide an algorithm that works with both small and large action spaces, achieving polynomial time and sample complexity—meaning it scales reasonably with problem size.

efficiency · reasoning

Similarity-Aware Mixture-of-Experts for Data-Efficient Continual Learning

Mar 24, 2026

Connor Mclaughlin, Nigel Lee, Lili Su

When deploying models that learn from new tasks with scarce data, routing samples intelligently based on task similarity prevents negative interference while maximizing knowledge reuse across overlapping tasks.

This paper tackles continual learning when tasks have limited data and may overlap unpredictably. The authors propose an adaptive mixture-of-experts system that learns which tasks are similar and routes data accordingly, using two key techniques: gradually introducing task-specific prompts over time and identifying which samples fit existing patterns versus need new ones.

efficiency · architecture

WorldCache: Content-Aware Caching for Accelerated Video World Models

Mar 23, 2026

Umair Nawaz, Ahmed Heakl, Ufaq Khan et al.

Smart feature caching with motion awareness can dramatically accelerate video world models without retraining, but requires adaptive thresholds and blending rather than static feature reuse.

WorldCache speeds up video generation from diffusion transformers by intelligently reusing computed features across denoising steps. Instead of naively reusing old features, it adapts based on motion and visual importance, using blending and warping to keep videos smooth and artifact-free—achieving 2.3× speedup with minimal quality loss.

efficiency · architecture · evaluation

End-to-End Training for Unified Tokenization and Latent Denoising

Mar 23, 2026

Shivam Duggal, Xingjian Bai, Zongze Wu et al.

You can train tokenization and image generation together from scratch using a single model with shared weights, simplifying the pipeline and reducing training complexity while maintaining quality.

This paper proposes UNITE, a new way to train image generation models more efficiently by combining tokenization and diffusion in a single training stage.

architecture · training · efficiency

Scaling DoRA: High-Rank Adaptation via Factored Norms and Fused Kernels

Mar 23, 2026

Alexandra Zelenin, Alexandra Zhuravlyova

If you're using DoRA for high-rank fine-tuning on limited GPU memory, these optimizations make it practical by cutting peak memory usage by up to 7 GB and doubling speed without changing the model's behavior.

DoRA is a fine-tuning method that adapts model weights by separating magnitude from direction, but computing its forward pass requires materializing large dense matrices that consume massive GPU memory.

efficiency · training

Confidence-Based Decoding is Provably Efficient for Diffusion Language Models

Mar 23, 2026

Changxiao Cai, Gen Li

Confidence-based decoding in diffusion models is provably efficient and adapts automatically to data complexity, offering a theoretical foundation for why this practical strategy works well.

This paper proves that confidence-based decoding—a strategy that decides which tokens to generate next in diffusion language models based on prediction confidence—is theoretically efficient.

efficiency · reasoning · training

MemDLM: Memory-Enhanced DLM Training

Mar 23, 2026

Zehua Pei, Hui-Ling Zhen, Weizhe Lin et al.

Diffusion language models can be trained more effectively by embedding a simulated denoising trajectory into training, and this memory mechanism can be reused at inference time to improve long-context retrieval tasks.

This paper addresses a key problem in diffusion language models: they're trained one way (predicting masked tokens) but used differently (multi-step denoising). MemDLM fixes this mismatch by simulating the denoising process during training using a memory mechanism that learns from each sample's trajectory, leading to faster training and better long-context performance.

training · architecture · efficiency

Semantic Token Clustering for Efficient Uncertainty Quantification in Large Language Models

Mar 20, 2026

Qi Cao, Andrew Gambardella, Takeshi Kojima et al.

You can measure LLM uncertainty efficiently with just one forward pass by clustering semantically similar tokens, avoiding the computational cost of sampling-based or auxiliary model approaches.

This paper proposes Semantic Token Clustering (STC), a fast method to measure how confident an LLM should be in its answers. Instead of running the model multiple times or using extra models, STC groups similar tokens together and checks if the model's top prediction comes from a coherent semantic cluster. It works in a single pass and catches cases where models are overconfident.
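
A one-pass sketch of the idea, using a simple similarity threshold where the paper would use its actual clustering: sum the probability mass of top candidates that are semantically close to the argmax token.

```python
import numpy as np

def cluster_confidence(probs, token_embs, top_k=10, sim_thresh=0.8):
    # Gather the top-k candidate tokens for the next position.
    top = np.argsort(probs)[::-1][:top_k]
    embs = token_embs[top]
    embs = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    # Accumulate mass of candidates close to the argmax token; thresholded
    # cosine similarity stands in for STC's actual clustering.
    mass = probs[top[0]]
    for i in range(1, len(top)):
        if embs[0] @ embs[i] > sim_thresh:
            mass += probs[top[i]]
    return mass   # high mass: one coherent answer; low mass: diffuse, uncertain
```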

efficiency · evaluation

Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD

Mar 20, 2026

Emiel Hoogeboom, David Ruhe, Jonathan Heek et al.

Discrete diffusion models can now be distilled into faster generators using moment matching, enabling practical deployment with fewer sampling steps while maintaining quality.

This paper solves the problem of making discrete diffusion models faster by distilling them into simpler models. Unlike continuous diffusion models which have many distillation techniques, discrete diffusion (used for text and images) has been hard to compress.

efficiency · training · architecture

F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World

Mar 19, 2026

Ziyin Zhang, Zihan Liao, Hang Yu et al.

You can now use smaller, faster embedding models for multilingual search and retrieval without sacrificing quality—F2LLM-v2 offers efficient options for resource-constrained deployments while the largest variant ranks first on major benchmarks.

F2LLM-v2 is a family of multilingual embedding models (80M to 14B parameters) trained on 60 million high-quality samples that support 200+ languages, including underserved low-resource ones. Using matryoshka learning and knowledge distillation, these models achieve top performance on benchmarks while being more efficient than previous LLM-based embeddings.

multimodal · efficiency · training

Spectrally-Guided Diffusion Noise Schedules

Mar 19, 2026

Carlos Esteves, Ameesh Makadia

By tailoring noise schedules to each image's spectral content, you can generate higher-quality images with fewer denoising steps, making diffusion models faster and more efficient.

This paper proposes a smarter way to design noise schedules for diffusion models by analyzing the spectral properties of images. Instead of using the same handcrafted noise schedule for all images, the method creates custom schedules for each image that eliminate unnecessary denoising steps, improving generation quality especially when using fewer sampling steps.

efficiency · architecture · training

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

Mar 19, 2026

Zhuolin Yang, Zihan Liu, Yang Chen et al.

You can build highly capable reasoning models with far fewer active parameters by combining domain-specific reinforcement learning with multi-domain distillation—this model matches frontier performance with 20x fewer parameters.

Nemotron-Cascade 2 is a 30B parameter model with only 3B active parameters that achieves top-tier reasoning and coding performance comparable to much larger models.

training · reasoning · efficiency

Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders

Mar 19, 2026

Shang-Jui Ray Kuo, Paola Cascante-Bonilla

State space models are a viable and more efficient alternative to vision transformers for vision-language models, challenging the assumption that transformers are necessary for this task.

This paper tests whether state space models (SSMs) can replace vision transformers as the visual backbone in vision-language models. The researchers find that SSM-based vision encoders match or outperform transformer-based encoders on VQA and visual grounding tasks, while using fewer parameters. They also identify instability issues in some backbones and propose fixes to improve robustness.

architecture · multimodal · efficiency

SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits

Mar 19, 2026

Edward Lin, Sahil Modi, Siva Kumar Sastry Hari et al.

Instead of comparing kernels to other software implementations, this benchmark measures how close optimized kernels get to theoretical hardware limits—giving AI systems a clear, unchanging target for optimization rather than a moving baseline.

SOL-ExecBench is a benchmark for evaluating GPU kernel optimization that measures performance against hardware limits rather than software baselines. It includes 235 CUDA kernels from real AI models and uses analytically derived 'Speed-of-Light' bounds to create fixed optimization targets, enabling fair evaluation of AI systems that generate and optimize code.

evaluation · efficiency · agents

DyMoE: Dynamic Expert Orchestration with Mixed-Precision Quantization for Efficient MoE Inference on Edge

Mar 19, 2026

Yuegui Huang, Zhiyuan Fang, Weiqi Luo et al.

By dynamically quantizing less important experts and prefetching memory strategically, DyMoE achieves 3-22x faster inference on edge devices without sacrificing accuracy—making large MoE models practical for real-time edge deployment.

DyMoE optimizes Mixture-of-Experts (MoE) models for edge devices by dynamically adjusting precision during inference. It identifies that some experts matter more than others and uses this insight to apply lower precision to less critical experts while keeping important ones at higher precision, combined with smart memory prefetching to reduce delays.

efficiency · architecture

cuGenOpt: A GPU-Accelerated General-Purpose Metaheuristic Framework for Combinatorial Optimization

Mar 19, 2026

Yuyang Liu

GPU acceleration can make general-purpose optimization solvers orders of magnitude faster than traditional solvers, while remaining flexible enough for domain-specific customization through a Python interface.

cuGenOpt is a GPU-accelerated framework for solving combinatorial optimization problems (like routing and scheduling) that balances generality, speed, and ease of use. It uses CUDA to run multiple solution attempts in parallel, lets experts add custom solvers, and includes an AI assistant that converts plain-English problem descriptions into working code.

efficiency · applications

Optimal Splitting of Language Models from Mixtures to Specialized Domains

Mar 19, 2026

Skyler Seto, Pierre Ablin, Anastasiia Filippova et al.

You can train better domain-specific models by mathematically optimizing how many tokens to spend on general pretraining versus specialized training, rather than using a fixed two-stage recipe.

This paper shows how to efficiently train multiple specialized language models by splitting compute between general pretraining and domain-specific training. Using scaling laws, the authors predict optimal token allocation for each stage, improving performance on reasoning and knowledge tasks across different model sizes.

training · scaling · efficiency

D5P4: Partition Determinantal Point Process for Diversity in Parallel Discrete Diffusion Decoding

Mar 19, 2026

Jonathan Lys, Vincent Gripon, Bastien Pasdeloup et al.

D5P4 enables discrete diffusion models to generate diverse text outputs efficiently by using a principled diversity mechanism during decoding, with minimal computational overhead compared to standard approaches.

This paper improves how discrete diffusion models generate text by introducing D5P4, a new decoding method that generates multiple candidate outputs in parallel while controlling diversity.

efficiency · architecture · evaluation

Enhancing Pretrained Model-based Continual Representation Learning via Guided Random Projection

Mar 19, 2026

Ruilin Li, Heming Zou, Xiufeng Yan et al.

Using data-guided projection instead of random initialization makes continual learning more stable and effective, especially when there's a big gap between pretrained model knowledge and new tasks.

This paper improves how pretrained models learn continuously on new tasks by replacing random projection layers with a smarter, data-guided approach. Instead of randomly initializing the projection layer, the method selectively builds it based on the target data, creating more stable and expressive representations when learning new classes incrementally without storing old examples.

training · efficiency

From Inference Efficiency to Embodied Efficiency: Revisiting Efficiency Metrics for Vision-Language-Action Models

Mar 19, 2026

Zhuofan Li, Hongkun Yang, Zhenyang Chen et al.

When building embodied AI systems, measure what actually matters: task completion time, motion quality, and energy use—not just model size or inference speed. Optimizing the wrong metrics can make robots perform worse in practice.

This paper shows that traditional efficiency metrics (parameters, computation) for vision-language-action robots don't match real-world performance. The researchers measured actual robotic execution—task time, motion smoothness, energy use—and found that methods optimizing for conventional metrics often make robots move worse or take longer, even when task success stays the same.

efficiency · evaluation · applications

LuMamba: Latent Unified Mamba for Electrode Topology-Invariant and Efficient EEG Modeling

Mar 19, 2026

Danaé Broustail, Anna Tegon, Thorir Mar Ingolfsson et al.

State-space models (Mamba) enable efficient EEG foundation models that work across varying electrode setups—crucial for real-world clinical deployment where equipment differs across hospitals.

LuMamba is an efficient EEG foundation model that handles different electrode configurations by combining topology-invariant encodings with linear-complexity state-space modeling. Pre-trained on 21,000+ hours of unlabeled EEG data, it achieves strong performance on clinical tasks while using 377× fewer computations than transformer-based alternatives.

efficiency · architecture · training

Communication-Efficient and Robust Multi-Modal Federated Learning via Latent-Space Consensus

Mar 19, 2026

Mohamed Badi, Chaouki Ben Issaid, Mehdi Bennis

When building federated systems with multi-modal data, you can align different data types in a shared compressed space using learnable projections, reducing both communication overhead and the need for all devices to use identical architectures.

This paper presents CoMFed, a federated learning system that lets multiple devices train together on different types of data (like video and audio) without sharing raw information. It uses compressed representations and alignment techniques to handle the challenge of different devices having different data types and model structures, while keeping communication costs low.

multimodal · efficiency

Em-Garde: A Propose-Match Framework for Proactive Streaming Video Understanding

Mar 19, 2026

Yikai Zheng, Xin Ding, Yifan Yang et al.

Decoupling semantic understanding from real-time perception—parsing queries once and matching embeddings continuously—solves the efficiency-accuracy tradeoff in proactive video understanding systems.

Em-Garde is a framework for understanding streaming video that responds to user queries efficiently. Instead of checking every frame, it converts user questions into visual proposals and matches them against the video stream using fast embedding comparisons, achieving better accuracy and speed than existing approaches.

multimodal · efficiency · reasoning

Unified Spatio-Temporal Token Scoring for Efficient Video VLMs

Mar 18, 2026

Jianrui Zhang, Yue Yang, Rohun Tripathi et al.

You can prune half of video tokens across both vision and language components without complex mechanisms, gaining significant speed improvements (62%) while maintaining performance—making video VLMs practical for real-world deployment.

This paper introduces a method to speed up video understanding models by removing redundant visual information. The technique scores and removes 50% of unnecessary visual tokens across the entire model architecture, achieving 62% faster processing with minimal accuracy loss on video question-answering tasks.

efficiency · multimodal · architecture

LoST: Level of Semantics Tokenization for 3D Shapes

Mar 18, 2026

Niladri Shekhar Dutt, Zifan Shi, Paul Guerrero et al.

By tokenizing 3D shapes based on semantic importance rather than spatial detail levels, you can train autoregressive 3D generation models that are 10-1000x more token-efficient while maintaining or improving quality.

LoST is a new way to break down 3D shapes into tokens (small pieces) for AI models to process. Instead of using spatial hierarchies like existing methods, it orders tokens by semantic importance—so early tokens capture the main shape, and later tokens add fine details. This makes 3D generation models much more efficient, using 90-99% fewer tokens than previous approaches.

architecture · efficiency · multimodal

Beyond Muon: MUD (MomentUm Decorrelation) for Faster Transformer Training

Mar 18, 2026

Ben S. Southworth, Stephen Thomas

MUD offers 1.3-3x faster token throughput than Muon with similar final performance, making it a practical drop-in replacement for faster transformer training without sacrificing convergence.

MUD is a faster alternative to Muon, an optimizer that speeds up transformer training. Instead of using expensive matrix operations to smooth momentum updates, MUD uses a simpler triangular approach inspired by classical numerical methods. This cuts optimizer overhead by 30-70% while maintaining training speed, making transformers train 10-50% faster in real time.

training · efficiency

VideoAtlas: Navigating Long-Form Video in Logarithmic Compute

Mar 18, 2026

Mohamed Eltahir, Ali Habibullah, Yazan Alshoibi et al.

By treating video as a navigable hierarchical structure instead of converting it to text, you can process 10-hour videos with minimal accuracy loss while using compute that scales logarithmically with duration.

VideoAtlas is a system for understanding long videos efficiently by representing them as a hierarchical grid that can be zoomed into recursively, rather than converting video to text.

efficiency · multimodal · agents

Unified Policy Value Decomposition for Rapid Adaptation

Mar 18, 2026

Cristiano Capone, Luca Falorsi, Andrea Ciardiello et al.

By decomposing policies and value functions into frozen basis functions weighted by a shared low-dimensional goal embedding, agents can adapt to novel tasks instantly without retraining, enabling rapid transfer in complex control problems.

This paper presents a method for quickly adapting reinforcement learning agents to new tasks by sharing a low-dimensional goal embedding between policy and value functions.

efficiency · reasoning

Efficient Training-Free Multi-Token Prediction via Embedding-Space Probing

Mar 18, 2026

Raghavv Goel, Mukul Gagrani, Mingu Lee et al.

You can make LLMs generate text faster by predicting multiple tokens simultaneously using a training-free probing technique—no model modifications or extra models needed.

This paper shows that LLMs can predict multiple future tokens at once without retraining, by using special "mask tokens" to probe the model's internal representations. The approach generates candidate tokens in parallel and verifies them together, speeding up text generation by 15-19% while maintaining quality.

efficiency

Only relative ranks matter in weight-clustered large language models

Mar 18, 2026

Borja Aizpurua, Sukhbinder Singh, Román Orús

LLM weights can be compressed to just 16-64 unique values per matrix without retraining by preserving relative rank order, enabling simple disk compression and revealing that rank structure—not magnitude—is what drives model behavior.

This paper shows that LLMs don't need exact weight values—only the relative ordering of weights matters. By clustering weights into 16-64 shared values per matrix, the authors compress models like Llama 3.1-8B without retraining. They prove this by scrambling weight values while preserving rank order, finding that rank matters far more than precise magnitudes for model performance.
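
Any monotone many-to-one mapping preserves rank order, so quantile-based clustering gives a simple sketch of the compression; the authors' clustering may differ, but the rank-preservation property is the same.

```python
import numpy as np

def cluster_weights(w, n_values=32):
    # Replace each weight with its quantile bin's representative value.
    # The mapping is monotone, so relative rank order is preserved.
    flat = w.ravel()
    edges = np.quantile(flat, np.linspace(0, 1, n_values + 1))
    centers = 0.5 * (edges[:-1] + edges[1:])
    idx = np.clip(np.searchsorted(edges, flat, side="right") - 1, 0, n_values - 1)
    return centers[idx].reshape(w.shape)

w = np.random.randn(256, 256)
wq = cluster_weights(w)
print(len(np.unique(wq)))   # at most 32 distinct values per matrix
```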

efficiency · evaluation

Efficient Reasoning on the Edge

Mar 17, 2026

Yelysei Bondarenko, Thomas Hehn, Rob Hesselink et al.

You can run reasoning-capable LLMs on mobile devices by using LoRA adapters with reinforcement learning to shorten reasoning traces, parallel decoding to reduce latency, and smart KV-cache management—achieving near-full-model accuracy with a fraction of the memory.

This paper makes LLM reasoning practical for mobile devices by combining lightweight LoRA adapters with techniques like budget forcing (to shorten responses), parallel decoding (to speed up generation), and dynamic adapter switching (to activate reasoning only when needed). The result is accurate chain-of-thought reasoning on edge devices without the memory overhead of full models.

efficiency · reasoning · training

SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation

Mar 17, 2026

Jiongze Yu, Xiangbo Gao, Pooja Verlani et al.

Interactive video processing is now practical: users can control AI video enhancement by editing sparse keyframes, and the system intelligently propagates those edits across the full video sequence.

SparkVSR lets users interactively improve low-quality videos by editing a few keyframes, then automatically applies those improvements across the entire video. Instead of treating video enhancement as a black box, users can manually fix specific frames and the system propagates those corrections while keeping the video grounded in the original motion.

multimodal · applications · efficiency

Online Experiential Learning for Language Models

Mar 17, 2026

Tianzhu Ye, Li Dong, Qingxiu Dong et al.

Language models can improve themselves in production by learning from actual user interactions—extracting knowledge from deployment experience and feeding it back into training without requiring access to the original environment.

This paper introduces Online Experiential Learning (OEL), a system that lets language models continuously improve by learning from real interactions during deployment. Instead of relying only on offline training data, OEL extracts useful knowledge from user interactions, then updates the model with this knowledge without needing access to the original environment.

training · reasoning · efficiency

Unifying Optimization and Dynamics to Parallelize Sequential Computation: A Guide to Parallel Newton Methods for Breaking Sequential Bottlenecks

Mar 17, 2026

Xavier Gonzalez

Sequential neural network and sampling computations can be parallelized across sequence length using Newton's method, but success depends on the system's dynamical stability properties.

This work shows how to parallelize sequential computations like RNNs and MCMC by reformulating them as equation-solving problems solvable with Newton's method. It develops faster, more stable parallel algorithms and proves when parallelization actually speeds things up—determined by a system's Lyapunov exponent.

efficiency · reasoning

Mixture-of-Depths Attention

Mar 16, 2026

Lianghui Zhu, Yuxin Fang, Bencheng Liao et al.

MoDA lets deep language models selectively attend to earlier layers, preventing information loss as models get deeper while adding only 3.7% computational overhead.

This paper introduces Mixture-of-Depths Attention (MoDA), a mechanism that lets attention heads skip layers by accessing key-value pairs from both the current and earlier layers. This solves a problem in very deep language models where useful information gets diluted as it passes through many layers.

architecture · efficiency · scaling

SmartSearch: How Ranking Beats Structure for Conversational Memory Retrieval

Mar 16, 2026

Jesper Derehag, Carlos Calva, Timmy Ghiurau

Smart ranking of retrieved candidates matters more than upfront structuring—a simple deterministic pipeline with just one learned ranking component outperforms complex memory systems on conversational retrieval tasks.

SmartSearch retrieves relevant information from raw conversation history without complex structuring or learned policies. It combines simple matching, rule-based expansion, and ranking to find evidence efficiently, achieving 93.5% accuracy on benchmarks while using 8.5x fewer tokens than baselines.

efficiency · reasoning

Effective Distillation to Hybrid xLSTM Architectures

Mar 16, 2026

Lukas Hauzenberger, Niklas Schmidinger, Thomas Schmied et al.

You can now distill transformer-based LLMs into more efficient xLSTM architectures without significant performance degradation, making it practical to deploy smaller, cheaper models that match their larger teachers.

This paper shows how to effectively compress large language models into smaller xLSTM models while preserving performance. The researchers developed a distillation pipeline that combines multiple specialized experts into a single efficient model, successfully distilling models from Llama, Qwen, and Olmo families with minimal performance loss.

efficiency · architecture · training

Unbiased and Biased Variance-Reduced Forward-Reflected-Backward Splitting Methods for Stochastic Composite Inclusions

Mar 16, 2026

Quoc Tran-Dinh, Nghia Nguyen-Trung

The paper introduces practical variance-reduction techniques that significantly reduce the number of gradient computations needed to solve stochastic optimization problems, with proven convergence guarantees and real-world applications in machine learning.

This paper develops new optimization techniques for solving complex stochastic problems by combining variance reduction (reducing noise in gradient estimates) with a splitting method called forward-reflected-backward splitting.

training · efficiency

Co-Design of Memory-Storage Systems for Workload Awareness with Interpretable Models

Mar 16, 2026

Jay Sarkar, Vamsi Pavan Rayaprolu, Abhijeet Bhalerao

Using interpretable ML to co-design storage hardware and firmware together—rather than separately—helps engineers make better architectural decisions by understanding how memory, error handling, and workloads interact.

This paper describes how machine learning can optimize the design of solid-state drives (SSDs) by modeling how error management algorithms interact with memory components under different workloads. The researchers built an interpretable ML framework that analyzes thousands of real SSDs to guide hardware design decisions, enabling better performance and reliability trade-offs.

architecture · efficiency · evaluation

Mamba-3: Improved Sequence Modeling using State Space Principles

Mar 16, 2026

Aakash Lahoti, Kevin Y. Li, Berlin Chen et al.

Mamba-3 shows that linear models can match Transformer quality on real tasks by using complex-valued state tracking and better architectural design, opening a path to cheaper inference without sacrificing capability.

Mamba-3 improves linear sequence models by using state space principles to handle tasks that require tracking information over time. Unlike Transformers that are slow to run, Mamba-3 maintains constant memory and linear compute while matching quality on language tasks—making it faster and cheaper to deploy.

architecture · efficiency · reasoning

Estimating Staged Event Tree Models via Hierarchical Clustering on the Simplex

Mar 16, 2026

Muhammad Shoaib, Eva Riccomagno, Manuele Leonelli et al.

For building staged tree models at scale, use Total Variation divergence with Ward.D2 hierarchical clustering—it matches the accuracy of slower methods like Backward Hill Climbing but runs significantly faster.

This paper presents a new method for building staged tree models—a type of probabilistic graphical model that captures context-specific patterns in data. The approach uses hierarchical clustering on probability distributions, comparing different distance metrics and clustering strategies.

training · efficiency · evaluation

MXNorm: Reusing MXFP block scales for efficient tensor normalisation

Mar 13, 2026

Callum McLean, Luke Y. Prince, Alexandre Payot et al.

You can speed up neural network training by 1-3% by reusing computation from low-precision matrix operations for normalization, with no accuracy loss.

This paper proposes MXNorm, a faster alternative to RMSNorm (a standard layer normalization technique) that reuses scale information already computed during low-precision matrix multiplication. By avoiding redundant calculations, MXNorm achieves 2.4x speedups in normalization while maintaining training accuracy on Llama models.
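
A rough numpy sketch of the reuse idea: approximate the RMS statistic from per-block scales the quantizer already produced, instead of re-reading the full tensor. The constant relating max-abs block scales to the RMS is a crude assumption; the paper derives its estimator properly.

```python
import numpy as np

def mx_block_scales(x, block=32):
    # Per-block power-of-two scales, as an MXFP quantizer would compute
    # (x length assumed divisible by the block size).
    blocks = x.reshape(-1, block)
    return 2.0 ** np.ceil(np.log2(np.abs(blocks).max(axis=1) + 1e-30))

def mxnorm(x, block=32):
    # Reuse the quantizer's scales to estimate the RMS; the 0.5 factor
    # converting max-abs scales to an RMS estimate is a placeholder.
    scales = mx_block_scales(x, block)
    rms_est = np.sqrt((scales ** 2).mean()) * 0.5
    return x / (rms_est + 1e-6)

y = mxnorm(np.random.randn(4096))
```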

efficiencytraining

Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models

Mar 12, 2026

Samy Jelassi, Mujin Kwun, Rosie Zhao et al.

Feature-matching fine-tuning provides a middle ground between simple token prediction and complex reinforcement learning—it gives dense semantic feedback without needing task-specific reward models, making it practical for improving model behavior on real tasks.

This paper proposes a new way to fine-tune language models by matching learned feature representations instead of predicting individual tokens. Rather than using reinforcement learning with reward models, the method generates multiple model outputs in parallel and uses their semantic features to guide training, achieving better results than standard fine-tuning on coding and translation tasks.
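
In spirit, the objective replaces token-level cross-entropy with a distance between pooled hidden features of sampled and reference outputs. The pooling and distance below are assumptions for illustration, not the paper's exact energy formulation.

```python
import torch
import torch.nn.functional as F

def feature_matching_loss(sample_feats, reference_feats):
    # sample_feats: (n_samples, seq, dim) hidden states of parallel model
    # generations; reference_feats: same shape for reference outputs.
    # Match pooled semantic features rather than individual tokens.
    pooled_s = sample_feats.mean(dim=(0, 1))
    pooled_r = reference_feats.mean(dim=(0, 1))
    return F.mse_loss(pooled_s, pooled_r)
```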

trainingefficiencyreasoning

Separable neural architectures as a primitive for unified predictive and generative intelligence

Mar 12, 2026

Reza T. Batley, Apurba Sarker, Rajib Mostakim et al.

Separable neural architectures provide a unified framework for both prediction and generation tasks by imposing structural constraints that decompose high-dimensional problems into simpler, more interpretable components—useful when your system has underlying factorizable structure.

This paper introduces separable neural architectures (SNAs), a structured approach to building neural networks that explicitly exploit factorizable patterns in data. By constraining how different parts of a system interact, SNAs can model everything from physics simulations to language more efficiently.
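
The core constraint can be pictured as a low-rank separable ansatz f(x, y) ≈ Σ_r g_r(x)·h_r(y): each input factor gets its own small network, and they interact only through a rank-limited product. A toy PyTorch block, with all sizes as assumptions:

```python
import torch
import torch.nn as nn

class SeparableBlock(nn.Module):
    # f(x, y) ~ sum_r g_r(x) * h_r(y): inputs are processed independently and
    # interact only through a rank-limited elementwise product, which is what
    # keeps the learned components simple and inspectable.
    def __init__(self, dx, dy, rank=8, hidden=64):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(dx, hidden), nn.Tanh(), nn.Linear(hidden, rank))
        self.h = nn.Sequential(nn.Linear(dy, hidden), nn.Tanh(), nn.Linear(hidden, rank))

    def forward(self, x, y):
        return (self.g(x) * self.h(y)).sum(dim=-1)
```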

architecturereasoningefficiency

BiGain: Unified Token Compression for Joint Generation and Classification

Mar 12, 2026

Jiacheng Liu, Shengkun Tang, Jiacheng Cui et al.

Token compression in diffusion models can serve both generation and classification if you preserve different frequency components: keep high-frequency details for texture/edges and low/mid-frequency information for semantic understanding.

BiGain is a method that speeds up diffusion models while keeping both image generation and classification working well. It uses frequency-aware token compression—separating fine details from overall structure—to decide which tokens to merge or remove, maintaining visual quality and classification accuracy simultaneously.
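
One way to make "frequency-aware" concrete: reshape the tokens back onto their image grid, FFT, and score each position by high-frequency energy so compression can spare edge/texture tokens. This scoring rule is illustrative; BiGain's actual criterion and merging procedure are not reproduced here.

```python
import torch

def highfreq_token_scores(tokens, h, w, cutoff=0.25):
    # tokens: (h*w, d) patch tokens on an h-by-w grid. Higher score =
    # more high-frequency (edge/texture) energy at that position.
    grid = tokens.T.reshape(-1, h, w)                    # (d, h, w)
    spectrum = torch.fft.fft2(grid)
    fy = torch.fft.fftfreq(h).abs()[:, None]
    fx = torch.fft.fftfreq(w).abs()[None, :]
    highpass = ((fy ** 2 + fx ** 2).sqrt() > cutoff).float()
    hf = torch.fft.ifft2(spectrum * highpass).real       # high-frequency part
    return (hf ** 2).sum(0).reshape(-1)                  # (h*w,) per-token energy
```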

efficiencyarchitectureevaluation

STAMP: Selective Task-Aware Mechanism for Text Privacy

Mar 12, 2026

Fengwei Tian, Payel Bhattacharjee, Heidi Hanson et al.

By combining task-aware importance scoring with privacy sensitivity detection, STAMP achieves better privacy-utility trade-offs than uniform noise approaches—meaning you can protect sensitive data without sacrificing model performance.

STAMP is a privacy framework that protects sensitive information in text while keeping it useful for AI tasks. It smartly decides which parts of text need more protection (like names and dates) versus which parts are less sensitive, then applies targeted noise to embeddings using a novel 'polar mechanism' that preserves semantic meaning better than traditional approaches.
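
Schematically, the trade-off comes from scaling per-token noise by "sensitive, but not task-important". The Laplace noise and scoring below are stand-ins: the paper's polar mechanism operates differently on the embedding geometry, and all names here are assumptions.

```python
import numpy as np

def selective_perturb(token_embs, sensitivity, importance, eps=1.0, rng=None):
    # token_embs: (T, d); sensitivity/importance: (T,) scores in [0, 1].
    # Sensitive-but-unimportant tokens receive the largest perturbation,
    # preserving utility where the downstream task needs it most.
    rng = rng or np.random.default_rng()
    scale = sensitivity * (1.0 - importance)
    noise = rng.laplace(0.0, 1.0 / eps, size=token_embs.shape)
    return token_embs + scale[:, None] * noise
```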

safetydataefficiency

HiAP: A Multi-Granular Stochastic Auto-Pruning Framework for Vision Transformers

Mar 12, 2026

Andy Li, Aiden Durrant, Milan Markovic et al.

HiAP simplifies Vision Transformer deployment by automatically discovering efficient architectures in one training phase without manual sparsity targets, matching complex multi-stage methods while being easier to use.

HiAP is a pruning method that automatically removes unnecessary parts of Vision Transformers during training to make them faster and smaller for edge devices. Unlike existing approaches that require manual tuning, it uses a single training process to find optimal sub-networks by removing entire attention heads, FFN blocks, and individual neurons simultaneously.

efficiencyarchitecturetraining

RDNet: Region Proportion-Aware Dynamic Adaptive Salient Object Detection Network in Optical Remote Sensing Images

Mar 12, 2026

Bin Wan, Runmin Cong, Xiaofei Zhou et al.

Using adaptive convolution kernels guided by object size proportions, combined with transformer-based backbones, significantly improves detection of objects at different scales in satellite imagery.

RDNet improves salient object detection in satellite images by replacing traditional CNN backbones with SwinTransformer and adding three specialized modules that adapt to different object sizes and use frequency analysis to better understand context. This solves the problem of detecting objects of varying scales in remote sensing imagery more accurately than existing methods.

architectureefficiencyevaluation

CLASP: Defending Hybrid Large Language Models Against Hidden State Poisoning Attacks

Mar 12, 2026

Alexandre Le Mercier, Thomas Demeester, Chris Develder

CLASP provides a practical, lightweight defense against poisoning attacks on state space models by detecting malicious tokens before they reach downstream tasks, with strong generalization to unseen attack patterns.

State space models like Mamba are fast alternatives to Transformers, but they're vulnerable to Hidden State Poisoning Attacks that inject malicious tokens to corrupt the model's memory.

safetyefficiencyarchitecture

IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse

Mar 12, 2026

Yushi Bai, Qian Dong, Ting Jiang et al.

You can make sparse attention 1.8× faster during prefill by reusing token-selection indices across layers—most layers don't need their own indexer since they pick the same tokens as nearby layers.

IndexCache speeds up sparse attention in large language models by reusing token selection indices across layers instead of computing them separately at each layer. Since consecutive layers select similar tokens anyway, the method caches these selections from a few 'Full' layers and reuses them in other 'Shared' layers, cutting indexer computation by 75% with minimal quality loss.
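
A single-query sketch of the caching pattern: "full" layers run the indexer and cache the selected token indices; every other layer reuses them. Layer wiring, shapes, and the dot-product indexer are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def prefill_with_index_reuse(qs, Ks, Vs, full_layers, top_k=64):
    # qs[i]: (d,) query at layer i; Ks[i], Vs[i]: (T, d) keys/values.
    # Only layers in `full_layers` pay for token selection; the rest
    # reuse the cached indices from the most recent full layer.
    cached_idx, outs = None, []
    for i, (q, K, V) in enumerate(zip(qs, Ks, Vs)):
        if i in full_layers or cached_idx is None:
            cached_idx = (K @ q).topk(top_k).indices   # indexer: relevance scores
        K_sel, V_sel = K[cached_idx], V[cached_idx]
        attn = F.softmax(K_sel @ q / K.shape[-1] ** 0.5, dim=-1)
        outs.append(attn @ V_sel)
    return outs
```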

efficiencyreasoning

Long-Context Encoder Models for Polish Language Understanding

Mar 12, 2026

Sławomir Dadas, Rafał Poświata, Marek Kozłowski et al.

Encoder-only models can be extended to handle long documents through positional embedding adaptation and continued pre-training, offering a parameter-efficient alternative to decoder-only LLMs for document understanding tasks.

This paper introduces Polish language models based on encoder-only architecture that can process documents up to 8192 tokens long—much longer than traditional BERT models. The researchers used a two-stage training approach with positional embedding adaptation and created smaller distilled versions.
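
For learned absolute position tables, the usual adaptation step before continued pre-training is interpolating the table to the new length; whether these models use exactly this scheme is an assumption, so read the sketch as one common recipe.

```python
import torch
import torch.nn.functional as F

def stretch_position_table(pos_emb, new_len):
    # pos_emb: (old_len, dim) learned positional embeddings.
    # Linearly interpolate along the position axis (e.g., 512 -> 8192),
    # then continue pre-training so the model adapts to the new scale.
    stretched = F.interpolate(
        pos_emb.T.unsqueeze(0), size=new_len, mode="linear", align_corners=False
    )
    return stretched.squeeze(0).T  # (new_len, dim)
```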

architectureefficiency

FlashMotion: Few-Step Controllable Video Generation with Trajectory Guidance

Mar 12, 2026

Quanhao Li, Zhen Xing, Rui Wang et al.

You can now generate videos with precise motion control in a fraction of the time by distilling multi-step models and retraining motion adapters—opening doors for real-time interactive video creation.

FlashMotion speeds up trajectory-controlled video generation from many steps to just a few, while keeping videos high-quality and motion paths accurate. It trains a motion controller on a slow multi-step model, then distills it to run faster, and fine-tunes the controller to work well with the speedier version.

efficiencyarchitectureevaluation

Automatic Generation of High-Performance RL Environments

Mar 12, 2026

Seth Karten, Rahul Dev Appapogu, Chi Jin

AI agents can now automatically translate RL environments into optimized implementations (Rust, JAX, GPU-parallel code) in hours instead of months, with built-in verification ensuring the fast version behaves identically to the original.

This paper shows how to automatically generate high-performance RL environments using AI agents with a generic prompt template, verification checks, and iterative repair.

agentsefficiencytraining

Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation

Feb 27, 2026

Zhengbo Wang, Jian Liang, Ran He et al.

You can reduce optimizer memory by 8x using low-rank decomposition without sacrificing model quality—making it easier to train larger models on limited hardware.

This paper makes training large language models cheaper by redesigning how optimizers store momentum information. Instead of keeping full-sized momentum matrices in memory, the authors compress them into smaller low-rank approximations—using 1/8 the memory while maintaining or improving training quality.
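
The memory saving comes from never materializing the dense momentum between steps: store factors U (m×r) and V (n×r), reconstruct, update, and re-truncate. The SVD-based truncation below is a naive stand-in for whatever cheaper update the paper uses; all names are illustrative.

```python
import numpy as np

def lowrank_momentum_step(W, grad, U, V, lr=1e-3, beta=0.9, rank=8):
    # Dense momentum m_t = beta * m_{t-1} + grad, but m is only ever stored
    # as the rank-r factors U @ V.T (roughly (m+n)*r floats instead of m*n).
    m = beta * (U @ V.T) + grad
    P, s, Qt = np.linalg.svd(m, full_matrices=False)
    U_new = P[:, :rank] * s[:rank]          # fold singular values into U
    V_new = Qt[:rank].T
    return W - lr * m, U_new, V_new
```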

efficiencytraining

Memory Caching: RNNs with Growing Memory

Feb 27, 2026

Ali Behrouz, Zeman Li, Yuan Deng et al.

Memory Caching lets RNNs scale their memory capacity with sequence length while staying faster than Transformers.

This paper fixes a major weakness of fast RNN models: they forget information too quickly because they have fixed-size memory. The authors introduce Memory Caching, which lets RNNs save snapshots of their memory as they process longer sequences. This gives RNNs the ability to remember more without becoming as slow as Transformers, creating a sweet spot between speed and accuracy.
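
A toy version of the idea: keep the ordinary fixed-size recurrence, but periodically snapshot the state into a cache the output can consult, so capacity grows with sequence length. The `step`/`read` interface is an illustrative assumption, not the paper's design.

```python
import numpy as np

def rnn_with_memory_cache(xs, step, read, state_dim, cache_every=128):
    # Fixed-size recurrence plus periodic snapshots of the state. The
    # snapshot list grows with sequence length (T / cache_every entries),
    # giving the model more to remember without quadratic attention.
    h, cache, outs = np.zeros(state_dim), [], []
    for t, x in enumerate(xs):
        h = step(h, x)
        if (t + 1) % cache_every == 0:
            cache.append(h.copy())
        outs.append(read(h, cache))          # output may consult snapshots
    return outs
```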

architectureefficiencytraining

Chunk-wise Attention Transducers for Fast and Accurate Streaming Speech-to-Text

Feb 27, 2026

Hainan Xu, Vladimir Bataev, Travis M. Bartley et al.

You can make streaming speech-to-text models faster and more accurate by processing audio in fixed chunks instead of one token at a time.

This paper introduces CHAT, an improved version of RNN-T models for converting speech to text in real-time. By processing audio in small chunks and using a smarter attention mechanism, CHAT runs 1.7x faster during inference, uses 46% less memory during training, and produces more accurate transcriptions—especially for translating speech between languages.
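
Chunk-wise processing usually comes down to the attention mask: each frame may see everything up to the end of its own chunk, so attention advances chunk-by-chunk instead of frame-by-frame. This is a common streaming-encoder pattern; CHAT's exact masking is not given in the summary, so treat the sketch as illustrative.

```python
import torch

def chunkwise_mask(T, chunk=8):
    # mask[i, j] is True when frame i may attend to frame j, i.e. when j
    # falls at or before the end of frame i's chunk.
    idx = torch.arange(T)
    visible_until = ((idx // chunk) + 1) * chunk - 1
    return idx[None, :] <= visible_until[:, None].clamp(max=T - 1)
```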

efficiencyarchitecturemultimodal

Time Series Foundation Models as Strong Baselines in Transportation Forecasting: A Large-Scale Benchmark Analysis

Feb 27, 2026

Javier Pulido, Filipe Rodrigues

Foundation models trained on diverse time-series data can forecast transportation metrics without task-specific tuning, making them practical baselines.

This paper tests whether a general-purpose time-series AI model (Chronos-2) can forecast transportation data like traffic volume and bike-sharing demand without any custom training. The model works surprisingly well out-of-the-box, often beating specialized models built just for these tasks, and also provides useful uncertainty estimates.

evaluationapplicationsefficiency

A Dataset is Worth 1 MB

Feb 26, 2026

Elad Kimchi Shoshani, Leeyam Gabay, Yedid Hoshen

You can teach models new tasks by transmitting just labels instead of data, if clients have a generic reference dataset pre-loaded.

Instead of sending large datasets over the network, this paper proposes sending only class labels for images from a reference dataset that clients already have locally. A smart filtering mechanism picks which images are most relevant to the new task, reducing communication to under 1 MB while maintaining accuracy.
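
A sketch of the transmission side: score each locally available reference image by similarity to the new task's data, keep the most relevant ones, and send only (index, label) pairs at a few bytes each. The centroid-similarity scoring is an illustrative assumption, not necessarily the paper's filter.

```python
import numpy as np

def build_label_payload(task_feats, ref_feats, ref_labels, budget=5000):
    # task_feats: (n, d) features of the new task's data;
    # ref_feats: (m, d) features of the reference set clients already hold;
    # ref_labels: (m,) labels assigned to reference images for the new task.
    centroid = task_feats.mean(axis=0)
    scores = ref_feats @ centroid            # relevance to the task
    keep = np.argsort(-scores)[:budget]
    return [(int(i), int(ref_labels[i])) for i in keep]
```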

efficiencydatatraining

SOTAlign: Semi-Supervised Alignment of Unimodal Vision and Language Models via Optimal Transport

Feb 26, 2026

Simon Roschmann, Paul Krzakala, Sonia Mazelet et al.

You can align vision and language models with 10-100x less paired training data by leveraging unpaired images and text separately.

This paper shows how to align vision and language models using far fewer paired examples than current methods require. Instead of needing millions of image-text pairs, SOTAlign uses a small set of paired data plus lots of unpaired images and text, employing a technique called optimal transport to learn how the two models relate to each other.
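
The optimal-transport piece is standard entropic OT (Sinkhorn). Applied to a cost matrix between unpaired image and text embeddings, the resulting plan acts as a soft pseudo-pairing; SOTAlign's full semi-supervised objective builds on top of this and is not reproduced here.

```python
import numpy as np

def sinkhorn_plan(cost, reg=0.05, iters=200):
    # Entropic optimal transport between uniform marginals.
    # cost: (n, m) pairwise distances between image and text embeddings.
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-cost / reg)
    u, v = np.ones(n), np.ones(m)
    for _ in range(iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]       # transport plan: soft pseudo-pairs
```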

multimodaltrainingefficiency

FlashOptim: Optimizers for Memory Efficient Training

Feb 26, 2026

Jose Javier Gonzalez Ortiz, Abhay Gupta, Chris Renard et al.

You can train large models with 50% less GPU memory by using better compression for optimizer states—no quality loss, drop-in replacement.

FlashOptim cuts the memory needed to train large AI models in half by storing optimizer information more efficiently. It uses smarter compression techniques for gradients and optimizer states without hurting model quality, making it possible to train 7B+ parameter models on consumer GPUs.
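
For a sense of what "storing optimizer information more efficiently" can mean, here is per-block absmax int8 quantization of a state tensor, a standard compression trick for optimizer states. FlashOptim's exact scheme is not described in the summary, so this is purely illustrative (and assumes the tensor size divides the block size).

```python
import numpy as np

def quantize_blocks(state, block=256):
    # Each block keeps one float scale plus int8 payloads: roughly 4x less
    # memory than float32 before any further tricks.
    flat = state.reshape(-1, block)
    scale = np.abs(flat).max(axis=-1, keepdims=True) / 127.0 + 1e-12
    q = np.round(flat / scale).astype(np.int8)
    return q, scale

def dequantize_blocks(q, scale, shape):
    return (q.astype(np.float32) * scale).reshape(shape)
```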

efficiencytraining