ThinkLLM


Papers

Recent AI research papers with accessible summaries. Updated daily from arXiv, summarized for developers who don't read papers regularly.

326 papers · 11 this month · 12 topics
All · Efficiency 35 · Reasoning 35 · Multimodal 28 · Applications 28 · Evaluation 27 · Training 26 · Architecture 24 · Agents 24 · Safety 13 · Scaling 5 · Data 5 · Alignment 1

Mar 30 – Apr 5 (16)

ActionParty: Multi-Subject Action Binding in Generative Video Games

Apr 2, 2026

Alexander Pondaven, Ziyi Wu, Igor Gilitschenski et al.

This is the first video world model that can reliably control multiple independent agents in the same scene—a critical capability for simulating multi-player games and complex interactive environments.

ActionParty is a video diffusion model that can control multiple characters simultaneously in interactive game environments. Unlike existing models limited to single agents, it uses special 'subject state tokens' to track each character's state separately, allowing precise control of up to seven players at once while maintaining their identity and following their assigned actions correctly.

architecture · multimodal · agents

Steerable Visual Representations

Apr 2, 2026

Jona Ruthardt, Manu Gaur, Deva Ramanan et al.

You can now guide vision models with text prompts to focus on non-obvious visual concepts while maintaining strong performance on generic vision tasks—without needing separate language-centric models.

This paper introduces steerable visual representations that can be guided by natural language to focus on specific objects or concepts in images.

multimodal

Mar 23 – Mar 29 (14)

PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference

Mar 26, 2026

Xiaofeng Mao, Shaohao Rui, Kaining Ying et al.

You can train video models on short clips and generate much longer videos by using a three-tier memory strategy that compresses historical context without losing quality.

PackForcing solves the memory problem in video generation by compressing old frames intelligently—keeping early frames for context, heavily compressing middle frames, and preserving recent frames for smooth transitions. This lets models generate 2-minute videos on a single GPU after training only on 5-second clips, achieving 24x longer videos than training data.

efficiency · architecture · training
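The three-tier strategy above can be sketched in a few lines. This is a toy illustration of the idea, not the paper's implementation; `n_early`, `n_recent`, and `mid_stride` are hypothetical knobs, and real frames would be latent tensors rather than list items:

```python
def pack_context(frames, n_early=2, n_recent=4, mid_stride=4):
    """Three-tier context compression (sketch): keep early frames for
    global context, subsample the middle, keep recent frames intact."""
    if len(frames) <= n_early + n_recent:
        return list(frames)
    early = frames[:n_early]                              # anchor context
    middle = frames[n_early:len(frames) - n_recent][::mid_stride]  # heavy compression
    recent = frames[-n_recent:]                           # smooth transitions
    return early + middle + recent
```

The packed context stays roughly constant in size no matter how long the history grows, which is what lets short-clip training transfer to long sampling.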

Natural-Language Agent Harnesses

Mar 26, 2026

Linyue Pan, Lexiao Zou, Shuo Guo et al.

An agent's performance depends heavily on how you orchestrate its behavior—by making this orchestration readable and portable through natural language, you can reuse and improve agent designs much more easily.

This paper proposes a new way to design agent control systems by writing them in natural language instead of buried in code. The authors create Natural-Language Agent Harnesses (NLAHs) and a runtime system that executes these harnesses, making it easier to reuse, compare, and study how agents are controlled across different tasks.

Mar 16 – Mar 22 (21)

LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation

Mar 20, 2026

Jiazheng Xing, Fei Du, Hangjie Yuan et al.

To generate videos with multiple people where each person's appearance stays consistent with their attributes, you need both better training data that captures identity-attribute relationships and model attention mechanisms designed to enforce those relationships.

LumosX improves personalized video generation by explicitly linking identities to their attributes. It uses a data pipeline with multimodal AI to extract subject relationships, then applies specialized attention mechanisms in diffusion models to ensure faces stay consistent with their assigned attributes across video frames.

multimodal · architecture · data

Kolmogorov-Arnold causal generative models

Mar 20, 2026

Alejandro Almodóvar, Mar Elizo, Patricia A. Apellániz et al.

You can build causal models that are both powerful and interpretable by using Kolmogorov-Arnold Networks as the building blocks for structural equations—enabling you to see exactly how variables influence each other.

This paper introduces KaCGM, a causal generative model that uses Kolmogorov-Arnold Networks to learn causal relationships in tabular data. Unlike black-box approaches, each causal mechanism is interpretable and can be visualized or converted to symbolic equations, making it suitable for high-stakes applications like healthcare where understanding *why* a model makes decisions matters.

Mar 9 – Mar 15 (11)

Towards Faithful Multimodal Concept Bottleneck Models

Mar 13, 2026

Pierre Moreau, Emeline Pineau Ferrand, Yann Choho et al.

Concept Bottleneck Models can now work reliably across text and images by jointly addressing concept detection and information leakage—enabling interpretable AI without sacrificing accuracy.

This paper introduces f-CBM, a framework for building interpretable multimodal AI models that make predictions through human-understandable concepts. The key innovation is solving two problems simultaneously: accurately detecting concepts and preventing 'leakage' (where irrelevant information sneaks into predictions).

multimodal · architecture

Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training

Mar 12, 2026

Fangfu Liu, Diankun Wu, Jiawei Chi et al.

Test-time training—updating model parameters on-the-fly during inference—enables better spatial reasoning from video by letting the model continuously organize and retain 3D spatial information rather than relying on fixed context windows.

This paper introduces Spatial-TTT, a system that helps AI models understand 3D spaces from continuous video streams by adapting and updating their internal parameters during inference. It combines efficient video processing with a spatial prediction mechanism and specialized training data to maintain accurate spatial understanding over long videos.

Feb 23 – Mar 1 (13)

Mode Seeking meets Mean Seeking for Fast Long Video Generation

Feb 27, 2026

Shengqu Cai, Weili Nie, Chao Liu et al.

Decoupling long-term coherence from local quality lets you generate minute-scale videos without needing massive amounts of long-form training data.

This paper solves a key problem in video generation: making long videos (minutes) that are both sharp and coherent. The trick is training two separate components—one learns long-term story structure from rare long videos, while another copies local quality from abundant short videos. This lets the model generate minute-long videos that look crisp and stay consistent throughout.

training · efficiency · architecture

Memory Caching: RNNs with Growing Memory

Feb 27, 2026

Ali Behrouz, Zeman Li, Yuan Deng et al.

Memory Caching lets RNNs scale their memory capacity with sequence length while staying faster than Transformers.

This paper fixes a major weakness of fast RNN models: they forget information too quickly because they have fixed-size memory. The authors introduce Memory Caching, which lets RNNs save snapshots of their memory as they process longer sequences. This gives RNNs the ability to remember more without becoming as slow as Transformers, creating a sweet spot between speed and accuracy.

architecture
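The snapshot idea behind Memory Caching can be illustrated with a toy linear recurrence. This is a sketch of the general mechanism, not the paper's architecture; `cache_every` is a hypothetical interval, and a real model would learn how to read from the cache:

```python
def rnn_with_memory_cache(tokens, cache_every=4):
    """Toy linear RNN: fixed-size state that decays old information,
    plus periodic snapshots so older context stays retrievable."""
    state = 0.0
    cache = []
    for t, x in enumerate(tokens, 1):
        state = 0.5 * state + x        # fast fixed-size recurrence
        if t % cache_every == 0:
            cache.append(state)        # cache grows with sequence length
    # a read-out can attend over the cached snapshots, not just the final state
    return state, cache
```

The cache grows sublinearly in the sequence length, which is what keeps the model faster than a Transformer while recovering long-range recall.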

go-$m$HC: Direct Parameterization of Manifold-Constrained Hyper-Connections via Generalized Orthostochastic Matrices

Apr 2, 2026

Torque Dandachi, Sophia Diggs-Galligan

go-mHC enables efficient learned mixing of residual streams in transformers with a single tunable hyperparameter that trades off between speed and expressivity, potentially unlocking a new dimension for scaling model capacity.

This paper solves a mathematical problem in neural network design: how to efficiently mix information across different processing paths (residual streams) in transformers.

architecture · efficiency · scaling

Taming the Exponential: A Fast Softmax Surrogate for Integer-Native Edge Inference

Apr 2, 2026

Dimitrios Danopoulos, Enrico Lupi, Michael Kagan et al.

HCCS replaces softmax's expensive exponential computation with a lightweight linear approximation calibrated per attention head, enabling 8-bit integer inference on edge hardware without sacrificing model accuracy.

This paper proposes Head-Calibrated Clipped-Linear Softmax (HCCS), a fast approximation of softmax designed for edge devices running small quantized AI models.

efficiency · architecture
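The summary doesn't specify HCCS's exact formula, so here is a minimal sketch of the general idea: replace the exponential with a clipped linear ramp and normalize. The `alpha`/`beta` parameters are hypothetical stand-ins for the per-head calibration HCCS performs, and a real edge kernel would do this in integer arithmetic:

```python
def clipped_linear_softmax(scores, alpha=0.25, beta=1.0):
    """Softmax surrogate (sketch): clipped linear ramp instead of exp,
    shifted by the max score for stability, then normalized."""
    m = max(scores)
    w = [max(0.0, alpha * (s - m) + beta) for s in scores]  # no exponential
    z = sum(w)
    return [x / z for x in w]
```

Scores far below the maximum clip to exactly zero, so distant keys drop out entirely rather than receiving tiny exponential tails.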

Omni123: Exploring 3D Native Foundation Models with Limited 3D Data by Unifying Text to 2D and 3D Generation

Apr 2, 2026

Chongjie Ye, Cheng Cao, Chuanyu Pan et al.

By unifying 2D and 3D generation in one model and leveraging plentiful 2D data as a structural constraint, you can train better 3D generators with limited 3D assets—no separate 2D-to-3D conversion pipeline needed.

Omni123 is a 3D foundation model that generates both 2D images and 3D objects from text by treating them as sequences of tokens. It uses abundant 2D image data as a guide to improve 3D generation, avoiding the need for scarce aligned text-image-3D datasets. The model cycles through different modalities (text→image→3D→image) to ensure consistency across all forms.

multimodal · architecture · data

Crystalite: A Lightweight Transformer for Efficient Crystal Modeling

Apr 2, 2026

Tin Hadži Veljković, Joshua Rosenthal, Ivor Lončarić et al.

By combining efficient tokenization with geometry-aware attention, you can build crystal generation models that are both faster and more accurate than complex graph neural networks, making generative modeling of materials more practical.

Crystalite is a lightweight diffusion Transformer for generating crystal structures that uses two key innovations: a compact atom representation called Subatomic Tokenization and a Geometry Enhancement Module that encodes crystal geometry directly into the model's attention mechanism.

architecture · efficiency · applications

Universal Hypernetworks for Arbitrary Models

Apr 2, 2026

Xuanfeng Zhou

A single fixed hypernetwork can generate weights for diverse architectures and tasks by using architecture/task descriptors as input, eliminating the need to retrain generators when switching between different model types.

This paper introduces Universal Hypernetworks (UHN), a single neural network that can generate weights for many different model architectures and tasks. Instead of building separate weight generators for each model type, UHN uses descriptors (text descriptions of architecture and task) to produce weights for any compatible model, working across vision, graphs, text, and math tasks.

architecture · training · efficiency

Universal YOCO for Efficient Depth Scaling

Apr 1, 2026

Yutao Sun, Li Dong, Tianzhu Ye et al.

You can scale LLM reasoning at inference time without exploding memory costs by combining efficient attention architectures with parameter sharing—YOCO-U shows this works better than either approach alone.

Universal YOCO combines a specialized decoder architecture with recursive computation to enable efficient test-time scaling in language models. By reusing parameters across multiple iterations in shallow layers while maintaining constant KV cache size, it achieves better reasoning capabilities without the computational overhead that typically comes with scaling inference-time compute.

efficiency · architecture · reasoning

LLM REgression with a Latent Iterative State Head

Apr 1, 2026

Yiheng Su, Matthew Lease

You can make LLMs predict continuous numeric values more efficiently by adding a tiny learned head that works with frozen representations, rather than decoding text or fine-tuning the entire model.

RELISH is a lightweight method for making LLMs predict numeric values directly from their internal representations. Instead of generating numbers as text, it uses a small learned component that iteratively refines a latent state through attention over token representations, then outputs a single number. It outperforms existing approaches while adding minimal parameters (0.01-0.04% overhead).

architecture · efficiency · applications

Learning and Generating Mixed States Prepared by Shallow Channel Circuits

Apr 1, 2026

Fangjun Hu, Christian Kokail, Milan Kornjača et al.

Quantum states in the trivial phase can be efficiently learned from measurements and regenerated using shallow circuits, providing a theoretical foundation for quantum generative models without needing the original preparation circuit.

This paper shows how to learn and generate quantum mixed states that belong to the 'trivial phase'—states preparable by shallow quantum circuits that preserve local reversibility. The algorithm learns from measurement data alone and outputs a shallow circuit that recreates the state, with polynomial sample complexity and runtime. The work also extends to classical diffusion models.

reasoning · training · architecture

Screening Is Enough

Apr 1, 2026

Ken M. Nakanishi

Screening attention removes the need for global competition among keys by using absolute relevance thresholds, achieving 40% parameter reduction and 3.2× faster inference compared to Transformers.

This paper introduces Multiscreen, a language model architecture that replaces standard softmax attention with a 'screening' mechanism. Instead of distributing attention weights across all keys, screening evaluates each key against a threshold to decide which ones are relevant, eliminating the need for keys to compete with each other.

architecture · efficiency · scaling
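A toy sketch of the screening idea, using scalar scores for brevity (the actual Multiscreen mechanism is surely richer): each key is tested against an absolute threshold `tau` instead of competing with other keys in a softmax, so relevance is decided locally per key:

```python
def screening_attention(q, keys, values, tau=0.0):
    """Screening sketch: keys pass or fail an absolute relevance test;
    no global normalization over the key set."""
    out, kept = 0.0, 0
    for k, v in zip(keys, values):
        s = q * k              # toy 1-D "dot product" score
        if s > tau:            # absolute threshold, not softmax competition
            out += s * v       # unnormalized local weight
            kept += 1
    return out, kept
```

Because a key's contribution doesn't depend on the other keys, adding or removing context entries never reshuffles existing attention weights.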

Adaptive Block-Scaled Data Types

Mar 30, 2026

Jack Cook, Hyemin S. Lee, Kathryn Le et al.

Adaptive block-scaled quantization can significantly reduce errors in 4-bit model compression by intelligently switching between data types per block, achieving better accuracy than fixed formats without extra storage cost.

This paper introduces adaptive quantization formats (IF4, IF3, IF6) that improve upon NVFP4 by dynamically choosing between floating-point and integer representations for each block of values. The approach uses an unused bit in NVFP4 to signal which format to use, reducing quantization errors and improving language model performance with minimal hardware overhead.

efficiency · training · architecture
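The format-switching idea can be illustrated with a toy example. The level grids below are invented for illustration and do not match NVFP4's actual encodings; the point is choosing, per block, whichever grid (uniform integer-style vs. FP-style with levels denser near zero) yields lower reconstruction error:

```python
def quantize_block(block, levels):
    """Scale a block to its max magnitude and snap each value
    to the nearest level, returning the dequantized block."""
    s = max(abs(x) for x in block) or 1.0
    return [min(levels, key=lambda l: abs(l - x / s)) * s for x in block]

def adaptive_quantize(block):
    """Per-block format choice (sketch): pick the grid with lower MSE."""
    int_levels = [i / 3.0 for i in range(-3, 4)]               # uniform spacing
    fp_levels = [-1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0]       # denser near zero
    cands = {"int": quantize_block(block, int_levels),
             "fp": quantize_block(block, fp_levels)}
    errs = {k: sum((a - b) ** 2 for a, b in zip(block, v))
            for k, v in cands.items()}
    fmt = min(errs, key=errs.get)
    return fmt, cands[fmt]
```

Blocks whose values cluster near zero (with one outlier setting the scale) favor the FP-style grid, while evenly spread blocks favor the integer grid—the same intuition the paper exploits with its spare NVFP4 bit.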

On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers

Mar 30, 2026

Omer Dahary, Benaya Koren, Daniel Garibi et al.

You can increase diversity in generated images by applying repulsion forces in the transformer's attention channels during generation, without expensive optimization or visual artifacts.

This paper tackles the problem of text-to-image diffusion models producing visually similar outputs for the same prompt. The authors propose a method that applies 'repulsion' in the attention mechanism during image generation to encourage diverse outputs while maintaining quality and semantic accuracy.

architecture · efficiency · multimodal

RAD-AI: Rethinking Architecture Documentation for AI-Augmented Ecosystems

Mar 30, 2026

Oliver Aleksander Larsen, Mahyar T. Moghaddam

If you're building AI systems, standard software architecture documentation won't capture ML-specific risks like model drift or data dependencies—RAD-AI provides a structured way to document these for both compliance and team understanding.

RAD-AI extends existing architecture documentation frameworks (arc42 and C4 model) to handle AI systems, adding sections for probabilistic behavior, ML lifecycles, and data dependencies. It maps to EU AI Act compliance requirements and shows 93% coverage of regulatory documentation needs versus 36% for standard frameworks.

architecture · safety · applications

SAGAI-MID: A Generative AI-Driven Middleware for Dynamic Runtime Interoperability

Mar 30, 2026

Oliver Aleksander Larsen, Mahyar T. Moghaddam

LLMs can serve as runtime architectural components to solve schema interoperability problems dynamically, but code generation strategies outperform direct transformation and cost varies dramatically across models without matching accuracy gains.

SAGAI-MID is a middleware system that uses LLMs to automatically fix schema mismatches between different services and APIs at runtime, eliminating the need for manual adapter code. It combines structural analysis with LLM reasoning and includes safety checks to handle real-world integration challenges across REST, GraphQL, and IoT systems.

architecture · agents · applications

GPU-Accelerated Optimization of Transformer-Based Neural Networks for Real-Time Inference

Mar 30, 2026

Soutrik Mukherjee, Sangwhan Cha

Hybrid precision (FP32 for softmax/normalization, FP16 for linear layers) delivers 2x speedup with zero accuracy loss—a practical strategy for deploying transformers in latency-critical applications.

This paper optimizes transformer models (BERT and GPT-2) for fast GPU inference using mixed-precision techniques—keeping sensitive operations in full precision while using lower precision for others. The system achieves 64x speedup over CPU and sub-10ms latency while maintaining numerical accuracy and eliminating instability issues.

efficiency · architecture
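The precision split can be emulated in pure Python by round-tripping values through IEEE half precision (the `struct` format `'e'`). This is a sketch of the general recipe, not the paper's CUDA implementation:

```python
import math
import struct

def to_fp16(x):
    """Round-trip a float through IEEE 754 half precision to emulate FP16 storage."""
    return struct.unpack('e', struct.pack('e', x))[0]

def hybrid_linear_softmax(x, w):
    """Hybrid precision sketch: linear layer in FP16, softmax in FP32."""
    # Elementwise "linear layer" in FP16: throughput-critical, tolerant of rounding.
    h = [to_fp16(to_fp16(xi) * to_fp16(wi)) for xi, wi in zip(x, w)]
    # Softmax in full precision: exp/normalization is numerically sensitive.
    m = max(h)
    e = [math.exp(v - m) for v in h]
    z = sum(e)
    return [v / z for v in e]
```

Keeping only the exponentials and normalization in FP32 is what avoids the overflow and cancellation issues that plague fully half-precision softmax.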

A Unified Memory Perspective for Probabilistic Trustworthy AI

Mar 26, 2026

Xueji Zhao, Likai Pei, Jianbo Liu et al.

Memory access, not computation speed, limits performance in probabilistic AI systems—hardware designers need to optimize for both data delivery and randomness generation together, not separately.

This paper examines how memory systems become the performance bottleneck in AI systems that need probabilistic computation for safety and robustness. It proposes treating deterministic data access as a special case of stochastic sampling, creating a unified framework to analyze memory efficiency.

efficiency · safety · architecture

Just Zoom In: Cross-View Geo-Localization via Autoregressive Zooming

Mar 26, 2026

Yunus Talha Erzurumlu, Jiyong Kwag, Alper Yilmaz

Treating geo-localization as a sequential zooming problem over maps, rather than image retrieval, achieves better results and avoids the limitations of contrastive learning approaches that struggle with landmark visibility mismatches.

This paper tackles cross-view geo-localization—matching street-view photos to satellite maps to pinpoint a camera's location without GPS. Instead of the standard approach of comparing images in a shared embedding space, the authors propose a new method that zooms progressively into a satellite map, making sequential decisions to narrow down the location.

reasoning · architecture · evaluation

Polynomial Speedup in Diffusion Models with the Multilevel Euler-Maruyama Method

Mar 25, 2026

Arthur Jacot

You can sample from diffusion models much faster by combining predictions from small and large networks—the method achieves the same accuracy as running the largest network once, instead of many times.

This paper speeds up diffusion model sampling by using multiple neural networks of different sizes together. Instead of running one large network many times, the method runs a small fast network many times and a large accurate network just a few times, reducing total computation while maintaining quality. Tests show up to 4x speedup on image generation.

efficiency · architecture
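The summary describes combining a cheap network run many times with an expensive one run a few times. A generic multilevel-correction sketch (not the paper's Euler-Maruyama scheme) conveys the cost accounting: evaluate the cheap model on many points and correct with the expensive-minus-cheap difference estimated on only a few:

```python
def multilevel_estimate(f_small, f_large, xs_many, xs_few):
    """Multilevel sketch: cheap surrogate everywhere, expensive
    correction on a small subset of points."""
    base = sum(f_small(x) for x in xs_many) / len(xs_many)       # cheap, many evals
    corr = sum(f_large(x) - f_small(x) for x in xs_few) / len(xs_few)  # few evals
    return base + corr
```

If the small and large networks agree closely, the correction term has low variance, so a handful of expensive evaluations suffices—the source of the polynomial speedup.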

EndoVGGT: GNN-Enhanced Depth Estimation for Surgical 3D Reconstruction

Mar 25, 2026

Falong Fan, Yi Xie, Arnis Lektauers et al.

Dynamic graph-based feature connections outperform fixed spatial neighborhoods for reconstructing deformable surgical scenes, especially when dealing with occlusions and low-texture surfaces.

EndoVGGT improves 3D reconstruction of soft tissues during surgery by using a graph neural network module that dynamically connects similar tissue regions across the image, even when instruments block the view or surfaces are shiny. This approach recovers the true shape of deformable tissues better than previous methods and works on new surgical videos it hasn't seen before.

architecture

VISion On Request: Enhanced VLLM efficiency with sparse, dynamically selected, vision-language interactions

Mar 24, 2026

Adrian Bulat, Alberto Baldrati, Ioannis Maniadis Metaxas et al.

You can make vision-language models faster without losing visual detail by being selective about which attention layers process images—use efficient cross-attention for context and add self-attention layers only when the task complexity demands it.

VISOR improves vision-language model efficiency by selectively attending to visual information rather than compressing images. Instead of reducing visual tokens, it uses sparse cross-attention and dynamically chosen self-attention layers to process high-resolution details only when needed, reducing computation while maintaining performance on complex visual reasoning tasks.

efficiency · multimodal · architecture

InverFill: One-Step Inversion for Enhanced Few-Step Diffusion Inpainting

Mar 24, 2026

Duc Vu, Kien Nguyen, Trong-Tung Nguyen et al.

You can dramatically improve few-step diffusion inpainting by initializing the noise with semantic information from the input image, rather than random noise—no retraining required.

InverFill speeds up image inpainting by using a smart noise initialization technique that preserves semantic information from the original image. Instead of training new models, it works with existing fast text-to-image models to fill in masked regions with better quality and fewer processing steps.

efficiency · architecture

Similarity-Aware Mixture-of-Experts for Data-Efficient Continual Learning

Mar 24, 2026

Connor Mclaughlin, Nigel Lee, Lili Su

When deploying models that learn from new tasks with scarce data, routing samples intelligently based on task similarity prevents negative interference while maximizing knowledge reuse across overlapping tasks.

This paper tackles continual learning when tasks have limited data and may overlap unpredictably. The authors propose an adaptive mixture-of-experts system that learns which tasks are similar and routes data accordingly, using two key techniques: gradually introducing task-specific prompts over time and identifying which samples fit existing patterns versus need new ones.

efficiency · architecture

WorldCache: Content-Aware Caching for Accelerated Video World Models

Mar 23, 2026

Umair Nawaz, Ahmed Heakl, Ufaq Khan et al.

Smart feature caching with motion awareness can dramatically accelerate video world models without retraining, but requires adaptive thresholds and blending rather than static feature reuse.

WorldCache speeds up video generation from diffusion transformers by intelligently reusing computed features across denoising steps. Instead of naively reusing old features, it adapts based on motion and visual importance, using blending and warping to keep videos smooth and artifact-free—achieving 2.3× speedup with minimal quality loss.

efficiency · architecture · evaluation

End-to-End Training for Unified Tokenization and Latent Denoising

Mar 23, 2026

Shivam Duggal, Xingjian Bai, Zongze Wu et al.

You can train tokenization and image generation together from scratch using a single model with shared weights, simplifying the pipeline and reducing training complexity while maintaining quality.

This paper proposes UNITE, a new way to train image generation models more efficiently by combining tokenization and diffusion in a single training stage.

architecture · training · efficiency

UniMotion: A Unified Framework for Motion-Text-Vision Understanding and Generation

Mar 23, 2026

Ziyi Wang, Xinshun Wang, Shuang Chen et al.

Treating motion as a continuous first-class modality rather than discretizing it enables a single model to handle motion-text-image tasks end-to-end, achieving better performance on cross-modal tasks like describing motion or editing poses from text.

UniMotion is the first unified AI system that understands and generates human motion, text, and images all in one model. Instead of converting motion into discrete tokens (which loses information), it treats motion as a continuous stream like video, using a shared language model backbone with special techniques to align motion with visual and text understanding.

multimodal · architecture

ThinkJEPA: Empowering Latent World Models with Large Vision-Language Reasoning Model

Mar 23, 2026

Haichao Zhang, Yijiang Li, Shwai He et al.

Pairing dense video prediction models with sparse, semantically-rich vision-language reasoning improves long-horizon forecasting—VLMs provide the 'what' and 'why', while dense models provide the 'how'.

This paper combines two approaches to video prediction: dense frame-by-frame modeling (JEPA) for capturing fine-grained motion, and vision-language models (VLMs) for long-horizon semantic understanding. By using both pathways together, the system predicts future video frames better than either approach alone, especially for complex hand manipulation tasks.

multimodal · reasoning · architecture

MemDLM: Memory-Enhanced DLM Training

Mar 23, 2026

Zehua Pei, Hui-Ling Zhen, Weizhe Lin et al.

Diffusion language models can be trained more effectively by embedding a simulated denoising trajectory into training, and this memory mechanism can be reused at inference time to improve long-context retrieval tasks.

This paper addresses a key problem in diffusion language models: they're trained one way (predicting masked tokens) but used differently (multi-step denoising). MemDLM fixes this mismatch by simulating the denoising process during training using a memory mechanism that learns from each sample's trajectory, leading to faster training and better long-context performance.

training · architecture · efficiency

Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD

Mar 20, 2026

Emiel Hoogeboom, David Ruhe, Jonathan Heek et al.

Discrete diffusion models can now be distilled into faster generators using moment matching, enabling practical deployment with fewer sampling steps while maintaining quality.

This paper solves the problem of making discrete diffusion models faster by distilling them into simpler models. Unlike continuous diffusion models which have many distillation techniques, discrete diffusion (used for text and images) has been hard to compress.

efficiency · training · architecture

Spectrally-Guided Diffusion Noise Schedules

Mar 19, 2026

Carlos Esteves, Ameesh Makadia

By tailoring noise schedules to each image's spectral content, you can generate higher-quality images with fewer denoising steps, making diffusion models faster and more efficient.

This paper proposes a smarter way to design noise schedules for diffusion models by analyzing the spectral properties of images. Instead of using the same handcrafted noise schedule for all images, the method creates custom schedules for each image that eliminate unnecessary denoising steps, improving generation quality especially when using fewer sampling steps.

efficiency · architecture · training

DriveTok: 3D Driving Scene Tokenization for Unified Multi-View Reconstruction and Understanding

Mar 19, 2026

Dong Zhuo, Wenzhao Zheng, Sicheng Zuo et al.

A single tokenizer can efficiently represent multi-view driving scenes in a way that works for both reconstruction tasks (RGB, depth) and understanding tasks (segmentation, 3D occupancy), making it practical for vision-language-action models in autonomous vehicles.

DriveTok creates a unified tokenizer for autonomous driving that converts multi-view camera images into compact 3D scene tokens. Unlike existing tokenizers designed for single images, it handles multiple camera views efficiently while preserving semantic, geometric, and depth information—enabling better reconstruction and understanding of driving scenes.

multimodal · architecture · applications

DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative Latent Denoising

Mar 19, 2026

Tianjiao Yu, Xinzhuo Li, Muntasir Wahed et al.

Part-aware 3D generation works better when you explicitly model semantic relationships between parts derived from language, not just their geometry—this enables text descriptions to guide both individual part structure and how parts fit together.

DreamPartGen generates 3D objects from text by understanding them as meaningful parts with semantic relationships. Unlike existing methods that focus only on geometry, this approach jointly models each part's shape and appearance while capturing how parts relate to each other based on the text description, resulting in more coherent and interpretable 3D models.

multimodal · architecture · reasoning

Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders

Mar 19, 2026

Shang-Jui Ray Kuo, Paola Cascante-Bonilla

State space models are a viable and more efficient alternative to vision transformers for vision-language models, challenging the assumption that transformers are necessary for this task.

This paper tests whether state space models (SSMs) can replace vision transformers as the visual backbone in vision-language models. The researchers find that SSM-based vision encoders match or outperform transformer-based encoders on VQA and visual grounding tasks, while using fewer parameters. They also identify instability issues in some backbones and propose fixes to improve robustness.

architecture · multimodal · efficiency

Box Maze: A Process-Control Architecture for Reliable LLM Reasoning

Mar 19, 2026

Zou Qiang

Adding explicit process-control layers to LLM reasoning—rather than just filtering outputs—can dramatically reduce hallucination and adversarial vulnerability by enforcing integrity at the reasoning stage itself.

Box Maze proposes a three-layer architecture for LLMs that separates reasoning into memory grounding, structured inference, and boundary enforcement to prevent hallucination and adversarial attacks. Testing on multiple LLM systems shows the approach reduces failure rates from ~40% to <1% under adversarial conditions, suggesting architectural constraints can improve reasoning reliability.

architecture · safety · reasoning

DyMoE: Dynamic Expert Orchestration with Mixed-Precision Quantization for Efficient MoE Inference on Edge

Mar 19, 2026

Yuegui Huang, Zhiyuan Fang, Weiqi Luo et al.

By dynamically quantizing less important experts and prefetching memory strategically, DyMoE achieves 3-22x faster inference on edge devices without sacrificing accuracy—making large MoE models practical for real-time edge deployment.

DyMoE optimizes Mixture-of-Experts (MoE) models for edge devices by dynamically adjusting precision during inference. It identifies that some experts matter more than others and uses this insight to apply lower precision to less critical experts while keeping important ones at higher precision, combined with smart memory prefetching to reduce delays.

efficiency · architecture
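A toy sketch of importance-based precision assignment; the bit-widths and `top_frac` cutoff are hypothetical, and real DyMoE adjusts these decisions dynamically during inference rather than from a static importance vector:

```python
def assign_expert_precision(importance, high_bits=8, low_bits=4, top_frac=0.25):
    """Give the most important experts higher precision, the rest lower."""
    n_high = max(1, int(len(importance) * top_frac))
    top = set(sorted(range(len(importance)),
                     key=lambda i: -importance[i])[:n_high])
    return [high_bits if i in top else low_bits for i in range(len(importance))]
```

Since only a small fraction of experts carry most of the routing mass, quantizing the long tail aggressively shrinks memory traffic with little accuracy cost.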

D5P4: Partition Determinantal Point Process for Diversity in Parallel Discrete Diffusion Decoding

Mar 19, 2026

Jonathan Lys, Vincent Gripon, Bastien Pasdeloup et al.

D5P4 enables discrete diffusion models to generate diverse text outputs efficiently by using a principled diversity mechanism during decoding, with minimal computational overhead compared to standard approaches.

This paper improves how discrete diffusion models generate text by introducing D5P4, a new decoding method that generates multiple candidate outputs in parallel while controlling diversity.

efficiency · architecture · evaluation

Adaptive Regime-Aware Stock Price Prediction Using Autoencoder-Gated Dual Node Transformers with Reinforcement Learning Control

Mar 19, 2026

Mohammad Al Ridhawi, Mahtab Haj Ali, Hussein Al Osman

Treating all market conditions the same hurts prediction accuracy; this framework learns to detect regime shifts automatically and uses specialized models for each, improving performance especially during volatile periods without requiring manual market labeling.

This paper presents an adaptive stock price prediction system that automatically detects market regime changes (stable vs. volatile periods) and routes data through specialized prediction models.

architectureapplications

LuMamba: Latent Unified Mamba for Electrode Topology-Invariant and Efficient EEG Modeling

Mar 19, 2026

Danaé Broustail, Anna Tegon, Thorir Mar Ingolfsson et al.

State-space models (Mamba) enable efficient EEG foundation models that work across varying electrode setups—crucial for real-world clinical deployment where equipment differs across hospitals.

LuMamba is an efficient EEG foundation model that handles different electrode configurations by combining topology-invariant encodings with linear-complexity state-space modeling. Pre-trained on 21,000+ hours of unlabeled EEG data, it achieves strong performance on clinical tasks while using 377× fewer computations than transformer-based alternatives.

efficiencyarchitecturetraining

Unified Spatio-Temporal Token Scoring for Efficient Video VLMs

Mar 18, 2026

Jianrui Zhang, Yue Yang, Rohun Tripathi et al.

You can prune half of a video VLM's tokens across both vision and language components without complex mechanisms, gaining a 62% speedup while maintaining performance—making video VLMs practical for real-world deployment.

This paper introduces a method to speed up video understanding models by removing redundant visual information. The technique scores and removes 50% of unnecessary visual tokens across the entire model architecture, achieving 62% faster processing with minimal accuracy loss on video question-answering tasks.

efficiencymultimodalarchitecture
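The core score-and-prune step can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's scorer: `prune_tokens` and the use of attention mass as the score are assumptions.

```python
import numpy as np

def prune_tokens(tokens, scores, keep_ratio=0.5):
    """Keep the top-scoring fraction of tokens, drop the rest.

    tokens: (N, D) array of token embeddings
    scores: (N,) importance score per token (e.g. attention mass)
    """
    n_keep = max(1, int(len(tokens) * keep_ratio))
    # indices of the n_keep highest-scoring tokens, in original order
    keep_idx = np.sort(np.argsort(scores)[-n_keep:])
    return tokens[keep_idx], keep_idx

# toy example: 8 tokens of dim 4, with even indices scored highest
tokens = np.arange(32, dtype=float).reshape(8, 4)
scores = np.array([0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4])
kept, idx = prune_tokens(tokens, scores, keep_ratio=0.5)
print(idx)  # → [0 2 4 6]
```

Sorting the surviving indices preserves the original spatio-temporal order, which matters when the remaining tokens still carry positional information.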

LoST: Level of Semantics Tokenization for 3D Shapes

Mar 18, 2026

Niladri Shekhar Dutt, Zifan Shi, Paul Guerrero et al.

By tokenizing 3D shapes based on semantic importance rather than spatial detail levels, you can train autoregressive 3D generation models that are 10-1000x more token-efficient while maintaining or improving quality.

LoST is a new way to break down 3D shapes into tokens (small pieces) for AI models to process. Instead of using spatial hierarchies like existing methods, it orders tokens by semantic importance—so early tokens capture the main shape, and later tokens add fine details. This makes 3D generation models much more efficient, using 90-99% fewer tokens than previous approaches.

architectureefficiencymultimodal

Demystifying Video Reasoning

Mar 17, 2026

Ruisi Wang, Zhongang Cai, Fanyi Pu et al.

Video models reason through iterative refinement across denoising steps (not frame-by-frame), exploring candidate solutions early and converging later—a mechanism you can exploit by ensembling outputs from different random seeds.

This paper reveals how video diffusion models actually perform reasoning—not by processing frames sequentially, but by exploring multiple solutions across denoising steps and converging to answers.

reasoningarchitectureevaluation

GIST: Gauge-Invariant Spectral Transformers for Scalable Graph Neural Operators

Mar 17, 2026

Mattia Rigotti, Nicholas Thumiger, Thomas Frick

GIST enables efficient, mathematically-principled graph transformers that generalize across different mesh resolutions and discretizations, making neural operators practical for large-scale physics simulations.

GIST is a graph transformer that solves a fundamental problem: how to add positional information to graph neural networks without breaking mathematical symmetries or requiring expensive computations.

architecturescalingreasoning

Mixture-of-Depths Attention

Mar 16, 2026

Lianghui Zhu, Yuxin Fang, Bencheng Liao et al.

MoDA lets deep language models selectively attend to earlier layers, preventing information loss as models get deeper while adding only 3.7% computational overhead.

This paper introduces Mixture-of-Depths Attention (MoDA), a mechanism that lets attention heads skip layers by accessing key-value pairs from both the current and earlier layers. This solves a problem in very deep language models where useful information gets diluted as it passes through many layers.

architectureefficiencyscaling
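The mechanism, letting a head attend over key-value pairs pooled from the current and an earlier layer, can be sketched as plain attention over concatenated caches. This is a hypothetical simplification, not the paper's implementation; `cross_depth_attention` is an assumed name.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_depth_attention(q, kv_current, kv_early):
    """One head attends over KV pairs from both the current layer
    and an earlier layer, so deep layers can recover information
    that would otherwise be diluted on the way up."""
    k = np.concatenate([kv_current[0], kv_early[0]], axis=0)  # (2T, D)
    v = np.concatenate([kv_current[1], kv_early[1]], axis=0)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return attn @ v

rng = np.random.default_rng(0)
T, D = 4, 8
q = rng.standard_normal((T, D))
kv_cur = (rng.standard_normal((T, D)), rng.standard_normal((T, D)))
kv_early = (rng.standard_normal((T, D)), rng.standard_normal((T, D)))
out = cross_depth_attention(q, kv_cur, kv_early)
print(out.shape)  # (4, 8)
```

The cost of doubling the KV pool here is what the reported 3.7% overhead would correspond to when only a subset of heads use it.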

Effective Distillation to Hybrid xLSTM Architectures

Mar 16, 2026

Lukas Hauzenberger, Niklas Schmidinger, Thomas Schmied et al.

You can now distill transformer-based LLMs into more efficient xLSTM architectures without significant performance degradation, making it practical to deploy smaller, cheaper models that match their larger teachers.

This paper shows how to effectively compress large language models into smaller xLSTM models while preserving performance. The researchers developed a distillation pipeline that combines multiple specialized experts into a single efficient model, successfully distilling models from Llama, Qwen, and Olmo families with minimal performance loss.

efficiencyarchitecturetraining
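The objective underneath a distillation pipeline like this is typically a temperature-softened KL term between teacher and student logits. A minimal sketch, assuming a standard forward-KL formulation (`distill_kl` is an assumed name; the paper's expert-merging step is not shown):

```python
import numpy as np

def softmax(z, t=1.0):
    e = np.exp((z - z.max()) / t)
    return e / e.sum()

def distill_kl(teacher_logits, student_logits, temperature=2.0):
    """Forward KL between temperature-softened teacher and student
    distributions: the student is pushed toward the teacher's full
    output distribution, not just its top prediction."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))

t = np.array([2.0, 1.0, 0.1])
print(distill_kl(t, t) < 1e-12)  # identical logits give zero KL
```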

Computational Concept of the Psyche

Mar 16, 2026

Anton Kolonin, Vladimir Krykov

AGI systems should be built around an agent's internal needs and goals as the core driver of learning and decision-making, rather than treating intelligence as separate from motivation.

This paper proposes a cognitive architecture for artificial general intelligence that models the psyche as an operating system managing an agent's needs, sensations, and actions. The approach formalizes AGI as an optimization problem where agents learn through experience to satisfy needs while managing uncertainty and minimizing existential risks.

architecturereasoningagents

Co-Design of Memory-Storage Systems for Workload Awareness with Interpretable Models

Mar 16, 2026

Jay Sarkar, Vamsi Pavan Rayaprolu, Abhijeet Bhalerao

Using interpretable ML to co-design storage hardware and firmware together—rather than separately—helps engineers make better architectural decisions by understanding how memory, error handling, and workloads interact.

This paper describes how machine learning can optimize the design of solid-state drives (SSDs) by modeling how error management algorithms interact with memory components under different workloads. The researchers built an interpretable ML framework that analyzes thousands of real SSDs to guide hardware design decisions, enabling better performance and reliability trade-offs.

architectureefficiencyevaluation

Mamba-3: Improved Sequence Modeling using State Space Principles

Mar 16, 2026

Aakash Lahoti, Kevin Y. Li, Berlin Chen et al.

Mamba-3 shows that linear models can match Transformer quality on real tasks by using complex-valued state tracking and better architectural design, opening a path to cheaper inference without sacrificing capability.

Mamba-3 improves linear sequence models by using state space principles to handle tasks that require tracking information over time. Unlike Transformers that are slow to run, Mamba-3 maintains constant memory and linear compute while matching quality on language tasks—making it faster and cheaper to deploy.

architectureefficiencyreasoning
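Why complex-valued states help: a diagonal recurrence with a complex eigenvalue rotates as it decays, so a purely linear model can track periodic structure that a real eigenvalue cannot. A toy sketch of the principle, not the Mamba-3 parameterization:

```python
import numpy as np

def complex_ssm_scan(x, decay=0.95, freq=0.5):
    """Diagonal linear recurrence h_t = a * h_{t-1} + x_t with a
    complex eigenvalue a = decay * e^{i*freq}; the rotation encodes
    phase/position, the magnitude encodes forgetting."""
    a = decay * np.exp(1j * freq)
    h = 0j
    ys = []
    for xt in x:
        h = a * h + xt
        ys.append(h.real)  # read out the real part
    return np.array(ys)

y = complex_ssm_scan(np.ones(5))
print(y.round(3))
```

Memory stays constant in sequence length and the scan is linear-time, which is the efficiency argument made in the summary above.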

EndoCoT: Scaling Endogenous Chain-of-Thought Reasoning in Diffusion Models

Mar 12, 2026

Xuanlang Dai, Yujie Zhou, Long Xing et al.

Diffusion models can solve complex reasoning tasks better by having the language encoder think iteratively and update its guidance throughout the generation process, rather than encoding instructions once at the start.

This paper improves how diffusion models solve complex reasoning tasks by making the language model encoder think step-by-step. Instead of encoding instructions once, the system iteratively refines the model's internal reasoning and feeds it progressively to the image generation process, achieving 92% accuracy on spatial reasoning tasks like mazes and puzzles.

reasoningmultimodalarchitecture

Separable neural architectures as a primitive for unified predictive and generative intelligence

Mar 12, 2026

Reza T. Batley, Apurba Sarker, Rajib Mostakim et al.

Separable neural architectures provide a unified framework for both prediction and generation tasks by imposing structural constraints that decompose high-dimensional problems into simpler, more interpretable components—useful when your system has underlying factorizable structure.

This paper introduces separable neural architectures (SNAs), a structured approach to building neural networks that explicitly exploit factorizable patterns in data. By constraining how different parts of a system interact, SNAs can model everything from physics simulations to language more efficiently.

architecturereasoningefficiency

BiGain: Unified Token Compression for Joint Generation and Classification

Mar 12, 2026

Jiacheng Liu, Shengkun Tang, Jiacheng Cui et al.

Token compression in diffusion models can serve both generation and classification if you preserve different frequency components: keep high-frequency details for texture/edges and low/mid-frequency information for semantic understanding.

BiGain is a method that speeds up diffusion models while keeping both image generation and classification working well. It uses frequency-aware token compression—separating fine details from overall structure—to decide which tokens to merge or remove, maintaining visual quality and classification accuracy simultaneously.

efficiencyarchitectureevaluation
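The frequency separation at the heart of this idea can be sketched with an FFT along the token axis. This is illustrative only; BiGain's actual merge/remove rule is more involved, and `frequency_split` is an assumed name.

```python
import numpy as np

def frequency_split(tokens, cutoff=0.25):
    """Split a token sequence into low- and high-frequency parts
    along the sequence axis: low/mid frequencies carry coarse
    semantics, high frequencies carry texture and edges."""
    f = np.fft.fft(tokens, axis=0)
    n = tokens.shape[0]
    keep = int(n * cutoff)
    low_f = np.zeros_like(f)
    low_f[:keep] = f[:keep]      # positive low frequencies (incl. DC)
    low_f[-keep:] = f[-keep:]    # matching negative frequencies
    low = np.fft.ifft(low_f, axis=0).real
    high = tokens - low
    return low, high

tokens = np.random.default_rng(1).standard_normal((16, 4))
low, high = frequency_split(tokens)
print(np.allclose(low + high, tokens))  # True by construction
```

A compression policy in this spirit would keep the `high` component where texture matters (generation) and the `low` component where semantics matter (classification).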

Security Considerations for Artificial Intelligence Agents

Mar 12, 2026

Ninghui Li, Kaiyuan Zhang, Kyle Polley et al.

AI agents introduce fundamentally new security challenges because they blur the line between code and data, and can execute actions across systems—developers need layered defenses including input filtering, sandboxing, and strict privilege controls.

This paper identifies security risks in AI agents—systems that can take actions in the real world—and proposes defenses. It covers new attack types like prompt injection and confused-deputy problems, explains how current protections work (sandboxing, policy enforcement), and highlights gaps in standards and research needed to secure multi-agent systems.

safetyagentsarchitecture

HiAP: A Multi-Granular Stochastic Auto-Pruning Framework for Vision Transformers

Mar 12, 2026

Andy Li, Aiden Durrant, Milan Markovic et al.

HiAP simplifies Vision Transformer deployment by automatically discovering efficient architectures in one training phase without manual sparsity targets, matching complex multi-stage methods while being easier to use.

HiAP is a pruning method that automatically removes unnecessary parts of Vision Transformers during training to make them faster and smaller for edge devices. Unlike existing approaches that require manual tuning, it uses a single training process to find optimal sub-networks by removing entire attention heads, FFN blocks, and individual neurons simultaneously.

efficiencyarchitecturetraining

RDNet: Region Proportion-Aware Dynamic Adaptive Salient Object Detection Network in Optical Remote Sensing Images

Mar 12, 2026

Bin Wan, Runmin Cong, Xiaofei Zhou et al.

Using adaptive convolution kernels guided by object size proportions, combined with transformer-based backbones, significantly improves detection of objects at different scales in satellite imagery.

RDNet improves salient object detection in satellite images by replacing traditional CNN backbones with SwinTransformer and adding three specialized modules that adapt to different object sizes and use frequency analysis to better understand context. This solves the problem of detecting objects of varying scales in remote sensing imagery more accurately than existing methods.

architectureefficiencyevaluation

CLASP: Defending Hybrid Large Language Models Against Hidden State Poisoning Attacks

Mar 12, 2026

Alexandre Le Mercier, Thomas Demeester, Chris Develder

CLASP provides a practical, lightweight defense against poisoning attacks on state space models by detecting malicious tokens before they reach downstream tasks, with strong generalization to unseen attack patterns.

State space models like Mamba are fast alternatives to Transformers, but they're vulnerable to Hidden State Poisoning Attacks that inject malicious tokens to corrupt the model's memory.

safetyefficiencyarchitecture

Long-Context Encoder Models for Polish Language Understanding

Mar 12, 2026

Sławomir Dadas, Rafał Poświata, Marek Kozłowski et al.

Encoder-only models can be extended to handle long documents through positional embedding adaptation and continued pre-training, offering a parameter-efficient alternative to decoder-only LLMs for document understanding tasks.

This paper introduces Polish language models based on encoder-only architecture that can process documents up to 8192 tokens long—much longer than traditional BERT models. The researchers used a two-stage training approach with positional embedding adaptation and created smaller distilled versions.

architectureefficiency

FlashMotion: Few-Step Controllable Video Generation with Trajectory Guidance

Mar 12, 2026

Quanhao Li, Zhen Xing, Rui Wang et al.

You can now generate videos with precise motion control in a fraction of the time by distilling multi-step models and retraining motion adapters—opening doors for real-time interactive video creation.

FlashMotion speeds up trajectory-controlled video generation from many steps to just a few, while keeping videos high-quality and motion paths accurate. It trains a motion controller on a slow multi-step model, then distills it to run faster, and fine-tunes the controller to work well with the speedier version.

efficiencyarchitectureevaluation

Compositional Generalization Requires Linear, Orthogonal Representations in Vision Embedding Models

Feb 27, 2026

Arnas Uselis, Andrea Dittadi, Seong Joon Oh

For AI models to recognize new combinations of familiar concepts, their internal representations must be mathematically linear and orthogonal—a strict geometric requirement the authors prove is necessary for generalizing to unseen concept combinations.

This paper explains why neural networks need to organize information in a specific geometric way to recognize familiar concepts in new combinations. The researchers prove that for a model to generalize to unseen combinations of concepts, its internal representations must decompose into separate, perpendicular components for each concept.

architecturereasoningevaluation

FaultXformer: A Transformer-Encoder Based Fault Classification and Location Identification model in PMU-Integrated Active Electrical Distribution System

Feb 27, 2026

Kriti Thakur, Alivelu Manga Parimi, Mayukha Pal

Transformers can outperform traditional deep learning for time-series fault detection in power systems, especially as grids grow more complex with PMU sensors and active distribution components.

FaultXformer uses a Transformer model to detect and locate electrical faults in power grids using real-time sensor data. It processes current measurements in two stages—first extracting temporal patterns, then classifying fault types and pinpointing locations—achieving 98%+ accuracy and outperforming traditional deep learning approaches like CNNs and LSTMs.

architectureapplicationsevaluation

Chunk-wise Attention Transducers for Fast and Accurate Streaming Speech-to-Text

Feb 27, 2026

Hainan Xu, Vladimir Bataev, Travis M. Bartley et al.

You can make streaming speech-to-text models faster and more accurate by processing audio in fixed chunks instead of one token at a time.

This paper introduces CHAT, an improved version of RNN-T models for converting speech to text in real-time. By processing audio in small chunks and using a smarter attention mechanism, CHAT runs 1.7x faster during inference, uses 46% less memory during training, and produces more accurate transcriptions—especially for translating speech between languages.

efficiencyarchitecturemultimodal
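The chunking pattern itself is simple: process the stream in fixed-size windows rather than one token at a time. A minimal sketch of the splitting step (the attention mechanism inside each chunk is not shown, and `chunks` is an assumed name):

```python
def chunks(seq, size):
    """Yield fixed-size chunks of an audio/token stream; a chunk-wise
    model attends within each window instead of emitting one token
    per step, amortizing compute across the chunk."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

frames = list(range(10))
print([c for c in chunks(frames, 4)])  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Latency is bounded by the chunk size: larger chunks give the attention more context per step, smaller chunks respond sooner.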

SeeThrough3D: Occlusion Aware 3D Control in Text-to-Image Generation

Feb 26, 2026

Vaibhav Agrawal, Rishubh Parihar, Pradhaan Bhat et al.

AI image generators can now understand and correctly render partially hidden objects when you specify 3D layouts and camera positions.

This paper solves a key problem in AI image generation: when you ask an AI to create a scene with specific 3D positions and camera angles, it often gets confused about which objects should be hidden behind others. SeeThrough3D adds 'occlusion awareness' by representing objects as transparent 3D boxes, letting the model understand what's visible and what's blocked before generating the final image.

multimodalarchitecture

A Mixture-of-Experts Model for Multimodal Emotion Recognition in Conversations

Feb 26, 2026

Soumya Dutta, Smruthi Balaji, Sriram Ganapathy

Using specialized experts for different modalities (speech vs. text) and intelligently combining their predictions improves emotion recognition in conversations.

This paper presents MiSTER-E, a system that recognizes emotions in conversations by combining speech and text information. It uses separate AI experts for speech, text, and cross-modal analysis, then intelligently combines their predictions. The system works on real conversations without needing to know who's speaking, and achieves strong results on standard emotion recognition benchmarks.

multimodalarchitectureapplications

Discourse-Aware Dual-Track Streaming Response for Low-Latency Spoken Dialogue Systems

Feb 26, 2026

Siyuan Liu, Jiahui Xu, Feng Jiang et al.

Voice assistants can respond 19-51% faster by processing speech, reasoning, and speech generation in parallel instead of waiting for each step to finish.

This paper solves a real problem with voice assistants: they're slow because they wait for you to finish talking, then transcribe everything, think about the answer, and finally speak. The new DDTSR system lets the AI start responding while still listening and thinking—like a human conversation.

efficiencyagentsarchitecture

Agency and Architectural Limits: Why Optimization-Based Systems Cannot Be Norm-Responsive

Feb 26, 2026

Radha Sarma

RLHF-based AI systems cannot be governed by norms because optimization forces all values into tradeable weights—genuine norm-following requires an architecture that does not collapse every principle into a single optimizable score.

This paper argues that AI systems like ChatGPT trained with RLHF cannot follow ethical rules or norms because of how they're built. They work by turning everything into a single score and picking the highest one—which means they'll always trade off any principle if it scores higher. The author shows this isn't a bug to fix, but a fundamental limit of optimization itself.

alignmentsafetyarchitecture

ReCoN-Ipsundrum: An Inspectable Recurrent Persistence Loop Agent with Affect-Coupled Control and Mechanism-Linked Consciousness Indicator Assays

Feb 26, 2026

Aishik Sanyal

Adding emotional feedback to AI agents makes them more stable and deliberate, not just more human-like—a practical insight for building agents th...

This paper builds an AI agent called ReCoN-Ipsundrum that adds memory loops and emotional signals to test whether machines can show consciousness-like behaviors.

agentsarchitecturereasoning

Plug-and-Play Diffusion Meets ADMM: Dual-Variable Coupling for Robust Medical Image Reconstruction

Feb 26, 2026

Chenhe Du, Xuanyu Tian, Qing Wu et al.

Adding historical tracking to diffusion-based medical image reconstruction eliminates the bias-hallucination tradeoff and guarantees convergence to reconstructions consistent with the measured data.

This paper fixes a problem with using AI image generators to reconstruct medical scans from incomplete data. Previous methods lose track of what they've already tried, causing them to either ignore measurement constraints or hallucinate fake details. The solution adds memory to the optimization process and cleans up noise patterns so the AI generator works correctly.

applicationsefficiencyarchitecture

InnerQ: Hardware-aware Tuning-free Quantization of KV Cache for Large Language Models

Feb 26, 2026

Sayed Mohammadreza Tayaranian Hosseini, Amir Ardakani, Warren J. Gross

Reorganizing how you compress KV cache to match GPU hardware operations can give you significant speed gains without accuracy loss.

InnerQ compresses the key-value cache in large language models to speed up text generation without losing accuracy. It uses a smarter grouping strategy that aligns with how GPUs actually compute, reducing memory access and enabling faster decoding—up to 22% faster than previous compression methods.

efficiencyarchitecture
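Group-wise KV cache quantization can be sketched as symmetric int8 rounding over contiguous groups, where the group layout matches how kernels load memory. This is a generic sketch of the idea, not InnerQ's exact layout; the function names are assumptions.

```python
import numpy as np

def quantize_groups(kv, group_size=64):
    """Symmetric int8 quantization of a KV cache in fixed-size
    contiguous groups, each with its own fp scale."""
    g = kv.reshape(-1, group_size)
    scale = np.abs(g).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0  # avoid divide-by-zero on all-zero groups
    q = np.clip(np.round(g / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_groups(q, scale, shape):
    return (q.astype(np.float32) * scale).reshape(shape)

kv = np.random.default_rng(2).standard_normal((4, 128)).astype(np.float32)
q, s = quantize_groups(kv)
kv_hat = dequantize_groups(q, s, kv.shape)
err = np.abs(kv - kv_hat).max()
print(err < 0.05)  # per-element error is bounded by half a scale step
```

Aligning `group_size` with the contiguous spans a GPU kernel reads is what turns the same compression ratio into an actual decoding speedup.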

ESAA: Event Sourcing for Autonomous Agents in LLM-Based Software Engineering

Feb 26, 2026

Elzo Brito dos Santos Filho

Separate agent planning from execution: agents output intentions, a deterministic system executes them and logs everything, preventing state loss and making every change auditable.

This paper solves a critical problem with AI agents: they lose track of what they're doing over long tasks and can't reliably execute code changes. ESAA is an architecture that separates what an agent *intends* to do from what actually *happens* in your codebase.

agentsarchitectureapplications