Papers

Recent AI research papers with accessible summaries. Updated daily from arXiv, summarized for developers who don't read papers regularly.

1492 papers · 15 this month · 12 topics

Evaluation (42) · Training (39) · Agents (31) · Reasoning (27) · Efficiency (25) · Safety (18) · Multimodal (17) · Applications (17) · Alignment (11) · Data (11) · Architecture (8) · Scaling (6)

Jul 6 – Jul 12 (1)

From Fixed to Free Cameras: Calibration-Free View-Robust Vision-Language-Action Model

Jul 6, 2026

Wenhao Li, Xueying Jiang, Quanhao Qian et al.

Robot policies can achieve view robustness without camera calibration by learning to predict both actions in camera space and camera-to-robot geometry, making deployment more practical when camera positions vary.

This paper introduces CamVLA, a robot vision-language-action model that learns to figure out camera positioning automatically instead of requiring explicit calibration. By predicting both camera-relative actions and the geometric relationship between camera and robot, the model works with any camera setup without needing depth data or prior calibration.

multimodal · agents · applications

Jun 29 – Jul 5 (16)

Program-as-Weights: A Programming Paradigm for Fuzzy Functions

Jul 2, 2026

Wentao Zhang, Liliana Hotsko, Woojeong Kim et al.

Instead of calling large language models for every fuzzy task, you can compile a natural-language specification once into a tiny reusable neural artifact that runs locally and cheaply—shifting from per-input problem solving to one-time function compilation.

This paper introduces Program-as-Weights (PAW), a method to compile natural-language function specifications into small, locally executable neural adapters. A 4B compiler generates parameter-efficient adapters that run on a lightweight 0.6B interpreter, matching the performance of much larger models while using 50x less memory and running efficiently on consumer hardware such as a MacBook M3.

efficiency · training · applications

Reasoning LLM Improves Speaker Recognition in Long-form TV Dramas

Jul 2, 2026

Yuxuan Li, Lingxi Xie, Xinyue Huo et al.

Reasoning models can improve speaker identification in video by combining multiple modalities and contextual evidence, outperforming traditional audio-only approaches on challenging cases.

This paper tackles speaker recognition in long-form TV dramas by introducing DramaSR-532K, a large benchmark with 532K annotated dialogue lines, and DramaSR-LRM, a reasoning-based approach that combines audio, text, and visual information to accurately identify which character is speaking. The method works especially well on short utterances where voice alone isn't reliable.

Jun 22 – Jun 28 (24)

Agentic Hardware Design as Repository-Level Code Evolution

Jun 26, 2026

Cunxi Yu, Chenhui Deng, Nathaniel Pinckney et al.

Hardware design can be automated using agentic AI that evolves code repositories with built-in validation and state management, though current benchmarks don't capture the full complexity of production chip design.

HORIZON is an AI agent framework that automatically designs hardware by treating it as code evolution in a git repository. The system uses a Markdown specification to guide an agent loop that modifies Verilog code, tracks changes through git operations, and validates designs against acceptance criteria.

agents · architecture · applications

Parameter Efficient Hybrid Transformer (PEHT) for Network Traffic Prediction via Dynamic Urban Congestion Integration

Jun 26, 2026

Abdolazim Rezaei, Mehdi Sookhak, Mahboobeh Haghparast

By combining parameter-efficient fine-tuning (LoRA) with multimodal fusion of urban context, you can build accurate traffic prediction models that use fewer trainable parameters without sacrificing performance.

This paper presents PEHT, a traffic prediction model that combines Transformers with urban mobility data to forecast cellular network demand. It uses LoRA to reduce the number of trainable parameters while a multimodal fusion strategy integrates congestion and mobility information, achieving better accuracy than existing methods on real telecom data.
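As a rough illustration of why LoRA cuts the trainable parameter count, the sketch below compares a full weight update with a low-rank one for a single linear layer. The layer size (512 x 512) and rank (8) are hypothetical values chosen for the arithmetic; they are not taken from the PEHT paper.

```python
def full_update_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight is fine-tuned."""
    return d_out * d_in


def lora_update_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a rank-r update B @ A on a frozen weight.

    B has shape (d_out, rank) and A has shape (rank, d_in).
    """
    return rank * (d_out + d_in)


if __name__ == "__main__":
    # Hypothetical layer size and rank, for illustration only.
    d_out, d_in, rank = 512, 512, 8
    full = full_update_params(d_out, d_in)
    lora = lora_update_params(d_out, d_in, rank)
    print(f"full fine-tuning: {full} trainable parameters")
    print(f"LoRA (rank {rank}): {lora} trainable parameters")
    print(f"ratio: {lora / full:.4f}")
```

The saving grows with layer size, since the low-rank count scales with the sum of the dimensions rather than their product.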

Jun 15 – Jun 21 (15)

Structuring and Tokenizing Distributed User Interest Context for Generative Recommendation

Jun 18, 2026

Ruizhong Qiu, Yinglong Xia, Dongqi Fu et al.

Combining graph-based user co-engagement patterns with semantic tokenization creates more accurate user interest representations for generative recommendation systems at scale.

This paper presents G2Rec, a framework that improves generative recommendation systems by better organizing user behavior and item information. It combines graph-based user interaction patterns with semantic tokenization to help recommendation models understand what users want next, without needing labeled user interests.

applications · architecture · data

FlowEdit: Associative Memory for Lifelong Pronunciation Adaptation in Flow-Matching TTS

Jun 18, 2026

Harshit Singh, Ayush Pratap Singh, Nityanand Mathur

You can add lifelong learning to frozen TTS models by storing pronunciation fixes in a memory network instead of updating weights—enabling fast adaptation to new proper nouns without retraining.

FlowEdit enables text-to-speech systems to learn and remember pronunciation corrections for proper nouns without retraining. It stores corrections as edits in a memory network, then retrieves and applies them at inference time, reducing pronunciation errors by 93% while keeping the original model frozen.

Jun 8 – Jun 14 (22)

Persona-Pruner: Sculpting Lightweight Models for Role-Playing

Jun 12, 2026

Jinsu Kim, Jihoon Tack, Noah Lee et al.

You can shrink language models for specific character personas by 50%+ while keeping 93.8% of role-playing quality, making multi-NPC applications practical without sacrificing character consistency.

This paper introduces Persona-Pruner, a technique that creates lightweight language models optimized for specific character roles by identifying and preserving only the persona-relevant parts of a full model. Unlike standard pruning that indiscriminately removes parameters, this method maintains role-playing quality while reducing computational cost—useful for applications with many NPCs.

efficiency · training · applications

Optimal Hidden-Target Learning for Online Inventory Optimization on General Convex Sets

Jun 12, 2026

Anthony Pineci, Yunzong Xu

A simple hidden-target-and-project strategy is provably optimal for inventory optimization with memory constraints, and viewing inventory as a one-dimensional queue dramatically simplifies the theoretical analysis.

This paper solves online inventory optimization—a practical problem where past inventory decisions constrain future actions—by maintaining a hidden target and projecting it onto feasible inventory levels. The method achieves optimal regret bounds on general convex capacity constraints, improving prior results and introducing a novel 'norm alignment' principle that simplifies the analysis.
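The general idea of keeping an unconstrained target and playing only its projection can be sketched in one dimension. This is a toy illustration, not the paper's algorithm: the newsvendor-style subgradient, the cost values, the step size, and all function names are assumptions made for the example, and the feasible set is simply the interval between leftover stock and capacity.

```python
def clip(x: float, lo: float, hi: float) -> float:
    """Project x onto the interval [lo, hi]."""
    return max(lo, min(x, hi))


def simulate(demands, capacity, step_size, initial_target=0.0,
             holding_cost=1.0, backlog_cost=2.0):
    """Return the inventory levels played in each round.

    The hidden target is updated freely; only the played level is
    constrained to lie between leftover stock and capacity.
    """
    target = initial_target   # hidden, unconstrained target
    leftover = 0.0            # stock carried over from the previous round
    levels = []
    for demand in demands:
        # Play the feasible level closest to the hidden target.
        level = clip(target, leftover, capacity)
        levels.append(level)
        # Assumed subgradient of a newsvendor cost, evaluated at the target.
        grad = holding_cost if target > demand else -backlog_cost
        target -= step_size * grad
        # Unsold stock constrains the next round's feasible levels.
        leftover = max(level - demand, 0.0)
    return levels


if __name__ == "__main__":
    print(simulate([5, 5, 5], capacity=10, step_size=1.0))
    print(simulate([0, 0], capacity=10, step_size=1.0, initial_target=5.0))
```

In the second call the hidden target drops below the leftover stock, so the played level stays pinned at the leftover while the target keeps moving, which is the kind of memory constraint the summary describes.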

Jun 1 – Jun 7 (12)

Agentopia: Long-Term Life Simulation and Learning in Agent Societies

Jun 5, 2026

Xintao Wang, Sirui Zheng, Hongqiu Wu et al.

Long-term multi-agent simulation can teach LLMs social intelligence—agents trained on years of simulated life experience show better understanding of human-like social behavior and role-playing tasks.

Agentopia simulates 100 AI agents living together for 10 simulated years, learning from social interactions and personal growth. The framework trains language models using a 'life reward' signal based on agent well-being, showing that agents develop realistic social behaviors and that this training improves the underlying model's ability to handle social reasoning tasks.

agents · training · applications

Twelve quick tips for designing AI-driven HPC workflows

Jun 5, 2026

Jamie J. Alnasir

AI workflows on HPC systems need different optimization strategies than traditional scientific computing: focus on containerization for portability, smart job scheduling, explicit feedback mechanisms, and I/O efficiency rather than just raw compute throughput.

This guide offers twelve practical strategies for running AI workloads efficiently on HPC clusters. It addresses the unique challenges of AI workflows—which are iterative and data-driven—compared to traditional scientific computing, covering containerization, job scheduling, feedback loops, and file I/O optimization to help researchers build scalable, reproducible AI pipelines.

May 25 – May 31 (10)

KLIP: localized distribution shift detection via KL-divergence with diffusion priors in Inverse Problems

May 29, 2026

Alireza Kheirandish, Jihoon Hong, Sara Fridovich-Keil

You can detect subtle distribution shifts in medical images by measuring how differently a diffusion model's prior and posterior distributions behave—no need for labeled anomaly examples or calibration data.

This paper introduces KLIP, a method for detecting when images deviate from expected distributions in medical imaging and other inverse problems. It uses diffusion models to spot both whole-image anomalies and localized abnormalities (like tumors in CT scans) without needing examples of the shifted distribution beforehand.

safety · evaluation · applications

TunerDiT: Training-free Progressive Steering of Diffusion Transformer for Multi-Event Video Generation

May 29, 2026

Ruotong Liao, Guowen Huang, Qing Cheng et al.

You can steer video generation at inference time by identifying and leveraging natural turning points in the diffusion denoising process—no retraining needed, and it scales better with more events.

This paper presents TunerDiT, a method for generating videos with multiple sequential events from text descriptions without requiring additional training. By identifying key moments in the diffusion process where text conditioning affects different aspects of video generation, the authors use strategic masking and prompt fusion to control event boundaries and transitions in long-form videos.

Papers

TestEvo-Bench: An Executable and Live Benchmark for Test and Code Co-Evolution

Will Scaling Improve Social Simulation with LLMs?

Reasoning effort, not tool access, buys first-try reliability in agentic code generation: an observational study

Automated grading of Linux/bash examinations using large language models: a four-level cognitive taxonomy approach

Q-GAIN: A Python Package for Machine Learning and Physically Informed Analysis Applications

Steerability via constraints: a substrate for scalable oversight of coding agents

Bringing Agentic Search to Earth Observation Data Discovery

Know Your Source: A Public Knowledge Store for Media Background Checks

HULAT2 at MER-TRANS 2026: Governed Multi-Agent Simplification for Spanish Easy-to-Read Generation

VisionAId: An Offline-First Multimodal Android Assistant for People with Visual Impairment, Featuring Personalized Object Retrieval

Are Performance-Optimization Benchmarks Reliably Measuring Coding Agents?

Optimal Resource Utilization for Autonomous Laboratory Orchestrators

PolicyGuard: From Organizational Policies to Neuro-Symbolic Compliance Review Engines

VLK: Learning Humanoid Loco-Manipulation from Synthetic Interactions in Reconstructed Scenes

Govern the Repository, Not the Agent: Measuring Ecosystem-Level Risk in AI-Native Software

Autoregressive Boltzmann Generators

Mapping Political-Elite Networks in Europe with a Multilingual Joint Entity-Relation Extraction Pipeline

Language-Based Digital Twins for Elderly Cognitive Assistance

LLM-Based Examination of Eligibility Criteria from Securities Prospectuses at the German Central Bank

A Multi-Fidelity Convolutional Autoencoder-Transfer Learning Framework for Guided-Wave-Based Damage Diagnosis Using Large Simulated and Limited Experimental Datasets

AI Healthcare Chatbots as Information Infrastructure: A Large-Scale Study of User-Reported Breakdowns

Designing Reward Signals for Portable Query Generation: A Case Study in Industrial Semantic Job Search

How Surprising Is Historical Italian to Language Models? Tokenization Tax, Comprehension Tax, and a Simple Mitigation

RSPC: A Benchmark for Modeling Stress and Psychiatric Conditions in Digitally Mediated Relationships using Psychiatrist Annotations

From Celebrities to Anyone: Characterizing AI Nudification Content, Technology, and Community Dynamics on 4chan

A Process Harness for Uplifting Legacy Workflows to Agentic BPM: Design and Realization in CUGA FLO

A cross-process welding penetration status prediction algorithm based on unsupervised domain adaptation in laser and TIG welding

AI translation of literary texts is "fine", but readers still prefer human translations

It's Complicated: On the Design and Evaluation of AI-Powered AAC Interfaces

Accuracy and Satisfaction in Multi-Turn LLM Dialogues for NFR Assessment

Large-Language-Model Discovery of Quantum LDPC Codes through Structured Concept Evolution

Semantic Browsing: Controllable Diversity for Image Generation

PsyBridge: A Hybrid Intelligent Framework for Multi-Dimensional Mental Health Assessment and Decision Support

EnterpriseClawBench: Benchmarking Agents from Real Workplace Sessions

TailorMind: Towards Preference-Aligned Multimodal Content Generation

AI Exposure Scores: what they measure, what they miss, and what comes next

Context-Aware Hierarchical Bayesian Modeling of IVF Laboratory Environmental Conditions

Multi-View Decompilation for LLM-Based Malware Classification

DataMagic: Transforming Tabular Data into Data Insight Video

Freeing the Law with LOCUS: A Local Ordinance Corpus for the United States

Reference-Driven Multi-Speaker Audio Scene Generation from In-the-Wild Priors

Data Intelligence Agents: Interpreting, Modeling, and Querying Enterprise Data via Autonomous Coding Agents

Risk Stratification for ICU Delirium using Pervasive Ambient Sensing Information

Correct Yourself, Keep My Trust: How Self-Correction and Social Connection Shape Credibility in Social Chatbots

ReproRepo: Scaling Reproducibility Audits with GitHub Repository Issues

Darshana Graph: A Parallel Commentary Corpus for Comparative Indian Philosophy, with Stylometric and Exploratory Graph Analyses

Analyzing and Encoding the Al-Mawrid Arabic-English Dictionary with the ISO Language Markup Framework and TEI Lex-0

RubricsTree: Scalable and Evolving Open-Ended Evaluation of Personal Health Agents across Health Memory and Medical Skills

FusionRS: A Large-Scale RGB-Infrared Remote Sensing Dataset for Dual-Modal Vision-Language Foundation Models

Abstracting Cross-Domain Action Sequences into Interpretable Workflows

Automated reproducibility assessments in the social and behavioral sciences using large language models

EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery

Generative Modeling of Bach-Style Symbolic Music: A Comparative Study of Autoregressive, Latent-Variable, and Adversarial Approaches

One Polluted Page Is Enough: Evaluating Web Content Pollution in Generative Recommenders

Multi-Agent Reinforcement Learning from Delayed Marketplace Feedback for Objective-Weight Adaptation in Three-Sided Dispatch

EpiBench: Verifiable Evaluation of AI Agents on Epigenomics Analysis

ArogyaSutra: A Multi-Agent Framework for Multimodal Medical Reasoning in Indic Languages

A Three-Layer Framework for AI in Scientific Discovery

Edit the Bits, Diff the Codes: Bitwise Residual Editing for Visual Autoregressive Models