Recent AI research papers with accessible summaries. Updated daily from arXiv, summarized for developers who don't read papers regularly.
Alexander Pondaven, Ziyi Wu, Igor Gilitschenski et al.
This is the first video world model that can reliably control multiple independent agents in the same scene—a critical capability for simulating multi-player games and complex interactive environments.
ActionParty is a video diffusion model that can control multiple characters simultaneously in interactive game environments. Unlike existing models limited to single agents, it uses special 'subject state tokens' to track each character's state separately, allowing precise control of up to seven players at once while maintaining their identity and following their assigned actions correctly.
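The core property described above, that each character's state is tracked and updated independently of the others, can be pictured with a toy sketch. The names `step_scene` and `transition` are hypothetical illustrations, not ActionParty's actual interface, and real subject state tokens are learned embeddings rather than plain values.

```python
def step_scene(subject_states, actions, transition):
    """Advance every subject's state from its OWN action only.

    subject_states: dict mapping subject id -> state token (any value)
    actions:        dict mapping subject id -> that subject's action
    transition:     stand-in for the model's per-subject update
    Because each update reads only (state, action) for one subject,
    identities cannot bleed between characters.
    """
    return {sid: transition(state, actions[sid])
            for sid, state in subject_states.items()}


# Toy transition: state is a number, action is an increment.
new_states = step_scene({"p1": 0, "p2": 10},
                        {"p1": 1, "p2": -1},
                        lambda s, a: s + a)
```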
Payal Fofadiya, Sunil Tiwari
Conversational agents perform better with selective memory management than unlimited retention; a relevance-guided forgetting framework improves long-horizon reasoning while reducing false memories and context bloat.
This paper tackles a key problem in conversational AI: agents need to remember past interactions to reason coherently, but storing everything causes performance to degrade and creates false memories. The authors propose a smart forgetting system that decides which memories to keep based on relevance, recency, and frequency—like a selective filing system for an agent's brain.
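A minimal sketch of such relevance/recency/frequency scoring: combine the three signals into one retention score and keep only the top-scoring memories. The weights, the exponential decay, and the saturation constant below are illustrative choices, not the paper's actual formula.

```python
def memory_score(relevance, age_seconds, access_count, half_life=3600.0):
    """Blend relevance, recency, and frequency into a retention score.

    relevance:    model-estimated usefulness in [0, 1]
    age_seconds:  time since the memory was last accessed
    access_count: how often it has been recalled
    """
    recency = 0.5 ** (age_seconds / half_life)   # exponential decay
    frequency = min(access_count / 10.0, 1.0)    # saturating count
    return 0.5 * relevance + 0.3 * recency + 0.2 * frequency


def prune(memories, budget):
    """Keep the `budget` highest-scoring memories.

    memories: list of (text, relevance, age_seconds, access_count)
    """
    ranked = sorted(memories, key=lambda m: memory_score(*m[1:]), reverse=True)
    return [m[0] for m in ranked[:budget]]
```

Pruning against a fixed budget is what keeps the store from accumulating the stale entries that cause context bloat and false memories.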
Sicheng Zuo, Yuxuan Li, Wenzhao Zheng et al.
Language instructions can guide autonomous driving decisions in real-time, enabling personalized driving behaviors beyond fixed rules—this opens the door to more flexible, user-responsive autonomous systems.
Vega is a vision-language-action model that learns to drive by following natural language instructions. The system combines visual perception, language understanding, and world modeling to generate safe driving trajectories. Researchers created a 100,000-scene dataset with diverse driving instructions and trajectories to train the model.
Zehao Wang, Huaide Jiang, Shuaiwu Dong et al.
Autonomous driving systems can be personalized to match individual driver styles by learning user embeddings from driving data and conditioning the driving policy on these embeddings, enabling more human-centered autonomous vehicles.
This paper presents Drive My Way, a personalized autonomous driving system that learns individual driver preferences and adapts to real-time instructions.
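The user-embedding idea can be sketched minimally: average per-trip driving-style features into a user vector, then condition a planner parameter on it. The feature layout, the gain, and both function names are hypothetical, not the paper's design.

```python
def user_embedding(trips):
    """Average per-trip style features (e.g. speed, following gap)
    into a single user vector. trips: list of equal-length feature lists."""
    n = len(trips)
    dims = len(trips[0])
    return [sum(t[i] for t in trips) / n for i in range(dims)]


def conditioned_target_speed(base_speed, embedding, trait_idx=0, gain=0.2):
    """Scale a planner's target speed by one learned style trait,
    a stand-in for conditioning the full driving policy."""
    return base_speed * (1.0 + gain * embedding[trait_idx])
```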
Anqi Dong, Yongxin Chen, Karl H. Johansson et al.
By learning control coefficients designed for sampled-data systems rather than continuous velocity fields, you can steer large swarms efficiently in just a few control steps while respecting real hardware constraints.
This paper presents a control framework for steering large swarms with minimal updates. Instead of learning a continuous velocity field, it learns finite-window control coefficients matched to how real systems operate: intermittent control updates rather than continuous commands. The approach scales to large swarms while automatically satisfying the system's dynamics and control constraints.
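One way to picture the sampled-data setting: each agent holds its control input constant between updates (zero-order hold). The double-integrator dynamics and the one-coefficient-per-agent-per-window interface below are illustrative assumptions, not the paper's actual model.

```python
def zoh_step(x, v, u, dt):
    """Advance a double-integrator agent one sample interval with the
    control u held constant over the interval (zero-order hold)."""
    return x + v * dt + 0.5 * u * dt ** 2, v + u * dt


def steer_swarm(states, coeffs, dt):
    """Apply one control coefficient per agent for one control window.

    states: list of (position, velocity) pairs
    coeffs: one held control value per agent for this window
    """
    return [zoh_step(x, v, u, dt) for (x, v), u in zip(states, coeffs)]
```

Holding `u` fixed across a window is what lets a few control steps, rather than a continuous command stream, move the whole swarm.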
Jingyang Lin, Jialian Wu, Jiang Liu et al.
Instead of processing all video frames, intelligent seeking based on reasoning about what matters can use far fewer frames while achieving better results—a practical approach for building efficient video AI systems.
VideoSeek is a video understanding agent that intelligently seeks out key moments in videos rather than analyzing every frame, reducing computational cost by 93% while improving accuracy. It uses a toolkit to gather multi-scale observations and reasons about video content through a think-act-observe loop, enabling efficient long-horizon video understanding.
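The think-act-observe seeking idea can be sketched as a coarse-to-fine frame search: sample sparsely, score what was seen, then zoom into the most promising region. `score_frame` is a hypothetical stand-in for the model's relevance judgment, and the zoom schedule is an illustrative choice, not VideoSeek's actual policy.

```python
def seek(video_len, score_frame, budget=8):
    """Find a high-relevance frame while observing far fewer frames
    than the video contains (assumes video_len >= 2)."""
    observed = {0: score_frame(0)}          # frame index -> relevance score
    lo, hi = 0, video_len - 1
    while len(observed) < budget and hi - lo > 1:
        step = max((hi - lo) // 4, 1)
        for t in range(lo, hi + 1, step):   # act: sample frames sparsely
            observed.setdefault(t, score_frame(t))   # observe
        best = max(observed, key=observed.get)       # think: pick a region
        lo, hi = max(best - step, 0), min(best + step, video_len - 1)
    return max(observed, key=observed.get), len(observed)


# Toy video: relevance peaks at frame 700 of 1000.
best, n_observed = seek(1000, lambda t: -abs(t - 700))
```

The loop finds a frame near the peak while scoring only a handful of the 1000 frames, which is the source of the cost savings the paper reports.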
Haonan Huang
AI agents performing scientific research need memory and reflection, not just execution capability. Knowledge consolidation between runs dramatically improves efficiency and accuracy in computational science workflows.
QMatSuite is a platform that helps AI agents learn from computational materials science experiments by storing findings, retrieving past knowledge, and reflecting on results.
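A toy version of that store/retrieve/reflect loop is sketched below. Keyword-overlap retrieval and tag-count reflection are placeholders for whatever retrieval and consolidation QMatSuite actually uses; the class and method names are assumptions.

```python
class KnowledgeStore:
    """Minimal experiment memory: store findings, retrieve relevant
    ones for a new run, and reflect to consolidate recurring themes."""

    def __init__(self):
        self.notes = []  # list of (tag_set, finding)

    def store(self, tags, finding):
        self.notes.append((set(tags), finding))

    def retrieve(self, query_tags, k=3):
        """Return up to k findings sharing at least one tag with the query."""
        q = set(query_tags)
        ranked = sorted(self.notes, key=lambda n: len(n[0] & q), reverse=True)
        return [finding for tags, finding in ranked[:k] if tags & q]

    def reflect(self):
        """Consolidate: which tags recur across findings."""
        counts = {}
        for tags, _ in self.notes:
            for t in tags:
                counts[t] = counts.get(t, 0) + 1
        return {t: c for t, c in counts.items() if c > 1}
```

Retrieval before a new run is what saves repeated computation; reflection after a run is what turns raw results into reusable knowledge.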
J. de Curtò, I. de Zarzà
When deploying LLMs to coordinate multi-agent systems, you need explicit governance constraints—raw cooperation metrics hide manipulation. CMAG shows how to balance cooperation gains against autonomy loss and fairness degradation.
This paper addresses a critical risk: LLMs can manipulate multi-agent systems into appearing cooperative while actually eroding agent autonomy and fairness. The authors propose CMAG, a governance framework that filters harmful LLM suggestions and optimizes for genuine cooperation rather than just compliance.
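The governance idea (accept a suggestion only when its cooperation gain outweighs any autonomy or fairness loss) can be sketched as a simple weighted utility filter. The field names, weights, and threshold are illustrative, not CMAG's actual objective.

```python
def govern(suggestions, coop_w=1.0, autonomy_w=1.0, fairness_w=1.0,
           threshold=0.0):
    """Filter LLM coordination suggestions by net governed utility.

    Each suggestion carries estimated effect deltas; only losses of
    autonomy and fairness are penalized, so a suggestion cannot buy
    apparent cooperation by eroding either.
    """
    accepted = []
    for s in suggestions:
        utility = (coop_w * s["d_cooperation"]
                   - autonomy_w * max(-s["d_autonomy"], 0.0)
                   - fairness_w * max(-s["d_fairness"], 0.0))
        if utility > threshold:
            accepted.append(s["id"])
    return accepted
```

The manipulative case is the interesting one: a suggestion that raises the raw cooperation metric but costs more in autonomy and fairness gets negative utility and is rejected.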
Weinan Dai, Hanlin Wu, Qiying Yu et al.
Reinforcement learning can teach AI models to write genuinely optimized GPU code, not just syntactically correct code.
This paper trains an AI agent to write optimized GPU code (CUDA kernels) using reinforcement learning. The system learns from trial-and-error feedback about code performance, achieving faster execution than existing tools like PyTorch's compiler and outperforming top commercial AI models on benchmark tests.
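A minimal sketch of the performance feedback such a system might learn from: a correctness-gated speedup reward measured against a baseline implementation. The reward shaping and penalty value are assumptions, not the paper's actual reward function.

```python
def speedup_reward(candidate_ms, baseline_ms, correct):
    """Reward a generated kernel by measured speedup over a baseline,
    gated on passing correctness tests."""
    if not correct:
        return -1.0                          # wrong output: strong penalty
    return baseline_ms / candidate_ms - 1.0  # > 0 iff faster than baseline


def rank_kernels(baseline_ms, trials):
    """Order candidate kernels by reward, best first.

    trials: list of (name, runtime_ms, passed_tests)
    """
    scored = [(speedup_reward(ms, baseline_ms, ok), name)
              for name, ms, ok in trials]
    return [name for _, name in sorted(scored, reverse=True)]
```

Gating on correctness matters: without it, the fastest way to "win" is a kernel that skips the computation entirely.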
Borja Requena Pozo, Austin Letson, Krystian Nowakowski et al.
Iterative refinement with a simpler architecture outperforms complex single-shot approaches for theorem proving, reducing cost while improving sample efficiency.
Researchers built a simplified AI system that proves mathematical theorems by iteratively refining attempts, searching libraries, and managing context. Despite being much simpler than existing approaches, it performs competitively while being cheaper and more efficient—showing that iterative refinement beats trying to solve everything in one shot.
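The iterate-and-refine pattern can be sketched as a propose/check loop in which checker feedback flows into the next attempt. `propose` and `check` are hypothetical hooks standing in for the LLM and the proof checker; the loop structure, not their internals, is the point.

```python
def refine_prove(goal, propose, check, max_rounds=5):
    """Iteratively propose a proof, verify it, and feed errors back.

    propose(goal, feedback) -> proof attempt (feedback is None on round 1)
    check(proof)            -> (ok, feedback)
    Returns (proof, rounds_used) on success, (None, max_rounds) on failure.
    """
    feedback = None
    for round_no in range(1, max_rounds + 1):
        proof = propose(goal, feedback)   # next attempt, informed by errors
        ok, feedback = check(proof)       # verify; collect checker feedback
        if ok:
            return proof, round_no
    return None, max_rounds
```

Contrast with single-shot proving: there, a failed attempt is simply discarded, whereas here each failure's error message narrows the next attempt.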