Recent AI research papers with accessible summaries. Updated daily from arXiv, summarized for developers who don't read papers regularly.
Jan Tempus, Philip Whittington, Craig W. Schmidt et al.
ConvexTok uses convex optimization to build tokenizers that are provably near-optimal (within 1% at typical vocabulary sizes) and compress text better than greedy algorithms like BPE, with measurable improvements in language model efficiency.
This paper replaces greedy tokenization algorithms like BPE with a convex optimization approach called ConvexTok. Instead of making a sequence of locally optimal choices, it formulates tokenizer construction as a linear program, achieving better compression (bits per byte) and letting users verify how close their tokenizer is to the mathematical optimum.
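To make the gap between greedy and globally optimal choices concrete, here is a minimal Python sketch. It is not ConvexTok's linear program and it does not build a vocabulary: it takes a fixed, made-up vocabulary and compares greedy longest-match segmentation with a dynamic-programming segmentation that minimizes the token count. The function names, the toy vocabulary, and the gap calculation are all assumptions for illustration.

```python
def greedy_segment(text, vocab):
    """Longest-match-first segmentation: always take the longest token that fits."""
    max_len = max(len(token) for token in vocab)
    tokens, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            raise ValueError(f"no token in vocab matches at position {i}")
    return tokens


def optimal_segment(text, vocab):
    """Dynamic programming: the segmentation with the fewest tokens."""
    max_len = max(len(token) for token in vocab)
    n = len(text)
    best = [float("inf")] * (n + 1)  # best[k] = fewest tokens covering text[:k]
    back = [None] * (n + 1)          # back[k] = start index of the last token
    best[0] = 0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            if text[start:end] in vocab and best[start] + 1 < best[end]:
                best[end] = best[start] + 1
                back[end] = start
    if best[n] == float("inf"):
        raise ValueError("text cannot be segmented with this vocab")
    tokens, end = [], n
    while end > 0:
        start = back[end]
        tokens.append(text[start:end])
        end = start
    return tokens[::-1]


# Toy vocabulary (hypothetical) chosen so that the greedy choice is a trap.
vocab = {"a", "b", "c", "d", "e", "ab", "abc", "cde"}
greedy = greedy_segment("abcde", vocab)    # ['abc', 'd', 'e']
optimal = optimal_segment("abcde", vocab)  # ['ab', 'cde']

# Relative gap between the greedy result and the optimum: 0.5 here.
gap = (len(greedy) - len(optimal)) / len(optimal)
print(greedy, optimal, gap)
```

Greedy grabs "abc" first and is then forced into two single-character tokens, while the global search finds a two-token split. Knowing the optimum is what lets you report a gap at all, which is the kind of guarantee the summary describes.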
Carlos Heredia, Daniel Roncel
Neural demand models can be designed to respect economic constraints (integrability), producing more reliable price-elasticity estimates that are both mathematically consistent and practically useful for retail pricing.
This paper introduces ICDN, a neural network model that learns demand patterns for multiple products based on prices. Unlike traditional approaches, it directly models how demand changes with price (elasticity) in a mathematically consistent way, making the learned relationships more economically realistic and stable.
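For readers unfamiliar with the term, price elasticity is the percentage change in demand per percentage change in price, which is the derivative of log demand with respect to log price. The sketch below estimates it numerically for a hypothetical constant-elasticity demand curve. The curve, its parameters, and the function names are assumptions for illustration; none of this is ICDN or taken from the paper.

```python
import math


def demand(price, scale=100.0, elasticity=-1.5):
    """Hypothetical constant-elasticity demand curve: q = scale * price**elasticity."""
    return scale * price ** elasticity


def price_elasticity(demand_fn, price, step=1e-4):
    """Central finite difference of log-demand with respect to log-price."""
    up = math.log(demand_fn(price * math.exp(step)))
    down = math.log(demand_fn(price * math.exp(-step)))
    return (up - down) / (2 * step)


estimate = price_elasticity(demand, price=4.0)
print(estimate)  # close to -1.5, the exponent of the toy curve
```

An elasticity near -1.5 means a 1% price increase lowers demand by roughly 1.5%. A learned demand model could be passed in place of `demand` to inspect the elasticities it implies.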
Ziyu Guo, Rain Liu, Xinyan Chen et al.
A single discrete token can serve two purposes: executing a visual operation, as code would, and acting as a learnable reasoning unit. This makes visual reasoning more efficient and trainable without architectural changes.
ATLAS introduces a single 'functional token' that acts as both an agentic operation and a latent visual reasoning unit, enabling models to reason about images without generating intermediate visual content. This approach combines the interpretability of code-based reasoning with the efficiency of latent reasoning, while remaining compatible with standard language model training.
Ruozhen He, Meng Wei, Ziyan Yang et al.
Maintaining consistent characters and objects across long video sequences is hard; explicit memory of each entity's appearance significantly improves consistency, especially when characters reappear after many shots.
EntityBench is a benchmark for evaluating multi-shot video generation—creating coherent video sequences with multiple scenes. It includes 140 episodes with detailed tracking of characters, objects, and locations across shots, plus an evaluation system that measures both video quality and consistency.
Tong Zheng, Haolin Liu, Chengsong Huang et al.
You can automatically discover better inference strategies for LLMs by treating strategy design as a search problem over execution traces rather than manually designing heuristics, and the search is cheap to run at scale.
This paper presents AutoTTS, a framework that automatically discovers test-time scaling strategies for LLMs instead of relying on hand-crafted heuristics.
Jiatao Gu, Tianrong Chen, Ying Shen et al.
NTM enables fast image generation (4 steps) while preserving exact likelihood calculation—something previous fast diffusion methods couldn't do—by using normalizing flows for each denoising step instead of simple Gaussian assumptions.
This paper introduces Normalizing Trajectory Models (NTM), a new approach for fast image generation that compresses diffusion sampling from many steps to just four. Unlike existing fast methods that lose the ability to calculate exact probabilities, NTM maintains a mathematically exact likelihood while generating high-quality images, making it useful for both generation and evaluation.
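The exact likelihood that normalizing flows provide comes from the change-of-variables rule: for an invertible map, the log-density of a sample is the base log-density of its inverse image plus the log-determinant of the inverse map's Jacobian. Below is a minimal one-dimensional sketch with an affine map. It is not NTM's architecture, and the class and parameter names are hypothetical.

```python
import math


def standard_normal_logpdf(z):
    """Log-density of a standard normal base distribution."""
    return -0.5 * (z * z + math.log(2 * math.pi))


class AffineFlow:
    """Invertible 1-D map x = shift + scale * z with an exact log-likelihood."""

    def __init__(self, shift, scale):
        if scale <= 0:
            raise ValueError("scale must be positive so the map is invertible")
        self.shift = shift
        self.scale = scale

    def forward(self, z):
        return self.shift + self.scale * z

    def inverse(self, x):
        return (x - self.shift) / self.scale

    def log_prob(self, x):
        # Change of variables: log p(x) = log p_z(f^{-1}(x)) + log |d f^{-1} / dx|.
        # For an affine map, the second term is -log(scale).
        z = self.inverse(x)
        return standard_normal_logpdf(z) - math.log(self.scale)


flow = AffineFlow(shift=1.0, scale=2.0)
x = flow.forward(0.5)    # push a base sample through the flow
print(flow.log_prob(x))  # exact log-density of x under the flow
```

Because the map is invertible and its Jacobian is known in closed form, the density is exact rather than approximated. Richer flows stack many such invertible layers, but the bookkeeping is the same.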