Recent AI research papers with accessible summaries. Updated daily from arXiv, summarized for developers who don't read papers regularly.
Chongjie Ye, Cheng Cao, Chuanyu Pan et al.
By unifying 2D and 3D generation in one model and leveraging plentiful 2D data as a structural constraint, you can train better 3D generators with limited 3D assets—no separate 2D-to-3D conversion pipeline needed.
Omni123 is a 3D foundation model that generates both 2D images and 3D objects from text by treating them as sequences of tokens. It uses abundant 2D image data as a guide to improve 3D generation, avoiding the need for scarce aligned text-image-3D datasets. The model cycles through different modalities (text→image→3D→image) to ensure consistency across all forms.
Youssef Saidi, Haroun Elleuch, Fethi Bougares
End-to-end speech-to-entity models substantially outperform cascaded ASR+NER pipelines for Arabic, and multilingual pretraining transfers better than Arabic-specific pretraining for this low-resource task.
This paper introduces CV-18 NER, the first dataset for extracting named entities directly from Arabic speech. The researchers created 21 entity types by annotating the Arabic Common Voice corpus, then compared end-to-end speech models (Whisper, AraBEST-RQ) against traditional pipelines that first transcribe speech and then extract entities.
Yuxing Lu, Xukai Zhao, Wei Wu et al.
You can improve RAG systems by preprocessing your corpus once to add distilled, compact versions of relevant documents—this works with any retrieval method and shows consistent gains without changing your pipeline.
This paper proposes WriteBack-RAG, a method that improves retrieval-augmented generation (RAG) systems by treating the knowledge base as trainable. Using labeled examples, the system identifies relevant documents, distills them into compact knowledge units, and adds these to the corpus.
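The write-back idea can be sketched in a few lines. This is an illustration under assumptions, not the paper's implementation: the real system distills documents with an LLM, while the `distill()` stub here just keeps query-relevant sentences. All names are hypothetical.

```python
def distill(document: str, query: str) -> str:
    """Stand-in for an LLM call that compresses a document into a
    compact knowledge unit focused on what the query needs.
    Naive placeholder: keep only sentences sharing a word with the query."""
    query_words = set(query.lower().split())
    keep = [s for s in document.split(". ")
            if query_words & set(s.lower().split())]
    return ". ".join(keep)

def write_back(corpus: list[str], labeled_examples: list[tuple[str, int]]) -> list[str]:
    """For each (query, relevant_doc_index) pair, distill the relevant
    document and append the compact unit to the corpus. Originals are
    kept, so any retriever can use the augmented corpus unchanged."""
    augmented = list(corpus)
    for query, doc_idx in labeled_examples:
        unit = distill(corpus[doc_idx], query)
        if unit:
            augmented.append(unit)
    return augmented

corpus = ["Transformers use attention. The model was introduced in 2017.",
          "RAG retrieves documents before generation."]
examples = [("when was the model introduced", 0)]
print(write_back(corpus, examples))
```

The key property the takeaway highlights survives in the sketch: the augmentation happens once, offline, and the retrieval pipeline itself is untouched.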
Abdul Rahman
Security AI models fail when deployed to new environments because telemetry data is fragmented. CSTS solves this by providing a unified, entity-focused data structure that maintains consistent identity and relationships across different systems.
This paper introduces CSTS, a standardized way to represent security data that helps AI systems detect cyber threats across different computer networks. Instead of treating security events as isolated incidents, CSTS organizes them around entities (like users or devices) and their relationships, making AI models more reliable when deployed in new environments.
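To make the entity-centric framing concrete, here is a minimal sketch of what such a structure could look like. The class and field names are assumptions for illustration; CSTS's actual schema is defined in the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    entity_id: str          # stable identity across telemetry sources
    entity_type: str        # e.g. "user", "device", "process"
    events: list = field(default_factory=list)
    relations: dict = field(default_factory=dict)  # relation name -> entity ids

# Instead of isolated log lines, events attach to entities, and
# relationships tie entities together across different data sources.
laptop = Entity("dev-42", "device")
alice = Entity("user-alice", "user", relations={"owns": ["dev-42"]})
alice.events.append({"source": "auth-log", "action": "login", "target": "dev-42"})
laptop.events.append({"source": "edr", "action": "process_start", "name": "powershell"})

# A detection model can follow alice -> owns -> dev-42 and reason over
# both events as one entity-centric narrative rather than two incidents.
print(alice.relations["owns"])
```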
Jiazheng Xing, Fei Du, Hangjie Yuan et al.
To generate videos with multiple people where each person's appearance stays consistent with their attributes, you need both training data that captures identity-attribute relationships and attention mechanisms in the model designed to enforce those relationships.
LumosX improves personalized video generation by explicitly linking identities to their attributes. It uses a data pipeline with multimodal AI to extract subject relationships, then applies specialized attention mechanisms in diffusion models to ensure faces stay consistent with their assigned attributes across video frames.
Masoumeh Shafieinejad, Xi He, Mahshid Alinoori et al.
Synthetic data from diffusion models may not be as privacy-safe as assumed—membership inference attacks can still reveal whether specific records were in the training data, even with synthetic tabular outputs.
This challenge evaluates how well synthetic tabular data generated by diffusion models protects privacy against membership inference attacks. Researchers tested whether synthetic data truly hides information about individuals in the original dataset, developing new attack methods to measure privacy risks across different types of tabular data structures.
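A common baseline for this kind of attack is the distance-to-closest-record score: if a candidate record sits unusually close to some synthetic row, it was more likely in the generator's training data. The sketch below shows that generic baseline, not the challenge's actual attack methods.

```python
import math

def dcr_score(record, synthetic_rows):
    """Distance to the closest synthetic row. Smaller distance suggests
    the record is more likely a member of the training data."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(dist(record, row) for row in synthetic_rows)

synthetic = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1)]
member_candidate = (0.11, 0.88)   # nearly duplicates a synthetic row
non_member = (0.0, 0.0)
print(dcr_score(member_candidate, synthetic) < dcr_score(non_member, synthetic))
```

Thresholding this score over many candidates yields the membership guesses whose accuracy quantifies the privacy leakage.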
Xin Chen, Junchao Wu, Shu Yang et al.
You can train better LLMs on less data by selecting instruction examples that activate the same neurons as your target task—this beats using all data or relying on external models to score examples.
This paper introduces NAIT, a method for selecting the most useful instruction-tuning data for large language models by analyzing which neurons activate when processing different types of tasks. Instead of using all available training data, NAIT identifies a small subset (10% of data) that produces better results by matching neuron activation patterns to target capabilities.
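The selection step can be sketched as matching activation profiles. This is a hedged illustration: real NAIT operates on LLM neuron activations, while random vectors stand in for per-example activation profiles here, and the similarity measure is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend each row is the mean neuron-activation vector one candidate
# training example elicits from the model.
candidate_acts = rng.normal(size=(100, 16))
target_act = rng.normal(size=16)  # activation profile of the target task

# Score candidates by similarity to the target profile; keep the top 10%.
scores = np.array([cosine(a, target_act) for a in candidate_acts])
k = int(0.10 * len(candidate_acts))
selected = np.argsort(scores)[::-1][:k]
print(len(selected))  # 10 examples chosen out of 100
```

The appeal over scorer-model approaches is that the signal comes from the model being tuned itself, with no external judge in the loop.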
Haonan Huang
AI agents performing scientific research need memory and reflection, not just execution capability. Knowledge consolidation between runs dramatically improves efficiency and accuracy in computational science workflows.
QMatSuite is a platform that helps AI agents learn from computational materials science experiments by storing findings, retrieving past knowledge, and reflecting on results.
Yijiashun Qi, Yijiazhen Qi, Tanmay Wagh
Use knowledge graph topology to guide web crawling toward undiscovered entities, making supplier discovery more complete with less computational cost.
This paper tackles the problem of finding all small and medium-sized businesses in specialized industries (like semiconductor equipment makers) by combining web crawling, knowledge graphs, and smart coverage estimation.
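The topology-guided part of the takeaway can be sketched with a simple in-link score: entities referenced by several known companies but not yet crawled are promising frontier targets. The entity names and scoring rule below are illustrative assumptions, not the paper's exact method.

```python
import heapq

graph = {  # known entity -> entities it references on its pages
    "FabCo":     ["EtchWorks", "LithoParts"],
    "EtchWorks": ["LithoParts", "PumpSupply"],
}
discovered = set(graph)

def frontier_scores(graph, discovered):
    """Score each undiscovered entity by in-links from known entities."""
    scores = {}
    for src, targets in graph.items():
        for t in targets:
            if t not in discovered:
                scores[t] = scores.get(t, 0) + 1
    return scores

# Crawl the highest-scoring undiscovered entities first.
scores = frontier_scores(graph, discovered)
queue = [(-s, name) for name, s in scores.items()]
heapq.heapify(queue)
print(heapq.heappop(queue))  # most-referenced undiscovered entity first
```

Spending crawl budget on such high-signal targets first is what lets coverage grow with fewer page fetches.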
Xiaolong Zhang, Jianwei Zhang, Selim Sevim et al.
Unsupervised learning can remove batch effects from medical images, letting models generalize across hospitals without retraining.
Medical image analysis struggles when microscope slides are stained or scanned differently across hospitals—models trained on one site fail at another. This paper introduces a technique that learns to remove these visual differences automatically, making AI models work reliably across different clinical sites without needing labeled examples.
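The core intuition, aligning per-site feature statistics so the batch offset cancels, can be shown with simple centering and scaling. The paper's method is learned and unsupervised; this sketch only demonstrates the statistical effect, with synthetic features standing in for image features.

```python
import numpy as np

rng = np.random.default_rng(1)
site_a = rng.normal(loc=0.0, scale=1.0, size=(50, 4))  # features, hospital A
site_b = rng.normal(loc=3.0, scale=1.0, size=(50, 4))  # same tissue, shifted stain

def align(feats):
    """Center and scale each site's features so batch offsets cancel."""
    return (feats - feats.mean(axis=0)) / feats.std(axis=0)

gap_before = abs(site_a.mean() - site_b.mean())
gap_after = abs(align(site_a).mean() - align(site_b).mean())
print(gap_after < gap_before)  # True: the site-level shift is removed
```

Crucially, nothing here needs labels, which is why the approach transfers to new hospitals without retraining.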