Language-Critique Imitation Learning from Suboptimal Demonstrations

Chih-Han Yang, Dai-Jie Wu, Yun-Ping Huang, Ping-Chun Hsieh, Kenneth Marino et al. | July 1, 2026 | arXiv

Key Takeaway

Language-based feedback preserves more information than scalar signals when learning from imperfect data, enabling policies to understand not just what went wrong but why and how to fix it.

Summary

This paper proposes using natural language critiques as structured supervision signals for learning from suboptimal demonstrations. Instead of compressing feedback into scalar scores, the method generates language labels describing task progress, failures, and corrections, then trains policies directly on these rich signals.
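To make the contrast with scalar feedback concrete, here is a minimal sketch in plain Python. It is not the paper's label format or training pipeline. Everything in it is a hypothetical illustration: the `Critique` record (its three fields mirror the "task progress, failures, and corrections" named above), the `to_scalar` compression rule, the `to_prompt` flattening, the `build_dataset` helper, and the example strings are all invented. The sketch shows that two different failures collapse to the same scalar score, while their language labels stay distinct and can be paired with demonstration segments as conditioning text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Critique:
    """Hypothetical language label attached to one demonstration segment."""
    progress: str    # what part of the task has been achieved so far
    failure: str     # what went wrong ("" if nothing)
    correction: str  # how the behavior should be fixed ("" if nothing)


def to_scalar(c: Critique) -> float:
    """Hypothetical scalar compression: 1.0 if no failure is reported, else 0.0."""
    return 0.0 if c.failure else 1.0


def to_prompt(c: Critique) -> str:
    """Flatten a critique into one text string a language-conditioned policy could consume."""
    parts = [f"progress: {c.progress}"]
    if c.failure:
        parts.append(f"failure: {c.failure}")
    if c.correction:
        parts.append(f"correction: {c.correction}")
    return "; ".join(parts)


def build_dataset(
    segments: List[Tuple[list, list, Critique]]
) -> List[Tuple[list, list, str]]:
    """Pair each (observations, actions) segment with its critique text instead of a score."""
    return [(obs, acts, to_prompt(c)) for obs, acts, c in segments]


# Two invented critiques describing different failures on the same subtask.
a = Critique("gripper reached the cup", "grasp closed too early", "close the gripper after contact")
b = Critique("gripper reached the cup", "cup knocked over", "approach more slowly from the side")

print(to_scalar(a) == to_scalar(b))  # True: both collapse to 0.0
print(to_prompt(a) == to_prompt(b))  # False: the language labels remain distinguishable
```

The policy itself (for example, a behavior-cloning or diffusion policy conditioned on the critique text) is deliberately left out; the point is only that the supervision signal keeps the "why" and "how to fix it" that a single number discards.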


Key Terms

behavior cloning, diffusion policy, imitation learning, suboptimal demonstrations, structured supervision