Training approach that applies alignment objectives at multiple levels of granularity (e.g., frames, words, sentences) simultaneously.
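
To make the definition concrete, here is a minimal sketch of one way such an objective could be assembled. It assumes that each level supplies paired embeddings from two sources (for example, speech and text), that a symmetric contrastive (InfoNCE) loss is used at every level, and that the per-level losses are combined as a weighted sum. The loss choice, the function names `info_nce` and `multi_level_alignment_loss`, the level names, and the weights are all illustrative assumptions, not something the definition specifies.

```python
import numpy as np


def info_nce(a, b, temperature=0.1):
    """Symmetric InfoNCE loss; row i of `a` is the positive match for row i of `b`."""
    # L2-normalize so that dot products are cosine similarities.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # shape [n, n]

    def cross_entropy_on_diagonal(scores):
        # Numerically stable log-softmax over each row.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the a-to-b and b-to-a directions.
    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))


def multi_level_alignment_loss(pairs, weights):
    """Weighted sum of per-level alignment losses.

    pairs:   dict mapping a level name to a tuple (a, b) of [n, d] arrays
    weights: dict mapping the same level names to scalar weights (hypothetical)
    """
    per_level = {level: info_nce(a, b) for level, (a, b) in pairs.items()}
    total = sum(weights[level] * loss for level, loss in per_level.items())
    return total, per_level


# Toy data: each level has a different number of units (many frames, few sentences).
rng = np.random.default_rng(0)


def toy_pair(n, d=16):
    x = rng.normal(size=(n, d))
    return x, x + 0.05 * rng.normal(size=(n, d))  # noisy copy acts as the aligned view


pairs = {"frame": toy_pair(200), "word": toy_pair(40), "sentence": toy_pair(8)}
weights = {"frame": 1.0, "word": 1.0, "sentence": 1.0}
total, per_level = multi_level_alignment_loss(pairs, weights)
```

Because all levels contribute to a single scalar, one optimization step updates the model with respect to every granularity at once, which is what "simultaneously" refers to here. In practice the weights would be tuned, and the per-level loss could be swapped for another alignment criterion.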