One of several attention mechanisms that run in parallel within a transformer layer, each learning to capture a different aspect of the relationships among input tokens.
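To make the definition concrete, here is a minimal NumPy sketch of several such mechanisms running side by side over the same input. It assumes standard scaled dot-product self-attention; the function name `multi_head_self_attention`, the weight names `w_q`, `w_k`, `w_v`, `w_o`, and the random weights are illustrative assumptions, not taken from any particular model or library.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Hypothetical helper: scaled dot-product self-attention with several heads.

    x has shape (seq_len, d_model); every weight matrix is (d_model, d_model).
    Returns the layer output and the per-head attention weights.
    """
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(x @ w_q)
    k = split_heads(x @ w_k)
    v = split_heads(x @ w_v)

    # Each head computes its own token-to-token scores independently.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    weights = softmax(scores, axis=-1)
    per_head = weights @ v                                # (heads, seq, d_head)

    # Concatenate the heads and mix them with the output projection.
    merged = per_head.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ w_o, weights


# Toy input: 4 tokens, model width 8, 2 heads (random weights, untrained).
rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 4, 8, 2
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))

out, weights = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(out.shape)      # (4, 8)
print(weights.shape)  # (2, 4, 4): one attention map per head
```

Because each head works on its own slice of the projected queries, keys, and values, the two attention maps in `weights` generally differ, which is what allows different heads to specialize once the weights are trained.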
Multi-step reasoning, logic puzzles, mathematical problem-solving