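A Transformer design choice where layer normalization is applied to the input of each sublayer (self-attention or feed-forward), inside the residual branch, rather than after the residual addition as in the original Post-LN arrangement. In other words, each block computes x + F(LN(x)) instead of LN(x + F(x)).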
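The difference is easiest to see by comparing the two residual wirings directly. The sketch below is a minimal, dependency-free illustration under stated assumptions: `layer_norm` omits the learned gain and bias, vectors are plain Python lists, and `toy_sublayer` is a hypothetical stand-in for attention or a feed-forward network, not a real model component.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize to zero mean and unit variance (assumption: no learned gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def pre_ln(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    """Pre-LN: normalize only the sublayer input; the residual path x is passed through untouched."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def post_ln(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    """Post-LN: add the residual first, then normalize the sum."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


def toy_sublayer(v: Vector) -> Vector:
    # Hypothetical stand-in for self-attention or a feed-forward network.
    return [0.5 * u for u in v]


x = [1.0, 2.0, 3.0, 4.0]
pre = pre_ln(x, toy_sublayer)    # keeps x's mean (2.5): the residual stream is not re-centered
post = post_ln(x, toy_sublayer)  # re-centered to mean ~0 by the final normalization
print(pre)
print(post)
```

Because the residual stream in a Pre-LN stack is never normalized inside the blocks, implementations commonly apply one extra layer normalization after the last block.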