Feed-forward neural network layers in transformers that account for most of the parameter count in typical configurations and whose hidden width can be scaled independently of the attention layers.
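
The parameter-count claim can be checked with a rough count for a single transformer block. This is a minimal sketch under stated assumptions: a standard two-matrix feed-forward layer, four square attention projections, and biases, layer norms, and embeddings ignored. The names `d_model`, `d_ff`, `attention_params`, `ffn_params`, and `ffn_share` are illustrative and do not come from the entry above.

```python
def attention_params(d_model: int) -> int:
    """Weights in the Q, K, V, and output projections (each d_model x d_model)."""
    return 4 * d_model * d_model


def ffn_params(d_model: int, d_ff: int) -> int:
    """Weights in the up-projection (d_model x d_ff) and down-projection (d_ff x d_model)."""
    return 2 * d_model * d_ff


def ffn_share(d_model: int, d_ff: int) -> float:
    """Fraction of one block's weights that sit in the feed-forward layer."""
    ffn = ffn_params(d_model, d_ff)
    return ffn / (ffn + attention_params(d_model))


if __name__ == "__main__":
    d_model = 512  # assumed example width
    # Widen only the feed-forward hidden size; attention is left unchanged.
    for d_ff in (2048, 4096):
        print(d_ff, ffn_params(d_model, d_ff), round(ffn_share(d_model, d_ff), 3))
```

With the common choice `d_ff = 4 * d_model`, the feed-forward layer holds about two thirds of the block's weights under these assumptions. Doubling `d_ff` while keeping `d_model` fixed raises that share without touching the attention projections, which is one reading of "independently scaled".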