The study of how a language model's internal components and computations produce its outputs.