The high-speed memory on a graphics processor used to store and process model weights and computations during inference.