Moving key-value (KV) cache tensors from GPU memory to slower tiers (CPU RAM or disk) to reduce GPU memory usage during inference, at the cost of extra transfer latency when the offloaded data must be brought back for attention computation.
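A minimal sketch of the idea, using NumPy arrays to stand in for GPU/CPU tensors and a simple FIFO eviction policy; the class and method names (`KVCacheOffloader`, `put`, `get`) are illustrative, not any particular library's API:

```python
import numpy as np

class KVCacheOffloader:
    """Keep only a bounded number of layers' key/value tensors resident
    in (simulated) GPU memory; offload the rest to (simulated) CPU memory
    and copy them back on demand."""

    def __init__(self, max_gpu_layers: int):
        self.max_gpu_layers = max_gpu_layers
        self.gpu = {}  # layer_idx -> (keys, values) resident "on GPU"
        self.cpu = {}  # layer_idx -> (keys, values) offloaded "to CPU"

    def put(self, layer: int, keys: np.ndarray, values: np.ndarray):
        self.gpu[layer] = (keys, values)
        self._evict_if_needed()

    def get(self, layer: int):
        if layer in self.cpu:
            # GPU miss: transfer the layer's cache back before attention.
            self.gpu[layer] = self.cpu.pop(layer)
            self._evict_if_needed(skip=layer)
        return self.gpu[layer]

    def _evict_if_needed(self, skip=None):
        while len(self.gpu) > self.max_gpu_layers:
            # Evict the oldest inserted layer (FIFO, for simplicity;
            # real systems often prefetch the next layer instead).
            victim = next(l for l in self.gpu if l != skip)
            self.cpu[victim] = self.gpu.pop(victim)

# Usage: with room for 2 layers on "GPU", inserting a 3rd offloads the oldest.
off = KVCacheOffloader(max_gpu_layers=2)
for i in range(3):
    off.put(i, np.zeros((1, 8)), np.zeros((1, 8)))
keys, values = off.get(0)  # layer 0 is copied back, another layer is evicted
```

Production systems (e.g. inference servers) overlap these host-device transfers with computation to hide the latency; this sketch only captures the placement bookkeeping.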