Memory-saving technique that partitions model states (optimizer states, gradients, and parameters) across data-parallel devices, so that each device stores only its own shard instead of a full replica.
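To make the idea concrete, here is a minimal single-process sketch of partitioning optimizer state and gradients, the idea behind ZeRO-style sharding. Several assumptions apply. Python lists stand in for devices. Plain loops stand in for the reduce-scatter and all-gather collectives. The optimizer is SGD with momentum. The helper names (`shard_bounds`, `reference_step`, `partitioned_step`) are hypothetical and do not come from any library. The sketch checks that the partitioned update matches an ordinary replicated data-parallel update, while each device keeps only about 1/N of the momentum buffer.

```python
# Hypothetical single-process simulation: Python lists stand in for devices,
# and loops stand in for reduce-scatter / all-gather collectives.

def shard_bounds(numel, world_size, rank):
    """Return the contiguous [start, end) slice of a flat tensor owned by `rank`."""
    per_rank = -(-numel // world_size)  # ceiling division
    start = min(rank * per_rank, numel)
    end = min(start + per_rank, numel)
    return start, end


def reference_step(params, grads_per_device, momentum, lr, beta):
    """Plain data parallelism: every device keeps full gradients and momentum."""
    n = len(grads_per_device)
    avg = [sum(g[i] for g in grads_per_device) / n for i in range(len(params))]
    new_m = [beta * m + g for m, g in zip(momentum, avg)]
    new_p = [p - lr * m for p, m in zip(params, new_m)]
    return new_p, new_m


def partitioned_step(params, grads_per_device, momentum_shards, lr, beta):
    """Partitioned step: rank r keeps only its slice of gradients and momentum."""
    world_size = len(grads_per_device)
    numel = len(params)
    new_params = list(params)
    new_shards = []
    for rank in range(world_size):
        start, end = shard_bounds(numel, world_size, rank)
        # Reduce-scatter: the rank receives the averaged gradient for its slice only.
        grad_shard = [sum(g[i] for g in grads_per_device) / world_size
                      for i in range(start, end)]
        # Momentum (optimizer state) exists only for the owned slice.
        m_shard = [beta * m + g for m, g in zip(momentum_shards[rank], grad_shard)]
        new_shards.append(m_shard)
        # The rank updates only the parameters it owns.
        for offset, i in enumerate(range(start, end)):
            new_params[i] = params[i] - lr * m_shard[offset]
    # All-gather: in a real system each rank would now send its updated slice
    # to the others so every device again holds the full parameter vector.
    return new_params, new_shards


if __name__ == "__main__":
    params = [1.0, 2.0, 3.0, 4.0, 5.0]
    grads = [[0.1, 0.2, 0.3, 0.4, 0.5], [0.3, 0.2, 0.1, 0.0, -0.1]]
    world = len(grads)
    shards = [[0.0] * (e - s)
              for s, e in (shard_bounds(len(params), world, r) for r in range(world))]
    ref_p, _ = reference_step(params, grads, [0.0] * len(params), lr=0.1, beta=0.9)
    new_p, new_shards = partitioned_step(params, grads, shards, lr=0.1, beta=0.9)
    print("max |diff|:", max(abs(a - b) for a, b in zip(ref_p, new_p)))
    print("momentum entries per device:", [len(s) for s in new_shards])
```

With two simulated devices and five parameters, the demo prints `momentum entries per device: [3, 2]` rather than 5 on each device. The sketch keeps the full parameter vector after the update. Partitioning the parameters as well (as in ZeRO stage 3 or PyTorch FSDP) would also keep only parameter shards on each device and all-gather them on demand during the forward and backward passes.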