A technique for processing long sequences by splitting the context (the input sequence) into chunks distributed across multiple devices or processing units, which then process those chunks in parallel.
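As an illustration, here is a minimal pure-Python sketch. It assumes the technique is applied to self-attention in a transformer and uses a ring-attention-style schedule. The definition above does not name a specific algorithm, so this schedule is an assumption. The sequence is split into contiguous chunks, one per simulated device. Each device keeps its own query chunk and sees every key/value chunk in turn, then merges the partial results with a running log-sum-exp so that the final output matches ordinary full attention. Devices are simulated one after another, and all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def full_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Reference: every query attends over all keys on a single device."""
    out = []
    for qi in q:
        scores = [dot(qi, kj) for kj in k]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]  # numerically stable softmax
        z = sum(w)
        out.append([sum(wj * vj[d] for wj, vj in zip(w, v)) / z
                    for d in range(len(v[0]))])
    return out


def split(seq: Matrix, n: int) -> List[Matrix]:
    """Contiguous chunks, one per (hypothetical) device."""
    size = math.ceil(len(seq) / n)
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def context_parallel_attention(q: Matrix, k: Matrix, v: Matrix,
                               num_devices: int) -> Matrix:
    q_shards = split(q, num_devices)
    kv_shards = list(zip(split(k, num_devices), split(v, num_devices)))
    p = len(kv_shards)
    dim = len(v[0])
    outputs: Matrix = []
    for dev, q_local in enumerate(q_shards):
        # Running state per local query: max score, normalizer, weighted sum.
        m = [-math.inf] * len(q_local)
        z = [0.0] * len(q_local)
        acc = [[0.0] * dim for _ in q_local]
        # Ring schedule: at step s this device holds the KV shard of device
        # (dev + s) % p. Simulated sequentially here.
        for step in range(p):
            k_blk, v_blk = kv_shards[(dev + step) % p]
            for i, qi in enumerate(q_local):
                scores = [dot(qi, kj) for kj in k_blk]
                new_m = max(m[i], max(scores))
                scale = math.exp(m[i] - new_m)  # rescale earlier partials
                w = [math.exp(s - new_m) for s in scores]
                z[i] = z[i] * scale + sum(w)
                acc[i] = [a * scale + sum(wj * vj[d] for wj, vj in zip(w, v_blk))
                          for d, a in enumerate(acc[i])]
                m[i] = new_m
        outputs.extend([[a / zi for a in row] for row, zi in zip(acc, z)])
    return outputs


if __name__ == "__main__":
    q = [[0.1 * i, 0.2 * (i % 3), -0.05 * i] for i in range(7)]
    k = [[0.3 * (i % 2), -0.1 * i, 0.07 * i] for i in range(7)]
    v = [[float(i), float(i * i % 5)] for i in range(7)]
    ref = full_attention(q, k, v)
    out = context_parallel_attention(q, k, v, num_devices=3)
    print(max(abs(a - b) for r, o in zip(ref, out) for a, b in zip(r, o)))
```

In a real deployment, each device would run its loop concurrently and pass its key/value block to a neighbor after every step. No device then holds the keys and values for the whole sequence at once, so per-device attention memory grows with the chunk size instead of the full context length.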