A transformer-based architecture that uses locality-sensitive hashing (LSH) to restrict attention to similar positions and reversible layers to avoid storing per-layer activations, so that long sequences can be processed with reduced memory requirements.
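
The two ideas named above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation of any particular model: the function names, the single hashing round, the random-projection hash, and the toy sub-functions `f` and `g` are all hypothetical choices made for the example.

```python
import numpy as np


def lsh_buckets(vectors, n_buckets, rng):
    """Assign each vector to a bucket using a random-projection (angular) hash.

    Assumption: a single hashing round with n_buckets // 2 random directions.
    Vectors pointing in similar directions tend to land in the same bucket,
    so attention can be limited to positions that share a bucket.
    """
    d = vectors.shape[-1]
    projection = rng.standard_normal((d, n_buckets // 2))
    rotated = vectors @ projection
    # Bucket id = index of the largest entry of [xR, -xR].
    return np.argmax(np.concatenate([rotated, -rotated], axis=-1), axis=-1)


def reversible_forward(x1, x2, f, g):
    """Reversible block: the outputs alone are enough to recover the inputs."""
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2


def reversible_inverse(y1, y2, f, g):
    """Recompute the inputs from the outputs instead of storing them."""
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2


# Toy data: 8 positions with 16 features each.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))

buckets = lsh_buckets(x, n_buckets=4, rng=rng)

# Hypothetical stand-ins for the attention and feed-forward sub-layers.
f = np.tanh
g = lambda v: 0.5 * v

y1, y2 = reversible_forward(x[:4], x[4:], f, g)
x1, x2 = reversible_inverse(y1, y2, f, g)
```

Because `reversible_inverse` reconstructs `x1` and `x2` from `y1` and `y2`, a backward pass can recompute each layer's inputs on the fly rather than keeping them in memory, which is where the memory saving of reversible layers comes from. The bucket ids show how hashing narrows the set of positions each query needs to compare against.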