ThinkLLM

Flash Attention

architecture

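An IO-aware algorithm for exact attention. It produces the same output as standard scaled dot-product attention (up to floating-point rounding) but runs faster and uses less GPU memory. It never writes the full N×N matrix of attention scores to the GPU's high-bandwidth memory. Instead, it splits the queries, keys, and values into blocks, computes attention block by block in fast on-chip SRAM, and merges the partial results with an online (running) softmax. This cuts memory traffic and makes memory use grow linearly with sequence length instead of quadratically.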
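The blockwise idea can be illustrated with a small pure-Python sketch. This is a toy under stated assumptions, not the CUDA kernel from the FlashAttention paper, and the function names and the `block_size` parameter are hypothetical. `standard_attention` scores every key for a query before applying softmax. `tiled_attention` visits keys and values one block at a time and keeps only a running maximum, a running denominator, and a running weighted sum per query. Whenever a new block raises the maximum, the earlier partial results are rescaled by exp(m_old - m_new), so the final output matches the reference up to rounding.

```python
import math

def standard_attention(q, k, v):
    """Reference: compute all scores for each query row, then softmax."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total
                    for c in range(len(v[0]))])
    return out

def tiled_attention(q, k, v, block_size=2):
    """Hypothetical sketch: visit K/V in blocks, using an online softmax.

    Only `block_size` scores per query exist at any time; real kernels
    also tile the queries and keep each block in on-chip SRAM.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    dv = len(v[0])
    out = []
    for qi in q:
        m = -math.inf     # running maximum score seen so far
        l = 0.0           # running softmax denominator
        acc = [0.0] * dv  # running unnormalized weighted sum of values
        for start in range(0, len(k), block_size):
            k_blk = k[start:start + block_size]
            v_blk = v[start:start + block_size]
            scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k_blk]
            m_new = max(m, max(scores))
            # Rescale earlier partial results to the new maximum
            # (exp(-inf) == 0.0 on the first block).
            correction = math.exp(m - m_new)
            l *= correction
            acc = [a * correction for a in acc]
            for s, vj in zip(scores, v_blk):
                w = math.exp(s - m_new)
                l += w
                for c in range(dv):
                    acc[c] += w * vj[c]
            m = m_new
        out.append([a / l for a in acc])  # normalize once at the end
    return out

# Usage: both versions agree, even when the last block is partial.
q = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5], [0.2, -0.3]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
ref = standard_attention(q, k, v)
fast = tiled_attention(q, k, v, block_size=2)
assert all(abs(a - b) < 1e-9 for r, f in zip(ref, fast) for a, b in zip(r, f))
```

Production implementations fuse these steps into a single GPU kernel, so the speedup comes from moving less data between memory levels. The loop structure here only shows why the blockwise result is exact.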

Related Capabilities

Performance retention over long documents and conversations
