A large mixture-of-experts model with 24B total parameters, of which only about 2B are activated per forward pass: a learned router selects a small subset of experts for each token, so inference runs at roughly the cost of a 2B dense model. It handles text tasks with the resource footprint of a much smaller model, though the sparse activation pattern trades some raw capability for speed and memory savings. A practical choice when compute constraints matter more than squeezing out maximum performance.
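The sparse-activation mechanism described above can be sketched as a toy top-k router: the router scores every expert for each input, but only the k highest-scoring experts actually run, so compute scales with k rather than the total expert count. All names, shapes, and weight initializations here are illustrative assumptions, not the model's actual architecture.

```python
import math
import random

random.seed(0)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

class SparseMoELayer:
    """Toy mixture-of-experts layer (hypothetical sketch): the router
    scores all experts per input, but only the top-k run, so most of
    the parameters stay untouched on any given forward pass."""

    def __init__(self, dim, num_experts, top_k=2):
        self.top_k = top_k
        # Router: one score vector per expert (toy random init).
        self.router = [[random.gauss(0, 0.1) for _ in range(dim)]
                       for _ in range(num_experts)]
        # Each expert is a single linear map here, for brevity.
        self.experts = [[[random.gauss(0, 0.1) for _ in range(dim)]
                         for _ in range(dim)]
                        for _ in range(num_experts)]

    def forward(self, x):
        # Score every expert for this input.
        scores = [sum(w_i * x_i for w_i, x_i in zip(w, x))
                  for w in self.router]
        # Keep only the top-k experts; the rest cost no compute.
        top = sorted(range(len(scores)), key=lambda i: scores[i],
                     reverse=True)[:self.top_k]
        gates = softmax([scores[i] for i in top])
        # Weighted sum of the selected experts' outputs.
        out = [0.0] * len(x)
        for gate, idx in zip(gates, top):
            y = [sum(row[j] * x[j] for j in range(len(x)))
                 for row in self.experts[idx]]
            out = [o + gate * y_i for o, y_i in zip(out, y)]
        return out, top

layer = SparseMoELayer(dim=8, num_experts=16, top_k=2)
out, chosen = layer.forward([1.0] * 8)
print(len(chosen), len(out))  # 2 experts active out of 16
```

In a real MoE transformer this routing happens per token inside each MoE feed-forward block, which is how a 24B-parameter model can touch only ~2B parameters per pass.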