A large mixture-of-experts model that activates only about 2B of its 24B total parameters at a time, keeping inference lean without sacrificing breadth of knowledge. It handles text tasks with the speed you'd expect from a sparse architecture, since only the routed experts run on each forward pass, while the 5-bit MLX quantization shrinks the memory footprint and makes it particularly well suited to Apple Silicon.
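As a minimal sketch of running such a build with the `mlx-lm` package: the repository id below is a placeholder, not the model's actual name, and the prompt is illustrative. At 5 bits per weight, 24B parameters come to roughly 15 GB, so the weights fit in unified memory on higher-RAM Apple Silicon machines.

```python
# Minimal sketch using the mlx-lm package (pip install mlx-lm).
# The repository id is a placeholder; substitute the actual
# 5-bit MLX build of the model you intend to run.
from mlx_lm import load, generate

# load() fetches the quantized weights and tokenizer from a local
# path or a Hugging Face repo.
model, tokenizer = load("mlx-community/example-moe-24b-5bit")  # placeholder id

# Each forward pass routes tokens to a small subset of experts,
# so generation runs at roughly the cost of a ~2B dense model
# even though all 24B quantized parameters sit in memory.
print(generate(model, tokenizer,
               prompt="Summarize the benefits of sparse MoE inference.",
               max_tokens=128))
```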