A large mixture-of-experts model that activates only 2B of its 24B total parameters per forward pass, keeping inference lean while retaining broad knowledge. It handles text tasks with the efficiency you'd expect from a selectively activated architecture: fast responses without running the full model on every token. The 8-bit MLX quantization makes it particularly well suited to Apple Silicon machines, where memory bandwidth matters.
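
For readers on Apple Silicon, here is a minimal sketch of how an 8-bit MLX build like this would typically be run with the `mlx-lm` package. The repository id is a placeholder, not the model's actual name; substitute the real MLX quantization you intend to use.

```python
# Minimal sketch: loading and prompting an 8-bit MLX quantization with mlx-lm.
from mlx_lm import load, generate

# load() fetches the quantized weights and tokenizer; on Apple Silicon the
# weights live in unified memory shared by the CPU and GPU.
model, tokenizer = load("mlx-community/example-moe-24b-a2b-8bit")  # hypothetical repo id

# Only the routed experts (roughly 2B parameters) are evaluated per token,
# so decoding stays fast even though all 24B weights are resident in memory.
response = generate(
    model,
    tokenizer,
    prompt="Explain mixture-of-experts routing in two sentences.",
    max_tokens=128,
)
print(response)
```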