higgs audio v3 tts 4b transformers

Open WeightModel weights are publicly available — can be downloaded and self-hosted

Released June 2026context N/A4b params

A text-to-speech focused model running at 4 billion parameters, built on a transformer architecture and distributed in safetensors format. It takes text as input and produces text as output, which is characteristic of TTS systems that generate intermediate representations before audio synthesis. Details about its specific voice quality, language support, and prosody capabilities are limited from available data.

Capabilities

Capability scores are AI-generated based on model documentation, benchmarks, and technical specifications. Learn more

Instruction Following

Moderate

Multilingual

Moderate

Factual Knowledge

Use Case Fit

Fit scores are AI-generated based on model capabilities, intended use, and technical specifications. Learn more

higgs audio v3 tts 4b transformers

Open WeightModel weights are publicly available — can be downloaded and self-hosted

Released June 2026context N/A4b params

A text-to-speech focused model running at 4 billion parameters, built on a transformer architecture and distributed in safetensors format. It takes text as input and produces text as output, which is characteristic of TTS systems that generate intermediate representations before audio synthesis. Details about its specific voice quality, language support, and prosody capabilities are limited from available data.

Capabilities

Capability scores are AI-generated based on model documentation, benchmarks, and technical specifications. Learn more

Instruction Following

Moderate

Multilingual

Moderate

Factual Knowledge

Use Case Fit

Fit scores are AI-generated based on model capabilities, intended use, and technical specifications. Learn more

Glossary

ArchitectureThe underlying structural design of a neural network that defines how data flows through layers and components.Intermediate RepresentationsInternal computational states or outputs generated during model processing that capture useful information between input and final output.ParametersThe learned numerical values in a model — more parameters generally means more capacity but higher compute cost.ProsodyThe rhythm, intonation, and stress patterns in speech that convey emotion and meaning beyond individual words.SafetensorsA safe, fast file format for storing model weights, designed to prevent code execution vulnerabilities.Safetensors FormatA secure and efficient file format for storing model weights that prioritizes safety and speed when loading models.TransformerThe dominant neural network architecture for language models, using self-attention to process sequences.Transformer ArchitectureA neural network design that processes text by analyzing relationships between all words simultaneously, forming the foundation of modern large language models.