A specialized AI model trained to understand video content and to express that understanding in natural-language text.