Code2Video: A Code-centric Paradigm For Educational Video Generation - Takara TLDR

While recent generative models have advanced pixel-space video synthesis, they
still struggle to produce professional educational videos, which demand
disciplinary knowledge, precise visual structures, and coherent transitions;
this limits their applicability in educational scenarios. Intuitively, such
requirements are better addressed through the manipulation of a renderable
environment, which can be explicitly controlled via logical commands (e.g.,
code). In this work, we propose Code2Video, a code-centric agent framework for
generating educational videos via executable Python code. The framework
comprises three collaborative agents: (i) Planner, which structures lecture
content into temporally coherent flows and prepares corresponding visual
assets; (ii) Coder, which converts structured instructions into executable
Python code, using scope-guided auto-fix to improve efficiency; and (iii)
Critic, which uses vision-language models (VLMs) with visual anchor prompts to
refine spatial layout and ensure clarity. To support
systematic evaluation, we build MMMC, a benchmark of professionally produced,
discipline-specific educational videos. Evaluation on MMMC spans diverse
dimensions, including VLM-as-a-Judge aesthetic scores, code efficiency, and,
in particular, TeachQuiz, a novel end-to-end metric that quantifies how well a
VLM, after unlearning, can recover knowledge by watching the generated videos.
Our results demonstrate the potential of Code2Video as a scalable,
interpretable, and controllable approach: it achieves a 40% improvement over
direct code generation and produces videos comparable to human-crafted
tutorials. The
code and datasets are available at https://github.com/showlab/Code2Video.
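
The abstract describes the three agents only at a high level. What follows is a minimal, hypothetical sketch of how a Planner, Coder, and Critic loop could be wired together. Every function and class name, and the `MAX_BULLETS` layout budget, are illustrative assumptions rather than Code2Video's implementation, and the LLM and VLM calls are replaced by deterministic stubs so the control flow runs on its own.

```python
from dataclasses import dataclass, field

MAX_BULLETS = 3  # hypothetical per-slide budget enforced by the toy Critic


@dataclass
class Section:
    """One lecture segment emitted by the (stubbed) Planner."""
    title: str
    bullets: list[str]
    assets: list[str] = field(default_factory=list)  # e.g. images fetched for this section


def plan(topic):
    # Stub Planner: a real one would prompt an LLM for an ordered outline.
    return [
        Section(f"What is {topic}?", ["definition", "intuition", "history", "notation"]),
        Section(f"{topic} by example", ["worked example"], ["diagram.png"]),
    ]


def write_code(section, feedback=()):
    # Stub Coder: emits placeholder assignments instead of real animation code,
    # and honours the one kind of feedback the toy Critic can give.
    bullets = section.bullets
    if "too many bullets" in feedback:
        bullets = bullets[:MAX_BULLETS]
    lines = [f"title = {section.title!r}"]
    lines += [f"bullet_{i} = {b!r}" for i, b in enumerate(bullets)]
    return "\n".join(lines)


def critique(script):
    # Stub Critic: a real one would render a frame and ask a VLM about layout.
    n_bullets = sum(line.startswith("bullet_") for line in script.splitlines())
    return ["too many bullets"] if n_bullets > MAX_BULLETS else []


def generate(topic, max_rounds=3):
    """Plan once, then code and critique each section until the Critic is satisfied."""
    scripts = []
    for section in plan(topic):
        feedback = []
        for _ in range(max_rounds):
            script = write_code(section, feedback)
            feedback = critique(script)
            if not feedback:
                break
        scripts.append(script)
    return scripts
```

For example, `generate("entropy")` returns two scripts; the first section's four bullets are trimmed to three after one round of toy Critic feedback, while the second passes on the first try. In a real system the Critic would judge a rendered frame rather than count lines.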
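
The abstract does not spell out how scope-guided auto-fix works. One plausible reading, sketched here purely under that assumption, is to split the generated program into independent scopes (for example, one per lecture section), execute each scope on its own, and send only a failing scope and its error message back for repair instead of regenerating the whole program. `run_scope`, `auto_fix`, and `toy_fixer` are hypothetical names; `toy_fixer` stands in for an LLM repair call.

```python
from typing import Callable, List, Optional, Tuple


def run_scope(source: str) -> Optional[str]:
    """Execute one scope in a fresh namespace; return an error summary or None.

    exec() runs arbitrary code, so a real pipeline should sandbox this step.
    """
    try:
        exec(compile(source, "<scope>", "exec"), {})
    except Exception as exc:  # includes SyntaxError raised by compile()
        return f"{type(exc).__name__}: {exc}"
    return None


def toy_fixer(source: str, error: str) -> str:
    # Hypothetical stand-in for an LLM repair call that sees only this scope
    # and its error message; it knows how to fix a single misspelling.
    return source.replace("lenn(", "len(")


def auto_fix(scopes: List[str], fixer: Callable[[str, str], str],
             max_attempts: int = 3) -> Tuple[List[str], int]:
    """Repair failing scopes one at a time; return fixed scopes and repair-call count."""
    fixed, repair_calls = [], 0
    for source in scopes:
        error = run_scope(source)
        attempts = 0
        while error is not None:
            if attempts == max_attempts:
                raise RuntimeError(f"scope still failing after {max_attempts} repairs: {error}")
            source = fixer(source, error)  # only this scope goes back to the fixer
            repair_calls += 1
            attempts += 1
            error = run_scope(source)
        fixed.append(source)
    return fixed, repair_calls
```

Calling `auto_fix(["title = 'Sorting'", "n = lenn([3, 1, 2])", "m = max(1, 2)"], toy_fixer)` repairs only the second scope and reports one repair call; the two healthy scopes are never resubmitted, which is where an efficiency gain over whole-program regeneration would come from.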
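
Visual anchor prompts are likewise only named in the abstract. A common way to give a VLM a shared vocabulary for positions is to overlay a labelled grid on the frame, so feedback can say "move the formula to C4" rather than quoting raw coordinates. This sketch assumes an 8-unit-high, 16:9 scene centred on the origin (modelled on Manim's default frame, used here only as an assumption) and a 6 × 6 grid; the grid size, labels, and function names are illustrative, not taken from Code2Video.

```python
import string

FRAME_HEIGHT = 8.0                     # assumed scene height in scene units
FRAME_WIDTH = FRAME_HEIGHT * 16 / 9    # assumed 16:9 aspect ratio


def anchor_grid(rows=6, cols=6):
    """Map labels 'A1' (top-left) through e.g. 'F6' (bottom-right) to cell centres (x, y).

    The origin is the centre of the frame; x grows to the right and y grows upward.
    Rows are lettered, so rows must not exceed 26.
    """
    cell_w, cell_h = FRAME_WIDTH / cols, FRAME_HEIGHT / rows
    grid = {}
    for r in range(rows):
        for c in range(cols):
            x = -FRAME_WIDTH / 2 + (c + 0.5) * cell_w
            y = FRAME_HEIGHT / 2 - (r + 0.5) * cell_h
            grid[f"{string.ascii_uppercase[r]}{c + 1}"] = (round(x, 3), round(y, 3))
    return grid


def nearest_anchor(grid, x, y):
    """Label of the anchor closest to a point, e.g. an element's current centre."""
    return min(grid, key=lambda k: (grid[k][0] - x) ** 2 + (grid[k][1] - y) ** 2)


def anchor_prompt(grid):
    """Text a Critic prompt could include so the VLM answers with anchor labels."""
    return "Allowed anchors: " + ", ".join(grid)
```

A Critic could, for instance, report `nearest_anchor(grid, x, y)` for each overlapping element and ask the VLM to pick a free label from `anchor_prompt(grid)`; with the default grid, `nearest_anchor(grid, 0.1, 0.1)` returns `"C4"`.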
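
The abstract characterizes TeachQuiz only as measuring how well a VLM, after unlearning, can recover knowledge by watching a video. One simple way to turn that into a number, assumed here rather than quoted from the paper, is the change in quiz accuracy: the unlearned model's accuracy after it watches the video minus its accuracy before. The function names and the multiple-choice quiz format are hypothetical.

```python
from typing import Sequence


def accuracy(answers: Sequence[str], key: Sequence[str]) -> float:
    """Fraction of quiz questions answered correctly."""
    if not key or len(answers) != len(key):
        raise ValueError("answers and key must be non-empty and the same length")
    return sum(a == k for a, k in zip(answers, key)) / len(key)


def teachquiz_gain(key, unlearned_answers, after_video_answers):
    """Hypothetical recovery score: accuracy gained by watching the video.

    Positive values mean the video restored knowledge the model had unlearned;
    zero means the video taught nothing the quiz could detect.
    """
    return accuracy(after_video_answers, key) - accuracy(unlearned_answers, key)


key = ["B", "A", "D", "C"]
unlearned = ["A", "A", "B", "B"]      # 1 of 4 correct after unlearning
after_video = ["B", "A", "D", "B"]    # 3 of 4 correct after watching the video
print(teachquiz_gain(key, unlearned, after_video))  # 0.5
```

Measuring a difference rather than raw post-video accuracy is the point of the unlearning step: it isolates what the video itself taught from what the model already knew.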
