Paper2Video: Automatic Video Generation From Scientific Papers - Takara TLDR

Academic presentation videos have become an essential medium for research
communication, yet producing them remains highly labor-intensive, often
requiring hours of slide design, recording, and editing for a short 2- to
10-minute video. Unlike natural video, presentation video generation involves
distinctive challenges: inputs from research papers, dense multi-modal
information (text, figures, tables), and the need to coordinate multiple
aligned channels such as slides, subtitles, speech, and a human talker. To
address these challenges, we introduce Paper2Video, the first benchmark of 101
research papers paired with author-created presentation videos, slides, and
speaker metadata. We further design four tailored evaluation metrics (Meta
Similarity, PresentArena, PresentQuiz, and IP Memory) to measure how videos
convey the paper’s information to the audience. Building on this foundation, we
propose PaperTalker, the first multi-agent framework for academic presentation
video generation. It integrates slide generation with effective layout
refinement by a novel tree search visual choice, cursor grounding,
subtitling, speech synthesis, and talking-head rendering, while parallelizing
slide-wise generation for efficiency. Experiments on Paper2Video demonstrate
that the presentation videos produced by our approach are more faithful and
informative than existing baselines, establishing a practical step toward
automated and ready-to-use academic video generation. Our dataset, agent, and
code are available at https://github.com/showlab/Paper2Video.
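
The slide-wise parallelization mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the names SlideAssets, generate_slide_assets, and generate_all are hypothetical, and the per-slide stages (subtitling, speech synthesis, talking-head rendering) are replaced by placeholder strings. The only point shown is that slides which do not depend on one another can be processed concurrently while keeping their original order.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class SlideAssets:
    index: int
    subtitle: str
    audio: str  # placeholder standing in for a synthesized speech clip
    video: str  # placeholder standing in for a rendered talking-head clip


def generate_slide_assets(index: int, slide_text: str) -> SlideAssets:
    """Hypothetical per-slide stage chain; every step here is a stand-in."""
    subtitle = slide_text.strip()
    audio = f"speech[{index}]"
    video = f"talker[{index}]"
    return SlideAssets(index, subtitle, audio, video)


def generate_all(slides: list[str], max_workers: int = 4) -> list[SlideAssets]:
    # Assumption: slides are independent, so they can be processed concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so slide order is preserved.
        return list(pool.map(generate_slide_assets, range(len(slides)), slides))


if __name__ == "__main__":
    for assets in generate_all([" Intro ", "Method", "Results"]):
        print(assets)
```

A thread pool is used here on the assumption that the real stages would mostly wait on external model calls; CPU-bound stages would more likely call for a process pool.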
