Advanced AI News

AI-generated Tom chases Jerry for a full minute thanks to new method from Nvidia and others

By Advanced AI Bot, April 8, 2025, 3 min read
Summary

Researchers have developed a method for generating longer, more coherent AI videos that tell complex stories.

While AI video generation has improved significantly in recent months, length limitations have remained a persistent challenge. OpenAI’s Sora maxes out at 20 seconds, Meta’s MovieGen at 16 seconds, and Google’s Veo 2 at just 8 seconds. Now, a team from Nvidia, Stanford University, UCSD, UC Berkeley, and UT Austin has introduced a solution: Test-Time Training layers (TTT layers) that enable videos up to one minute long.

The fundamental issue with existing models stems from their “self-attention” mechanism in Transformer architectures. This approach requires each element in a sequence to relate to every other element, causing computational requirements to increase quadratically with length. For minute-long videos containing over 300,000 tokens, this becomes computationally prohibitive.
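The scaling can be made concrete with a back-of-the-envelope count of the query-key scores that full self-attention computes. This sketch uses the article's round figure of 300,000 tokens and ignores constant factors such as the number of heads, layers, and hidden size.

```python
def attention_pairs(num_tokens: int) -> int:
    """Query-key scores full self-attention computes (per layer, per head)."""
    # Every token attends to every token, itself included: n * n scores.
    return num_tokens * num_tokens


if __name__ == "__main__":
    minute_video = 300_000  # round figure from the article ("over 300,000 tokens")
    print(f"{attention_pairs(minute_video):,} scores")  # 90,000,000,000 scores
    # Doubling the sequence length quadruples the work.
    print(attention_pairs(2 * minute_video) // attention_pairs(minute_video))  # 4
```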

Recurrent neural networks (RNNs) offer a potential alternative: they process data sequentially, store information in a “hidden state,” and their computational cost scales linearly with sequence length. However, because a traditional RNN must compress the entire history into a fixed-size hidden state, it struggles to capture complex relationships across long sequences.
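The trade-off can be seen in a deliberately tiny example: a scalar RNN whose whole "memory" is one number. It is a toy, not any model from the paper; the weights `w_h` and `w_x` are arbitrary assumptions. The loop does one update per input, so cost grows linearly, but a single early event fades out of the state as later inputs arrive.

```python
import math
from typing import List, Sequence


def rnn_scan(inputs: Sequence[float], w_h: float = 0.5, w_x: float = 1.0) -> List[float]:
    """Toy scalar RNN: h_t = tanh(w_h * h_{t-1} + w_x * x_t)."""
    h = 0.0  # fixed-size state: one number, however long the input is
    states = []
    for x in inputs:  # one update per element, so cost grows linearly
        h = math.tanh(w_h * h + w_x * x)
        states.append(h)
    return states


if __name__ == "__main__":
    # One "event" at t=0 followed by silence: its trace shrinks toward zero.
    trace = rnn_scan([1.0] + [0.0] * 5)
    print([round(h, 3) for h in trace])
```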

Today, we’re releasing a new paper – One-Minute Video Generation with Test-Time Training.

We add TTT layers to a pre-trained Transformer and fine-tune it to generate one-minute Tom and Jerry cartoons with strong temporal consistency.

Every video below is produced directly by… pic.twitter.com/Bh2impMBWA

– Karan Dalal (@karansdalal) April 7, 2025

How TTT layers transform video generation

The researchers’ innovation replaces simple hidden states in conventional RNNs with small neural networks that continuously learn during the video generation process. These TTT layers work alongside the attention mechanism.

During each processing step, the mini-network trains to recognize and reconstruct patterns in the current image section. This creates a more sophisticated memory system that better maintains consistency across longer sequences – ensuring rooms and characters remain consistent throughout multiple scenes. A similar test-time training approach showed success in the ARC-AGI benchmark in late 2024, though that implementation relied on LoRAs.
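A rough sketch of the idea follows, modeled on a simple linear variant of test-time training rather than on the exact layers used in this paper. The hidden state is the weight matrix `W` of a tiny linear model. For each token, `W` takes one gradient step on a self-supervised reconstruction loss ||W k - v||², and the layer then outputs `W q`. The random projections `theta_k`, `theta_v`, `theta_q` are hypothetical stand-ins for learned parameters, and the learning rate is an arbitrary assumption.

```python
import numpy as np


def ttt_linear(tokens: np.ndarray, lr: float = 0.02, seed: int = 0):
    """Toy test-time-training layer over a (seq_len, dim) array of tokens.

    The hidden state is the weight matrix W of a linear inner model,
    updated online by gradient descent while the sequence is processed.
    """
    dim = tokens.shape[1]
    rng = np.random.default_rng(seed)
    # Hypothetical stand-ins for learned projections (fixed random here).
    theta_k, theta_v, theta_q = (
        rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3)
    )
    W = np.zeros((dim, dim))  # the "memory": same size for any sequence length
    outputs = []
    for x in tokens:
        k, v, q = theta_k @ x, theta_v @ x, theta_q @ x
        err = W @ k - v                # reconstruction error of the inner model
        grad = 2.0 * np.outer(err, k)  # gradient of ||W k - v||^2 w.r.t. W
        W = W - lr * grad              # one training step at test time
        outputs.append(W @ q)          # read out with the freshly updated model
    return np.stack(outputs), W


if __name__ == "__main__":
    toks = np.random.default_rng(1).standard_normal((16, 8))
    out, W = ttt_linear(toks)
    print(out.shape, W.shape)  # (16, 8) (8, 8)
```

Unlike a plain RNN state, the memory here is updated by an explicit learning rule, so what it keeps is whatever helps reconstruct the tokens seen so far.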

Image: Dalal, Koceja, Hussein, Xu et al.


The team demonstrated their approach using Tom and Jerry cartoons. Their dataset includes approximately seven hours of cartoon footage with detailed human descriptions.

Image: Dalal, Koceja, Hussein, Xu et al.

Users can describe their video ideas with varying levels of specificity:


A short summary in 5-8 sentences (e.g., “Tom happily eats an apple pie at the kitchen table. Jerry looks on longingly…”)
A more detailed plot of about 20 sentences, with each sentence corresponding to a 3-second segment
A comprehensive storyboard where each 3-second segment is described by a paragraph of 3-5 sentences detailing background, characters, and camera movements
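The most detailed format is essentially an ordered list of per-segment descriptions. The sketch below models it that way; the class and field names (`Segment`, `background`, `characters`, `camera`) are hypothetical and are not the paper's actual data format. Its one firm rule, taken from the text, is that each entry covers one 3-second segment.

```python
from dataclasses import dataclass, field
from typing import List

SEGMENT_SECONDS = 3  # each storyboard entry covers one 3-second segment


@dataclass
class Segment:
    """One 3-second beat of a storyboard (field names are hypothetical)."""
    background: str
    characters: str
    camera: str

    def to_text(self) -> str:
        # Join the parts into the paragraph that describes this segment.
        return f"{self.background} {self.characters} {self.camera}"


@dataclass
class Storyboard:
    segments: List[Segment] = field(default_factory=list)

    def duration_seconds(self) -> int:
        return SEGMENT_SECONDS * len(self.segments)

    def prompts(self) -> List[str]:
        # One paragraph per segment, in playback order.
        return [s.to_text() for s in self.segments]


if __name__ == "__main__":
    board = Storyboard([Segment(
        "Kitchen, daytime.",
        "Tom eats an apple pie at the table; Jerry watches.",
        "Static wide shot.",
    )])
    print(board.duration_seconds())  # 3
```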

Extending video length roughly twentyfold

The researchers built upon CogVideo-X, a pre-trained model with 5 billion parameters that originally generated only 3-second clips. By integrating TTT layers, they progressively trained it to handle longer durations – from 3 seconds to 9, 18, 30, and finally 63 seconds.
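The training stages can be tabulated in terms of 3-second segments. This small sketch uses only the stage lengths listed above; it shows that the final 63-second stage is 21 segments, i.e. 21 times the original clip length, which is where the "roughly twentyfold" figure comes from.

```python
from typing import List, Tuple

SEGMENT_SECONDS = 3                  # length of the model's original clips
STAGES_SECONDS = [3, 9, 18, 30, 63]  # fine-tuning stages listed in the article


def schedule(stages: List[int], segment_seconds: int = SEGMENT_SECONDS) -> List[Tuple[int, int]]:
    """Return (clip length in seconds, number of 3-second segments) per stage."""
    rows = []
    for seconds in stages:
        if seconds % segment_seconds:
            raise ValueError(f"{seconds} s is not a whole number of segments")
        rows.append((seconds, seconds // segment_seconds))
    return rows


if __name__ == "__main__":
    for seconds, segments in schedule(STAGES_SECONDS):
        print(f"{seconds:>2} s -> {segments:>2} segments ({segments}x the base clip)")
```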

The computationally expensive self-attention mechanisms only apply to 3-second segments, while the more efficient TTT layers operate globally across the entire video, keeping computational requirements manageable. Each video is generated by the model in a single pass, without subsequent editing or montage. The resulting videos tell coherent stories spanning multiple scenes.
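Restricting attention to 3-second segments amounts to a block-diagonal attention mask. The sketch below builds such a mask with NumPy; the number of tokens per segment is a hypothetical small value chosen for readability, and the global TTT layers, which still process every token in order, are not modeled. With `s` segments, the masked version computes `s` times fewer scores than full attention.

```python
import numpy as np


def segment_attention_mask(num_segments: int, tokens_per_segment: int) -> np.ndarray:
    """Boolean mask: token i may attend to token j only within the same segment."""
    seg_id = np.repeat(np.arange(num_segments), tokens_per_segment)
    return seg_id[:, None] == seg_id[None, :]


if __name__ == "__main__":
    mask = segment_attention_mask(num_segments=21, tokens_per_segment=4)
    # Scores actually computed vs. scores for full attention over all tokens.
    print(mask.sum(), mask.size)  # 336 7056
```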

Despite these advances, the model still has limitations – objects sometimes change at segment transitions, float unnaturally, or experience abrupt lighting changes.

All information, examples, and comparisons with other methods are available on GitHub.




