Advanced AI News
Video Generation

Best Short-Form AI Video Generator? Kling 2.1 vs Google Veo 3

By Advanced AI Bot · June 1, 2025 · 11 Mins Read


In brief

Kling 2.1 launched to compete directly with Google’s Veo 3 in the AI video generation market.
Testing reveals Kling 2.1 excels at image-to-video conversion, while Veo 3 dominates with integrated audio generation capabilities.
Both models deliver cinema-quality results, but require different workflows and budget considerations.

AI video generation just got a serious upgrade. Kuaishou’s Kling 2.1 can now produce videos that look genuinely cinematic—the kind of footage that would have required a film crew and expensive equipment just months ago. Characters move naturally, emotions feel authentic, and complex action sequences unfold without the telltale artifacts that usually scream “this was made by AI.”

Kling is one of the better-known, advanced video-generation platforms, and was launched a year ago by Kuaishou, a Chinese tech company also known for its social media innovations. It’s especially known for its ability to create HD videos up to two minutes long—and for being the model picked by many meme makers to animate their political satire of people like Trump, Elon Musk, and other influential figures.

The new technical improvements include faster generation speeds, better prompt adherence, more realism, and fewer artifacts. The Master tier utilizes advanced 3D spatiotemporal attention mechanisms and proprietary 3D VAE technology for what the company describes as cinema-grade output.

The timing couldn’t be more pointed. Kuaishou released the 2.1 family just days after Google unveiled Veo 3, cementing the two companies’ hold on the top spots of the AI video leaderboards. The competition is so heated that interest in “AI video” hit an all-time high this month, according to Google Trends—most of it fueled by how good the models have become.

Early access users have been sharing demonstration videos across social media platforms, praising the Master edition for its capacity to generate “mind-blowing” cinematics.

Honestly, this @Kling_ai v2.1 (early access) is blowing my mind 🤯
The text-to-video mode is insane — smooth, creative, and super promising 🔥

Can’t stop exploring what it can do. pic.twitter.com/O2MucdPWDr

— Pierrick Chevallier | IA (@CharaspowerAI) May 26, 2025

Benchmark comparisons show Kling’s predecessor, Kling 2.0, outperformed all rival models except Google’s Veo 2 and 3. The 2.1 release enhances existing functionality and resolves earlier concerns about generation speed and consistency. It is too recent to appear in current AI leaderboards, but updates with comprehensive testing data are expected soon. The 2.1 Master model is anticipated to widen the performance gap between Google and Kling on one side and their rivals on the other.

Veo vs Kling: How do they compare?

We tested both models to see how they stack up. The best of the best in AI video isn’t cheap—Kling 2.1 Master charges almost $3 for 10 seconds of video—and it’s still far from achieving the level of granularity that real video editing requires. However, both Veo and Kling represent clear upgrades over the previous generation of models, and any enthusiast will be very pleased with their capabilities.

Kuaishou’s strategy shines because, unlike its competitors, Kling 2.1 comes in three flavors: Standard mode at 720p for 20 credits per 5-second video, Professional mode at 1080p for 35 credits, and Master mode at 1080p for 100 credits. The better the model, the more expensive and longer it takes to render—but even the most basic option provides better results than the previous Kling 1.6 Pro.
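The credit math is easy to misread, so here is a quick sketch that converts the tier prices into cost per second. The dollar-per-credit rate is an assumption derived from the “almost $3 for 10 seconds” of Master footage quoted above; actual credit pricing may differ.

```python
# Rough cost-per-second comparison of Kling 2.1 tiers.
# Assumption: Master costs ~$3 per 10 s of video (200 credits),
# implying roughly $0.015 per credit. Real pricing may vary.
USD_PER_CREDIT = 3.0 / 200

TIERS = {
    "Standard (720p)": 20,       # credits per 5-second clip
    "Professional (1080p)": 35,
    "Master (1080p)": 100,
}

def cost_per_second(credits_per_5s: int) -> float:
    """Approximate USD cost per second of generated video."""
    return credits_per_5s * USD_PER_CREDIT / 5

for tier, credits in TIERS.items():
    print(f"{tier}: ~${cost_per_second(credits):.3f}/s")
```

Under that assumption, Professional lands around $0.105 per second versus Master’s $0.30—which is why the middle tier reads as the value pick below.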

The wait time is significant: Veo 3 typically had me twiddling my thumbs for around 5 minutes per video, and sometimes more than 15. System congestion also produced frequent errors, forcing me to redo generations.

The pricing structure reflects a nonlinear progression, with Professional mode delivering visual quality very close to Master’s at less than half the cost. In our subjective assessment, the middle tier was the most cost-effective option for professional creators requiring HD clarity without ultimate cinematic polish.

Text generation

Prompt: A cute robot with the word “EMERGE” written on its belly, approaches the camera, smiles with its digital face and flies away.

Kling 2.1, especially the Master version, shows significant improvement over the previous 1.6. The text renders cleanly and tends to be more uniform across frames.

However, when analyzing this specific feature alone, Veo 3 has a slight advantage. Both models can generate text, but Veo 3 does it more consistently.

For example, both models successfully generated a small robot with the word “EMERGE.” However, when we generated a scene where that robot wasn’t the main focus, Veo 3 still delivered accurate text while Kling produced gibberish.

Realism and human emotion

Prompt: A woman approaches the river with profound sadness. She retrieves a lifeless robot inscribed with the word “Emerge” as she weeps and laments her loss.

If Kling 1.6 Pro focused on dynamic scenes and fluid movement, Kling 2.1 seems to have shifted its focus to realism. The model excels in complex motion sequences, accurately rendering details like joint alignment and realistic physics effects in vehicle stunts. The model’s enhanced prompt adherence allows for precise control over camera movements and emotional expressions.

The reactions feel more genuine than those from Kling 1.6 Pro and even Veo 2.

Compared to Veo 3, however, audio is the differentiator: Veo 3’s ability to generate sound significantly enhances a scene’s emotional impact.

When asked to generate a scene with the same prompt, Veo 3 took a much more cinematic approach. The camera angle and color grading contributed to portraying the emotions in the scene.

Kling 2.1, on the other hand, focused on the portrayal of the emotion itself.

The lack of audio and the different approach made it hard to declare one superior to the other. It depends on each user’s taste, a bit of luck with the generation, and what you value more—the overall mood of a scene or the acting performance.

In this scene, the word “Emerge” was not rendered properly by Kling 2.1 Master. Note that the dead robot was not the main character in the scene, so the model put more effort into the elements that were more prominent in the prompt.



Image-to-video

Prompt: The scene begins exactly as shown, then accelerates into a hypnotic time-lapse where decades flow by in seconds. The vintage taxi remains frozen in time while the city transforms around it – neon signs evolve from traditional Chinese characters to holographic displays, buildings morph and grow taller, people’s clothing shifts through eras, and flying vehicles begin weaving between the structures. The camera slowly orbits the stationary taxi as it becomes a temporal anchor in this swirling vortex of urban evolution, ending with the same taxi in a fully futuristic cityscape.

Image-to-video is a technique in which the user provides the starting frame of a scene and the AI model builds its generation on top of that image as a starting point. It provides the best level of control and lets users have an idea of what to expect from each generation.
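Conceptually, an image-to-video request pairs a conditioning frame with a text prompt describing how the scene should evolve. A minimal sketch of assembling such a request payload follows—the field names are illustrative assumptions, not the actual Kling or Veo API schema:

```python
import base64
import json

def build_i2v_request(image_path: str, prompt: str,
                      negative_prompt: str = "", duration_s: int = 5) -> str:
    """Assemble a hypothetical image-to-video request body.

    The supplied image becomes the first frame, so generation starts
    exactly from it. Field names here are illustrative only; real
    providers define their own schemas.
    """
    with open(image_path, "rb") as f:
        frame_b64 = base64.b64encode(f.read()).decode("ascii")
    return json.dumps({
        "start_frame": frame_b64,            # conditioning image
        "prompt": prompt,                    # desired motion / scene evolution
        "negative_prompt": negative_prompt,  # elements to avoid (Kling supports this)
        "duration": duration_s,              # clip length in seconds
    })
```

The key point is the `start_frame` field: because the model must begin from that exact pixel content, the user retains far more control than with pure text-to-video.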

Kling 2.1’s Standard and Professional modes currently support only image-to-video generation, requiring users to provide source images. The company announced that text-to-video capabilities will be added to these tiers soon, while Master mode already includes this feature alongside enhanced dynamics and prompt adherence.

Both Kling 2.1 Master and Veo 3 support image-to-video, but Veo 3 requires using Flow instead of the normal Gemini UI. When using Flow, the generated videos lack audio.

In our test, Kling 2.1 was better than Veo 3, but far from perfect. It was able to understand the camera movement, the elements, and the intention of the scene. However, it failed to keep focus on the main subject and instead paid attention to the surroundings (the city evolving through time) as it turned into the key element in the scene.

Veo 3, on the other hand, remained focused on the subject (the car) but failed to render the other elements in the prompt. As a result, it generated a static car in a static shot of the same city, with only a few flying cars passing by. It failed to deliver an accurate result.

In general, that was expected. Kling 2.1 will deliver better results in fewer generations, requiring less prompt engineering. It also accepts a negative prompt, which helps considerably in obtaining the desired results.

Anime/cartoon and 2D art

I tried three times to generate anime-style video and couldn’t. Generating 2D art with these models seemed impossible, probably because they are focused on realism.

The best alternative seems to be generating the initial 2D frame with an image generator, then leveraging the image-to-video capabilities to get the desired scene.
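That two-step workaround can be sketched as a tiny pipeline, with the image and video services passed in as callables—stand-ins for whatever providers you use (e.g. Flux for the still, Kling for the animation):

```python
from typing import Callable

def stylized_clip(style_prompt: str, motion_prompt: str,
                  text_to_image: Callable[[str], bytes],
                  image_to_video: Callable[[bytes, str], bytes]) -> bytes:
    """Two-step workaround for 2D/anime video.

    Current video models bias toward realism, so we first render a
    stylized keyframe with a dedicated image model, then hand that
    frame to an image-to-video model to animate it. The callables
    are placeholders for real provider clients.
    """
    keyframe = text_to_image(style_prompt)          # step 1: stylized still
    return image_to_video(keyframe, motion_prompt)  # step 2: animate it
```

Splitting style from motion this way means each model only does the part it is good at.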

Multi-subject scenes

Prompt: Five gray wolf pups frolicking and chasing each other around a remote gravel road, surrounded by grass. The pups run and leap, chasing each other, and nipping at each other, playing

It’s still challenging for AI models to handle multi-subject scenes. When there are more than three main characters and the scene is dynamic, the models lose consistency, merging characters, generating new ones, and showing numerous artifacts.

This remains the case for Kling 2.1. The model represents a significant improvement over previous generations, but it still fails to manage complex scenes accurately. In our tests, it didn’t generate five wolves and instead produced three.

Veo 3, though, attempted to generate the full pack. Things didn’t work out initially, but near the end of the scene, the model separated all the wolves enough to regain coherence and was ultimately able to generate all five wolves.

Kling 2.1, however, sacrificed a bit of prompt adherence for a substantial gain in coherence—and that seems like the better outcome.

Dynamic shots

Prompt: Dynamic tracking shot following a woman in a vibrant crimson dress as she sprints desperately through downtown New York’s neon-lit canyon of skyscrapers. Her flowing hair catches fragments of electric blue light from towering digital billboards while dust and debris swirl chaotically around her. Behind her, a massive mechanical cyber spider with gleaming chrome legs and pulsing LED sensors crashes through the urban landscape, its metallic limbs sparking against concrete as it pursues relentlessly… (full prompt is in the YouTube description)

Dynamic shots are tricky to evaluate because the devil is in the details. Usually, when things happen fast and the focus is on a main character, the rest of the elements go unnoticed. This is why generative video models have tended to produce interesting shots that, upon careful inspection, fell flat.

Happily, in our tests, Kling 2.1 proved far more dynamic than 2.0 and Kling 1.6. It generated fast-paced scenes, dramatic shots, and compelling action sequences. Generations with previous Kling models usually showed a few static or slow frames before jumping into the action. This problem has been resolved.

Veo 3 added some dynamism with a good soundtrack. The model also generated everything that a good action sequence requires—motion, explosions, dynamic shots, dust, and chaos—and felt more realistic and less 2.5D or green screen-ish.

However, when compared to Veo 3, Kling 2.1 excelled in prompt adherence. Our woman runs away from the giant spider, whereas Veo 3 generated a woman running toward the spider—a great scene that ends up being useless.

Also, the woman in the Veo 3 generation started running unnaturally near the halfway point of the generation, which represents one of the challenges AI companies must tackle when dealing with long-form content—maintaining consistency in continuous shots that last long enough to disrupt model coherence.

Conclusion

I hate to say it, but there isn’t really a clear winner, and for the first time in the generative AI video space, the best choice depends on what you expect and how much you’re willing to pay.

Veo 3 has a clear advantage thanks to its audio generation. The sound is coherent and clear enough that any silent video now feels like a step backward. Adding coherent audio in post-production remains a notoriously difficult task, so this could be the make-or-break deal for many.

Kling 2.1, on the other hand, is the winner for image-to-video conversion, allowing users to take real-life photos or images created with specialized models like Flux or Ideogram and transform them into compelling animations. You can’t do image-to-video in Gemini—you need Flow, which is still in beta and only supports Veo 3 through the $250-per-month subscription, with only widescreen mode supported. Even then, it delivers lower quality compared to Kling.

Beyond those two key differences, the rest comes down to circumstance or personal preference. Both are very realistic, coherent (by today’s standards), and creative, and will produce the best AI-generated videos you can ask for. Since the difference comes down to preference, you need to adapt your prompts to each model, and the difference in results will be apparent.

If you don’t want to break the bank, even Kling 2.1 Standard delivers impressive results—far better than most other models in the industry, and close to state-of-the-art.

In general terms, according to our testing, first place in the generative video ranking is essentially tied between Veo 3 and Kling 2.1 Master. Third place, for open-source enthusiasts, goes to Wan 2.1—and will probably remain there for a while. Its VACE, LoRAs, and workflows have turned this free, uncensored model into a beast of its own.
