'HunyuanWorld-Voyager' Can Generate Videos In Which The Viewpoint Moves Within A 3D Scene Generated From A Single Image

Sep 04, 2025 19:00:00

Tencent, a major Chinese IT company, has released 'HunyuanWorld-Voyager,' an AI framework that generates coherent 3D scenes from a single image, on GitHub. HunyuanWorld-Voyager can extend a scene while preserving its context, and can generate videos in which the viewpoint moves within the generated 3D scene.

GitHub – Tencent-Hunyuan/HunyuanWorld-Voyager: Voyager is an interactive RGBD video generation model conditioned on camera trajectory, and supports real-time 3D reconstruction.

https://github.com/Tencent-Hunyuan/HunyuanWorld-Voyager

HunyuanWorld-Voyager is a 3D scene generation AI framework trained on a dataset of over 100,000 video clips, which combines real-world footage with synthetic footage rendered in Unreal Engine. The dataset was built with a reconstruction pipeline that automates camera pose estimation and metric depth prediction for any video.

HunyuanWorld-Voyager consists of two main components:

1: A unified architecture that generates aligned RGB and depth video sequences from the input image, keeping the two consistent with each other.
2: An efficient world cache with point removal, combined with autoregressive inference and smooth video sampling, which allows the scene to be extended iteratively with context-aware consistency.

These components enable HunyuanWorld-Voyager to generate a coherent 3D scene from a single image, generate video of the scene as the camera moves, and reconstruct a 3D point cloud from the generated 3D scene.
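As a rough illustration of how aligned RGB and depth frames can yield a point cloud, the sketch below back-projects a metric depth map into camera-space 3D points using the standard pinhole camera model. This is not taken from the HunyuanWorld-Voyager code: the function name `backproject_depth`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the toy depth values are all assumptions made for the example.

```python
def backproject_depth(depth, fx, fy, cx, cy):
    """Turn a metric depth map into camera-space 3D points (pinhole model).

    depth is a list of rows; fx, fy are focal lengths in pixels and
    cx, cy is the principal point. All names here are hypothetical.
    """
    points = []
    for v, row in enumerate(depth):      # v: pixel row (image y)
        for u, z in enumerate(row):      # u: pixel column (image x)
            if z <= 0:                   # skip pixels without valid depth
                continue
            x = (u - cx) * z / fx        # horizontal offset scaled by depth
            y = (v - cy) * z / fy        # vertical offset scaled by depth
            points.append((x, y, z))
    return points


# Toy 2x2 depth map in metres; one pixel has no valid depth.
depth = [[2.0, 2.0],
         [0.0, 4.0]]
cloud = backproject_depth(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(len(cloud), "points:", cloud)
```

Repeating this for every frame and transforming the points by each frame's camera pose would accumulate them into a single scene-level cloud, which is one plausible reading of how depth-aligned video supports direct 3D reconstruction.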

The GitHub page shows actual input images alongside the videos generated from them. Below is one such input image; the inset at the bottom right shows the camera movement within the 3D scene. The camera movement can be specified by the user.
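To make the idea of a user-specified camera movement concrete, here is a minimal sketch that represents a trajectory as one 4x4 camera-to-world matrix per frame, with the camera moving straight ahead along the z axis. The article does not describe the input format the repository actually expects, so the function `forward_trajectory`, its parameters, and the matrix convention are hypothetical.

```python
def forward_trajectory(num_frames, step):
    """Return one 4x4 camera-to-world matrix per frame.

    The camera keeps its orientation (identity rotation) and advances
    by `step` scene units along +z each frame. Hypothetical convention.
    """
    poses = []
    for i in range(num_frames):
        poses.append([
            [1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, i * step],   # translation along the view axis
            [0.0, 0.0, 0.0, 1.0],
        ])
    return poses


poses = forward_trajectory(num_frames=4, step=0.5)
for pose in poses:
    print("camera z position:", pose[2][3])
```

A sideways pan or a turn would change the translation column or the rotation block in the same per-frame fashion.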

The generated video is below:

Camera movement in the 3D scene generated by ‘HunyuanWorld-Voyager’ 01 – YouTube

Next, the following image is used as input:

The generated video looks like this:

Camera movement in the 3D scene generated by ‘HunyuanWorld-Voyager’ 02 – YouTube

Below is a 3D point cloud reconstructed from a video generated by HunyuanWorld-Voyager. The result is rough, but the structure of the scene is clearly recovered.

3D point cloud reconstructed from video generated by ‘HunyuanWorld-Voyager’ – YouTube
