
Tencent, a major Chinese IT company, has released ‘HunyuanWorld-Voyager,’ an AI framework that generates coherent 3D scenes from a single image, on GitHub. HunyuanWorld-Voyager can extend a scene while keeping it consistent with the existing context, and it can generate videos in which the viewpoint moves through the generated 3D scene.
GitHub – Tencent-Hunyuan/HunyuanWorld-Voyager: Voyager is an interactive RGBD video generation model conditioned on camera trajectory, and supports real-time 3D reconstruction.
https://github.com/Tencent-Hunyuan/HunyuanWorld-Voyager
HunyuanWorld-Voyager is a 3D scene generation AI framework trained on a dataset of more than 100,000 video clips that combines real-world footage with synthetic scenes rendered in Unreal Engine. The dataset was built with a reconstruction pipeline that automatically estimates camera poses and predicts metric depth for arbitrary videos.
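Camera poses plus metric depth are what allow each video frame to be lifted into a shared 3D space. The following is a minimal sketch of that idea, not Voyager's actual code: it assumes a simple pinhole camera with illustrative intrinsics `fx`, `fy`, `cx`, `cy` and a 4x4 camera-to-world matrix, and the function and constant names are hypothetical.

```python
from typing import List, Tuple

Matrix4 = List[List[float]]


def unproject_pixel(u: float, v: float, depth: float,
                    fx: float, fy: float, cx: float, cy: float,
                    cam_to_world: Matrix4) -> Tuple[float, float, float]:
    """Hypothetical helper: lift pixel (u, v) with metric depth into world coordinates (pinhole model)."""
    # Back-project into camera coordinates; depth is measured along the optical (z) axis.
    x_c = (u - cx) * depth / fx
    y_c = (v - cy) * depth / fy
    p = (x_c, y_c, depth, 1.0)  # homogeneous camera-space point
    # Apply the camera-to-world transform (only the first three rows are needed).
    world = [sum(cam_to_world[r][k] * p[k] for k in range(4)) for r in range(3)]
    return (world[0], world[1], world[2])


# Example poses: camera at the origin, and the same camera shifted 1 m along +x.
IDENTITY: Matrix4 = [[1.0, 0.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0]]
SHIFTED: Matrix4 = [[1.0, 0.0, 0.0, 1.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.0],
                    [0.0, 0.0, 0.0, 1.0]]
```

For example, with `fx = fy = 500` and the principal point at (320, 240), the center pixel at 2 m depth maps to (0, 0, 2) under `IDENTITY`, while pixel (570, 240) under `SHIFTED` maps to (2, 0, 2). Repeating this for every pixel of every frame yields a point cloud in one common coordinate system.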
HunyuanWorld-Voyager consists of two main components:
1: A unified architecture that jointly generates aligned RGB and depth video sequences from the input image, keeping them consistent with each other and with the scene.
2: Autoregressive inference that combines an efficient world cache (with point culling) and smooth video sampling, enabling the scene to be extended iteratively while staying consistent with its context.
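The article does not say how the world cache decides which points to cull. Purely as an illustration, assume a voxel-grid rule: points from each new frame are added to the cache, and a point is dropped if the voxel it falls into is already occupied. The class name `WorldCache` and the `voxel_size` parameter are hypothetical and not taken from the Voyager codebase.

```python
import math
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


class WorldCache:
    """Hypothetical accumulating point cache that culls redundant points with a voxel grid."""

    def __init__(self, voxel_size: float) -> None:
        self.voxel_size = voxel_size
        self._voxels: Dict[Voxel, Point] = {}

    def _key(self, p: Point) -> Voxel:
        # Quantize a point to the integer index of the voxel containing it.
        s = self.voxel_size
        return (math.floor(p[0] / s), math.floor(p[1] / s), math.floor(p[2] / s))

    def add_frame(self, points: Iterable[Point]) -> int:
        """Insert one frame's points and return how many were kept (not culled)."""
        kept = 0
        for p in points:
            key = self._key(p)
            if key not in self._voxels:  # cull points that land in an occupied voxel
                self._voxels[key] = p
                kept += 1
        return kept

    def points(self) -> List[Point]:
        """Return the cached points, at most one per voxel."""
        return list(self._voxels.values())


cache = WorldCache(voxel_size=0.1)
first = cache.add_frame([(0.0, 0.0, 1.0), (0.5, 0.0, 1.0)])
second = cache.add_frame([(0.01, 0.02, 1.03), (1.0, 0.0, 1.0)])
```

Here `first` is 2 and `second` is 1: the point (0.01, 0.02, 1.03) falls into the same voxel as (0.0, 0.0, 1.0) and is culled, so the cache holds three points. Keeping one point per voxel stops the cache from growing without bound as overlapping frames are generated, which is the general motivation for culling in an autoregressive loop.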
These components enable HunyuanWorld-Voyager to generate a coherent 3D scene from a single image, render video as the camera moves through that scene, and reconstruct a 3D point cloud from the generated RGB and depth video.
Example input images and the videos HunyuanWorld-Voyager generated from them are published on GitHub. Below is one input image; the inset at the bottom right shows the camera trajectory through the 3D scene, which the user can specify.
The generated video is below:
Camera movement in the 3D scene generated by ‘HunyuanWorld-Voyager’ 01 – YouTube
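The article only says that the camera movement is user-specified; it does not describe the input format. As a hedged illustration, assume a trajectory is simply a list of 4x4 camera-to-world matrices, one per frame. The hypothetical helper below builds a path that walks forward while turning at a constant rate.

```python
import math
from typing import List

Matrix4 = List[List[float]]


def forward_turn_trajectory(num_frames: int, step: float,
                            yaw_per_frame_deg: float) -> List[Matrix4]:
    """Hypothetical camera path as camera-to-world matrices.

    Frame 0 sits at the origin looking along +z. Each later frame moves `step`
    along the previous frame's viewing direction, then yaws by `yaw_per_frame_deg`.
    """
    poses: List[Matrix4] = []
    x = z = 0.0
    yaw = 0.0
    for _ in range(num_frames):
        c, s = math.cos(yaw), math.sin(yaw)
        # Rotation about the vertical (y) axis; the camera's local +z axis maps to (s, 0, c).
        poses.append([[c, 0.0, s, x],
                      [0.0, 1.0, 0.0, 0.0],
                      [-s, 0.0, c, z],
                      [0.0, 0.0, 0.0, 1.0]])
        # Advance along the current viewing direction, then turn for the next frame.
        x += step * s
        z += step * c
        yaw += math.radians(yaw_per_frame_deg)
    return poses


straight = forward_turn_trajectory(3, 1.0, 0.0)
turning = forward_turn_trajectory(3, 1.0, 90.0)
```

With no yaw, the camera advances 1 m per frame along +z; with a 90-degree yaw per frame, the third camera ends up at roughly (1, 0, 1). A real system would likely also interpolate poses smoothly and let the user edit keyframes, but that is beyond this sketch.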
Next, the following image was used as input:
The generated video looks like this:
Camera movement in the 3D scene generated by ‘HunyuanWorld-Voyager’ 02 – YouTube
Below is a 3D point cloud reconstructed from a video generated by HunyuanWorld-Voyager. Although the result is rough, the scene's 3D structure is clearly recovered.
3D point cloud reconstructed from video generated by ‘HunyuanWorld-Voyager’ – YouTube