Browsing: Hugging Face
We introduce Lumina-Image 2.0, an advanced text-to-image generation framework that achieves significant progress compared to previous work, Lumina-Next. Lumina-Image 2.0…
Recent advancements in 2D and multimodal models have achieved remarkable success by leveraging large-scale training on extensive datasets. However, extending…
Multimodal generative models that can understand and generate across multiple modalities are dominated by autoregressive (AR) approaches, which process tokens…
Open-vocabulary semantic segmentation models associate vision and text to label pixels from an undefined set of classes using textual queries,…
Temporal consistency is critical in video prediction to ensure that outputs are coherent and free of artifacts. Traditional methods, such…
Text-guided image editing aims to modify specific regions of an image according to natural language instructions while maintaining the general…