Paper page - IA-T2I: Internet-Augmented Text-to-Image Generation

An Internet-Augmented text-to-image generation framework improves uncertain text prompt handling by integrating reference images, enhancing image quality and fidelity.

Current text-to-image (T2I) generation models achieve promising results, but
they fail in scenarios where the knowledge implied by the text prompt is
uncertain. For example, a T2I model released in February would struggle to
generate a suitable poster for a movie premiering in April, because the
character designs and styles are uncertain to the model. To solve this problem,
we propose an Internet-Augmented text-to-image generation (IA-T2I) framework that
clarifies such uncertain knowledge for T2I models by providing them with
reference images. Specifically, an active retrieval module is designed to
determine whether a reference image is needed based on the given text prompt; a
hierarchical image selection module is introduced to find the most suitable
image returned by an image search engine to enhance the T2I model; a
self-reflection mechanism is presented to continuously evaluate and refine the
generated image to ensure faithful alignment with the text prompt. To evaluate
the proposed framework’s performance, we collect a dataset named Img-Ref-T2I,
where text prompts include three types of uncertain knowledge: (1) known but
rare, (2) unknown, and (3) ambiguous. Moreover, we carefully craft a complex prompt
to guide GPT-4o in making preference evaluations, which have been shown to have
an accuracy similar to that of human preference evaluation.
Experimental results demonstrate the effectiveness of our framework, which
outperforms GPT-4o by about 30% in human evaluation.
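
The three modules described above (active retrieval, image selection, self-reflection) can be read as a simple control loop. The Python sketch below is only an illustration of that flow under stated assumptions, not the authors' implementation: every name in it (`ia_t2i_generate`, `Critique`, and the five injected callables) is hypothetical, images are represented as plain strings, the hierarchical selection is collapsed into a single `select_best` call, and the self-reflection step is assumed to work by regenerating from a revised prompt.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Critique:
    """Hypothetical result of one self-reflection step."""
    aligned: bool        # True if the image is judged faithful to the prompt
    revised_prompt: str  # prompt to try next when it is not


def ia_t2i_generate(
    prompt: str,
    needs_reference: Callable[[str], bool],
    search_images: Callable[[str], List[str]],
    select_best: Callable[[str, List[str]], Optional[str]],
    generate: Callable[[str, Optional[str]], str],
    critique: Callable[[str, str], Critique],
    max_rounds: int = 3,
) -> str:
    """Sketch of the pipeline; all callables are assumed placeholders."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")

    # Active retrieval: query the search engine only if the prompt needs it.
    reference: Optional[str] = None
    if needs_reference(prompt):
        candidates = search_images(prompt)
        # Image selection: keep the single most suitable candidate, if any.
        if candidates:
            reference = select_best(prompt, candidates)

    # Self-reflection: generate, evaluate against the ORIGINAL prompt,
    # and retry with a revised prompt until aligned or out of rounds.
    current_prompt = prompt
    image = ""
    for _ in range(max_rounds):
        image = generate(current_prompt, reference)
        verdict = critique(prompt, image)
        if verdict.aligned:
            break
        current_prompt = verdict.revised_prompt
    return image
```

Passing the modules in as callables keeps the sketch independent of any particular T2I model, search engine, or evaluator; in practice each placeholder would wrap a real model or service call. The `max_rounds` cap is an added safeguard so the refinement loop always terminates.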
