Image composition aims to seamlessly insert a user-specified object into a
new scene, but existing models struggle with complex lighting (e.g., accurate
shadows, water reflections) and diverse, high-resolution inputs. Modern
text-to-image diffusion models (e.g., SD3.5, FLUX) already encode essential
physical and resolution priors, yet lack a framework to unleash them without
resorting to latent inversion (which often locks object poses into contextually
inappropriate orientations) or brittle attention surgery. We propose SHINE, a
training-free framework for Seamless, High-fidelity Insertion with Neutralized
Errors. SHINE introduces a manifold-steered anchor loss, leveraging pretrained
customization adapters (e.g., IP-Adapter) to guide latents toward faithful subject
representation while preserving background integrity.
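As a rough illustration of the idea, such an anchor loss can be read as a per-step latent-guidance objective: an IP-Adapter-conditioned branch supplies an anchor for the object region, while the rest of the latent stays tied to the original background. A minimal PyTorch sketch under these assumptions (the function, its arguments, and the simple squared-error form are hypothetical stand-ins, not the actual formulation):

```python
import torch

def anchor_guidance_step(latents, adapter_anchor, background_latents,
                         object_mask, step_size=0.1):
    """Hypothetical sketch of one manifold-steered anchor-loss step.

    latents            -- current noisy latents, shape (B, C, h, w)
    adapter_anchor     -- latents predicted at the same timestep by an
                          IP-Adapter-conditioned denoising branch
    background_latents -- the original background image, noised to this step
    object_mask        -- insertion mask at latent resolution (1 = object)
    """
    latents = latents.detach().requires_grad_(True)
    # Subject fidelity: keep the object region near the adapter-defined anchor.
    subject = (((latents - adapter_anchor) ** 2) * object_mask).mean()
    # Background integrity: tie everything else to the original scene.
    background = (((latents - background_latents) ** 2) * (1 - object_mask)).mean()
    grad, = torch.autograd.grad(subject + background, latents)
    return (latents - step_size * grad).detach()
```

Applying such a step between denoising iterations steers sampling without touching model weights, which is what keeps the approach training-free.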
We further propose degradation-suppression guidance and adaptive background
blending to eliminate low-quality outputs and visible seams.
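One plausible reading of these two components, sketched below with invented names: degradation suppression as an extra classifier-free-guidance term that pushes the noise prediction away from a "degraded output" condition, and background blending as per-step latent compositing through a feathered mask that is released late in sampling so shadows and reflections can extend beyond the insertion region.

```python
import torch
import torch.nn.functional as F

def degradation_suppressed_noise(eps_uncond, eps_cond, eps_degraded,
                                 scale=7.5, suppress=2.0):
    # Classifier-free guidance plus a term steering away from a
    # "degraded output" condition (e.g., a quality-related negative prompt).
    return (eps_uncond
            + scale * (eps_cond - eps_uncond)
            - suppress * (eps_degraded - eps_uncond))

def adaptive_background_blend(latents, background_latents, object_mask,
                              t_frac, t_stop=0.3):
    # Late in sampling (t_frac below t_stop), stop blending so lighting
    # effects such as shadows can spill past the object mask.
    if t_frac < t_stop:
        return latents
    # Feather the mask edge (dilate, then smooth) to avoid a hard seam.
    soft = F.max_pool2d(object_mask, kernel_size=3, stride=1, padding=1)
    soft = F.avg_pool2d(soft, kernel_size=3, stride=1, padding=1)
    return soft * latents + (1 - soft) * background_latents
```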
To address the lack of rigorous benchmarks, we introduce ComplexCompo, featuring diverse resolutions and
challenging conditions such as low lighting, strong illumination, intricate
shadows, and reflective surfaces. Experiments on ComplexCompo and
DreamEditBench show state-of-the-art performance on standard metrics (e.g.,
DINOv2 feature similarity) and human-aligned scores (e.g., DreamSim,
ImageReward, VisionReward).
Code and benchmark will be publicly available upon publication.