Fine-tuning a pre-trained Text-to-Image (T2I) model on a tailored portrait
dataset is the mainstream method for text-driven customization of portrait
attributes. However, because of Semantic Pollution during fine-tuning, existing
methods struggle both to preserve the original model’s behavior and to achieve
incremental learning while customizing target attributes. To address this issue, we propose
SPF-Portrait, a pioneering method that understands only the customized semantics
while eliminating semantic pollution in text-driven portrait customization.
SPF-Portrait uses a dual-path pipeline that introduces the original
model as a reference for the conventional fine-tuning path. Through contrastive
learning, we ensure adaptation to the target attributes while deliberately aligning
all unrelated attributes with the original portrait. We introduce a novel
Semantic-Aware Fine Control Map, which represents the precise response regions
of the target semantics, to spatially guide the alignment process between the
contrastive paths. This alignment process not only effectively preserves the
performance of the original model but also avoids over-alignment. Furthermore,
we propose a novel response enhancement mechanism that reinforces the performance
of target attributes while mitigating the representation discrepancy inherent in
direct cross-modal supervision. Extensive experiments demonstrate that
SPF-Portrait achieves state-of-the-art performance. Project webpage:
https://spf-portrait.github.io/SPF-Portrait/
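
To illustrate the map-guided alignment between the two paths, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that the control map is a soft mask in [0, 1] over spatial positions, that alignment is a masked mean-squared error between features of the fine-tuning path and the frozen reference path, and that the two terms are combined with a weight `lambda_align`. The function names, tensor shapes, and loss form are all hypothetical; the abstract does not specify them.

```python
import numpy as np

def masked_alignment_loss(ft_feat, ref_feat, control_map):
    """Masked MSE between the two paths (assumed form, for illustration only).

    ft_feat, ref_feat: arrays of shape (C, H, W) from the fine-tuning path and
        the frozen reference path (hypothetical feature tensors).
    control_map: array of shape (H, W) in [0, 1]; 1 marks regions that respond
        to the target semantics and should be free to change.
    """
    keep = 1.0 - control_map                # 1 where the original should be preserved
    sq_err = (ft_feat - ref_feat) ** 2      # per-element squared difference
    weighted = sq_err * keep[None, :, :]    # broadcast the map over channels
    denom = keep.sum() * ft_feat.shape[0]   # number of (weighted) preserved elements
    return float(weighted.sum() / max(denom, 1e-8))

def total_loss(task_loss, ft_feat, ref_feat, control_map, lambda_align=1.0):
    """Fine-tuning objective plus the weighted alignment term (assumed combination)."""
    align = masked_alignment_loss(ft_feat, ref_feat, control_map)
    return task_loss + lambda_align * align
```

In this sketch, differences that fall entirely inside the mapped target region contribute nothing to the alignment term, which is one simple way to preserve unrelated attributes without over-aligning the attribute being customized.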