Paper Page - ZeroSep: Separate Anything In Audio With Zero Training

Audio source separation is fundamental for machines to understand complex acoustic environments and underpins numerous audio applications. Current supervised deep learning approaches, while powerful, are limited by the need for extensive, task-specific labeled data and struggle to generalize to the immense variability and open-set nature of real-world acoustic scenes. Inspired by the success of generative foundation models, we investigate whether pre-trained text-guided audio diffusion models can overcome these limitations. We make a surprising discovery: zero-shot source separation can be achieved purely through a pre-trained text-guided audio diffusion model under the right configuration. Our method, named ZeroSep, works by inverting the mixed audio into the diffusion model's latent space and then using text conditioning to guide the denoising process to recover individual sources. Without any task-specific training or fine-tuning, ZeroSep repurposes the generative diffusion model for a discriminative separation task and inherently supports open-set scenarios through its rich textual priors. ZeroSep is compatible with a variety of pre-trained text-guided audio diffusion backbones and delivers strong separation performance on multiple separation benchmarks, surpassing even supervised methods. Our project page is here: https://wikichao.github.io/ZeroSep/.
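The abstract names two stages: invert the mixture into the model's latent space, then denoise under text conditioning. Below is a toy sketch of that invert-then-regenerate structure. It assumes deterministic DDIM inversion and sampling (eta = 0); the abstract does not say which inversion scheme ZeroSep actually uses. The noise predictor `toy_eps`, the linear schedule in `make_schedule`, and the representation of a "text prompt" as a 0/1 mask over latent dimensions are all hypothetical stand-ins so the example runs without a real model. The mask makes separation trivially exact here, which a real diffusion backbone would not.

```python
import math


def make_schedule(num_steps):
    """Hypothetical alpha-bar schedule: 1.0 at t=0 (clean), falling linearly to 0.02 at t=num_steps."""
    return [1.0 - 0.98 * t / num_steps for t in range(num_steps + 1)]


def toy_eps(x, alpha_bar_t, prompt_mask):
    """Hypothetical stand-in for a text-conditioned noise predictor eps(x_t, t, text).

    The "prompt" is a 0/1 mask over latent dimensions: masked dimensions are
    treated as signal (predicted noise 0), all other dimensions as pure noise.
    A real model would consume a text embedding, not a mask.
    """
    scale = 1.0 / math.sqrt(1.0 - alpha_bar_t)
    return [(1.0 - m) * xi * scale for xi, m in zip(x, prompt_mask)]


def ddim_step(x, eps, ab_from, ab_to):
    """Deterministic DDIM update (eta = 0) from noise level ab_from to ab_to."""
    # Clean-latent estimate implied by the predicted noise.
    x0_hat = [(xi - math.sqrt(1.0 - ab_from) * ei) / math.sqrt(ab_from) for xi, ei in zip(x, eps)]
    # Re-noise that estimate to the target level using the same predicted noise.
    return [math.sqrt(ab_to) * x0 + math.sqrt(1.0 - ab_to) * ei for x0, ei in zip(x0_hat, eps)]


def invert(x0, schedule, prompt_mask):
    """DDIM inversion: walk the clean latent up to the noisiest step."""
    x = list(x0)
    for t in range(1, len(schedule)):
        # Noise is predicted at x_{t-1} but queried with timestep t, as in standard DDIM inversion.
        eps = toy_eps(x, schedule[t], prompt_mask)
        x = ddim_step(x, eps, schedule[t - 1], schedule[t])
    return x


def denoise(x_T, schedule, prompt_mask):
    """Text-guided DDIM sampling from the inverted latent back to t = 0."""
    x = list(x_T)
    for t in range(len(schedule) - 1, 0, -1):
        eps = toy_eps(x, schedule[t], prompt_mask)
        x = ddim_step(x, eps, schedule[t], schedule[t - 1])
    return x


def separate(mixture, target_mask, num_steps=50):
    """Invert the mixture under a prompt that explains all of it, then denoise under the target prompt."""
    schedule = make_schedule(num_steps)
    full_mask = [1.0] * len(mixture)  # hypothetical "describe the whole mixture" prompt
    x_T = invert(mixture, schedule, full_mask)
    return denoise(x_T, schedule, target_mask)


if __name__ == "__main__":
    # Hypothetical latents for two sources occupying disjoint dimensions.
    dog = [0.5, -1.0, 0.0, 0.0]
    rain = [0.0, 0.0, 0.25, 2.0]
    mixture = [d + r for d, r in zip(dog, rain)]
    print(separate(mixture, [1.0, 1.0, 0.0, 0.0]))  # "dog barking" prompt
    print(separate(mixture, [0.0, 0.0, 1.0, 1.0]))  # "rain" prompt
```

Running the script should print values within floating-point error of `dog` and then `rain`. The point the sketch illustrates is that inversion and regeneration use different conditioning: the mixture is inverted under a prompt that accounts for all of it, and only the target prompt is applied on the way back, so the conditioning at denoising time decides which content survives.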
