Stability AI, in partnership with chip designer Arm, announced on May 14, 2025, the open-source release of Stable Audio Open Small, a compact and efficient text-to-audio artificial intelligence model. This stereo AI model is specifically optimized to run entirely on Arm CPUs, enabling generative audio capabilities directly on devices like smartphones without relying on cloud processing. The release is significant as it aims to democratize audio creation for a wider range of users and applications, while notably addressing intellectual property concerns by being trained exclusively on royalty-free audio.
The new model, detailed in Stability AI’s official announcement, features 341 million parameters and can produce up to 11 seconds of audio on a smartphone in under eight seconds. This performance builds on a previously announced breakthrough with Arm at Mobile World Congress 2025, where optimizations using Arm KleidiAI libraries dramatically reduced generation times.
Prem Akkaraju, CEO of Stability AI, highlighted this earlier achievement, stating “Thanks to these model optimizations and Arm KleidiAI, we moved from minutes to mere seconds to generate audio entirely on the Arm CPU on the smartphone.” The current Stable Audio Open Small leverages these advancements, making it accessible without heavy hardware requirements, as Stability AI News notes.
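As a rough illustration of what the quoted figures imply, the short sketch below computes a real-time factor (seconds of audio produced per second of compute) from the numbers in the announcement. The function name `real_time_factor` is a hypothetical helper introduced here for illustration, not part of any Stability AI or Arm tooling, and using eight seconds as the generation time is an assumption that yields only a lower bound, since the announcement says "under eight seconds."

```python
def real_time_factor(audio_seconds: float, generation_seconds: float) -> float:
    """Return seconds of audio produced per second of compute.

    A value above 1.0 means the clip is generated faster than it plays back.
    """
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return audio_seconds / generation_seconds


# Figures from the announcement: up to 11 s of audio in under 8 s.
# Assumption: 8 s is treated as the worst-case generation time,
# so the result is a lower bound for a full 11-second clip.
rtf = real_time_factor(11.0, 8.0)
print(f"real-time factor >= {rtf:.3f}")  # prints "real-time factor >= 1.375"
```

By this estimate, a full-length clip is generated at least about 1.4 times faster than it takes to play back, on the smartphone-class hardware described in the announcement.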
The company’s commitment to an ethical approach is underscored by its use of the Free Music Archive and Freesound for training, in contrast to some competitors, such as Suno, which have faced scrutiny over using copyrighted content.
Developers can access Stable Audio Open Small under the permissive Stability AI Community License, with model weights available on Hugging Face, code on GitHub, and its research paper published on arXiv. An Arm Learning Path is also available to guide developers.
On-Device Audio: Capabilities and Considerations
Stable Audio Open Small’s on-device processing offers speed and offline functionality, a key differentiator from many cloud-dependent audio generation services. The model is designed primarily for short audio samples such as sound effects or musical riffs, and Stability AI acknowledges certain limitations: it currently supports only English prompts and is not yet optimized for generating highly realistic vocals or complex, full-length songs.
Furthermore, as indicated in its documentation and reported by TechCrunch, the training data has a Western bias, which may affect its performance across diverse global music styles. The licensing terms are structured to encourage broad adoption: the model is free for researchers, hobbyists, and businesses with less than $1 million in annual revenue. Organizations above that threshold need an enterprise license from Stability AI.
Navigating the AI Audio Landscape and IP Challenges
Stability AI’s decision to train this model exclusively on royalty-free audio sources is a strategic move in an industry increasingly focused on intellectual property rights. This contrasts with other AI audio tools, some of which have faced legal action from record labels for allegedly using copyrighted music without proper authorization. By using openly licensed data, Stability AI aims to provide a more legally sound foundation for creators.
The broader AI audio field is moving quickly. ElevenLabs launched tools for sound effects in June 2024, emphasizing ethically sourced data through partnerships. NVIDIA presented its advanced Fugatto audio model in November 2024 but has not released it publicly because of concerns about potential misuse, reflecting a cautious approach to powerful generative technologies. More recently, Google introduced its Lyria text-to-music AI model in April 2025, primarily for its enterprise customers, though, as WinBuzzer noted, details about its training datasets were not specified.
Stability AI’s Evolution in Generative Audio
The release of Stable Audio Open Small represents an ongoing evolution of Stability AI’s work in the audio domain, following the initial launch of its Stable Audio platform in September 2023. That earlier iteration, developed with data from AudioSparx, focused on cloud-based generation. This new “Small” version, however, clearly prioritizes efficiency and on-device deployment, aligning with the industry trend towards edge AI.
This launch comes as Stability AI, known for its popular image generator Stable Diffusion, continues to navigate a competitive market. The company has gone through financial restructuring and leadership changes, and it raised new funding in 2024.
The introduction of an ethically trained, on-device model like Stable Audio Open Small, alongside other recent image generation model releases, signals a strategic effort to innovate and solidify the company’s market position. The combination of accessibility, on-device performance, and a royalty-free data foundation could make Stable Audio Open Small an attractive option for developers and creators.