Stability AI Releases Stable Audio 3: A Family of Fast Latent Diffusion Models for Audio Generation and Editing

5/26/2026


Quick Answer

Stability AI has launched Stable Audio 3, a series of latent diffusion models for audio generation, featuring small and medium variants with open weights.

Quick Take

Stable Audio 3 targets instrumental music and sound-effect generation. Its medium variant achieves a Fréchet Audio Distance (FAD) of 0.369 on the BBC Sound Effects benchmark with 5-second clips, lower (better) than every open-weight baseline evaluated in the paper, and fits on consumer GPUs with 8 GB of VRAM.
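FAD fits a Gaussian to embeddings of reference audio and to embeddings of generated audio, then measures the Fréchet distance between the two Gaussians: ||mu_r - mu_g||^2 + Tr(cov_r + cov_g - 2 (cov_r cov_g)^{1/2}). Lower means the generated distribution sits closer to the reference. The sketch below shows only that final formula in NumPy and assumes the embeddings were already extracted with some pretrained audio model; the article does not say which embedding model the Stable Audio 3 paper used, and `frechet_distance` and `fad_from_embeddings` are hypothetical names, not part of any published evaluation toolkit.

```python
import numpy as np


def frechet_distance(mu_r, cov_r, mu_g, cov_g):
    """Fréchet distance between Gaussians N(mu_r, cov_r) and N(mu_g, cov_g).

    ||mu_r - mu_g||^2 + Tr(cov_r + cov_g - 2 (cov_r cov_g)^{1/2})
    """
    mu_r, mu_g = np.asarray(mu_r, float), np.asarray(mu_g, float)
    cov_r = np.atleast_2d(np.asarray(cov_r, float))
    cov_g = np.atleast_2d(np.asarray(cov_g, float))
    mean_term = float(np.sum((mu_r - mu_g) ** 2))
    # For PSD covariances, the eigenvalues of cov_r @ cov_g are real and
    # non-negative, so Tr((cov_r cov_g)^{1/2}) is the sum of their square roots.
    # Clipping and .real absorb tiny numerical noise from eigvals.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = float(np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None))))
    return mean_term + float(np.trace(cov_r) + np.trace(cov_g)) - 2.0 * trace_sqrt


def fad_from_embeddings(emb_ref, emb_gen):
    """FAD from two (num_clips, dim) embedding matrices (hypothetical helper)."""
    emb_ref, emb_gen = np.asarray(emb_ref, float), np.asarray(emb_gen, float)
    return frechet_distance(
        emb_ref.mean(axis=0), np.cov(emb_ref, rowvar=False),
        emb_gen.mean(axis=0), np.cov(emb_gen, rowvar=False),
    )
```

Identical embedding sets give a distance of (numerically) zero, and shifting every generated embedding by a constant vector adds exactly that vector's squared length, since the covariances are unchanged.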

Key Points

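Stable Audio 3 includes small and medium variants with open weights.
The small variant runs on a MacBook Pro M4 CPU.
The medium variant fits on consumer GPUs with 8 GB of VRAM.
Both variants generate stereo audio at 44.1 kHz.
SA3 medium scores FAD 0.369 on the BBC Sound Effects benchmark (5-second clips).
Both models use a three-stage training pipeline: flow matching, distillation warmup, and adversarial post-training.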
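For a sense of scale behind "stereo audio at 44.1 kHz", the short sketch below counts raw samples per clip. It assumes uncompressed 32-bit float samples, and `raw_waveform_size` is a hypothetical helper rather than part of any Stability AI tooling. A latent diffusion model performs its denoising on a compressed latent produced by an autoencoder instead of on these raw samples; the article does not give Stable Audio 3's latent size, so none is assumed here.

```python
SAMPLE_RATE_HZ = 44_100
CHANNELS = 2  # stereo


def raw_waveform_size(seconds, sample_rate=SAMPLE_RATE_HZ, channels=CHANNELS,
                      bytes_per_sample=4):
    """Return (total samples, bytes) for an uncompressed waveform.

    bytes_per_sample=4 assumes float32 storage.
    """
    num_samples = round(seconds * sample_rate) * channels
    return num_samples, num_samples * bytes_per_sample


# A 5-second clip, the benchmark length cited above.
samples, nbytes = raw_waveform_size(5)
print(f"{samples} samples, {nbytes / 1e6:.2f} MB")  # 441000 samples, 1.76 MB
```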

Article Excerpt

From source RSS / original summary

Stability AI has released Stable Audio 3, a family of latent diffusion models for instrumental music and sound effects generation. The release includes open weights for the small and medium variants. Small runs on a MacBook Pro M4 CPU. Medium fits on consumer GPUs with 8 GB of VRAM. Both generate stereo audio at 44.1 kHz and are trained with a three-stage pipeline: flow matching, distillation warmup, and adversarial post-training. On the BBC Sound Effects benchmark at 5 seconds, SA3 medium scores FAD 0.369, lower than every open-weight baseline evaluated in the paper.

The post Stability AI Releases Stable Audio 3: A Family of Fast Latent Diffusion Models for Audio Generation and Editing appeared first on MarkTechPost.
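The first training stage, flow matching, can be illustrated with a generic conditional flow-matching objective. This is a minimal sketch, assuming the common rectified-flow convention x_t = (1 - t)·noise + t·data, in which the model regresses the constant velocity data - noise; some papers run t in the opposite direction, and the article does not say which convention Stable Audio 3 uses. `flow_matching_loss`, `euler_sample`, and the toy oracle velocity are hypothetical illustrations, and the distillation-warmup and adversarial post-training stages are not modeled.

```python
import numpy as np


def flow_matching_loss(velocity_fn, x_data, rng):
    """Conditional flow-matching MSE on a batch (rectified-flow interpolation).

    velocity_fn(x_t, t) predicts the velocity; x_data has shape (batch, dim).
    """
    noise = rng.standard_normal(x_data.shape)
    t = rng.uniform(0.0, 1.0, size=(x_data.shape[0], 1))  # one time per example
    x_t = (1.0 - t) * noise + t * x_data  # straight-line path from noise to data
    target = x_data - noise               # constant velocity along that path
    pred = velocity_fn(x_t, t)
    return float(np.mean((pred - target) ** 2))


def euler_sample(velocity_fn, noise, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data) with Euler steps."""
    x = np.array(noise, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = np.full((x.shape[0], 1), k * dt)
        x = x + dt * velocity_fn(x, t)
    return x


# Toy check (hypothetical): if every "data" point is the same vector c, the exact
# velocity is (c - x_t) / (1 - t), and Euler integration reaches c in any number
# of steps, including one.
c = np.array([[3.0, -1.0]])
oracle = lambda x_t, t: (c - x_t) / (1.0 - t)
rng = np.random.default_rng(0)
print(flow_matching_loss(oracle, np.repeat(c, 256, axis=0), rng))  # ≈ 0
print(euler_sample(oracle, rng.standard_normal((4, 2)), num_steps=1))  # rows ≈ [3, -1]
```

The `num_steps` argument is where sampling speed comes in: each step costs one network evaluation, so a model that needs fewer steps generates faster. The article does not state how many steps Stable Audio 3 uses.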

Read on marktechpost.com

