Stability AI Releases Stable Audio 3: A Family of Fast Latent Diffusion Models for Audio Generation and Editing
Quick Take
Stability AI has released Stable Audio 3, a family of latent diffusion models for generating instrumental music and sound effects, with open weights for the small and medium variants. The medium variant scores a FAD of 0.369 on the BBC Sound Effects benchmark at 5 seconds, lower (better) than every open-weight baseline evaluated in the paper, and fits on consumer GPUs with 8 GB of VRAM.
Key Points
- Stable Audio 3 includes small and medium variants with open weights.
- The small variant runs on a MacBook Pro M4 CPU.
- Both variants generate stereo audio at 44.1 kHz.
- SA3 medium scores FAD 0.369 on the BBC Sound Effects benchmark at 5 seconds.
- Both models use a three-stage training pipeline: flow matching, distillation warmup, and adversarial post-training.
Article Excerpt
From source RSS / original summary
Stability AI has released Stable Audio 3, a family of latent diffusion models for instrumental music and sound effects generation. The release includes open weights for the small and medium variants. Small runs on a MacBook Pro M4 CPU. Medium fits on consumer GPUs with 8 GB of VRAM. Both generate stereo audio at 44.1 kHz using a three-stage training pipeline: flow matching, distillation warmup, and adversarial post-training. On the BBC Sound Effects benchmark at 5 seconds, SA3 medium scores FAD 0.369, lower than every open-weight baseline evaluated in the paper. The post Stability AI Releases Stable Audio 3: A Family of Fast Latent Diffusion Models for Audio Generation and Editing appeared first on MarkTechPost.
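For readers unfamiliar with the metric: FAD (Fréchet Audio Distance) compares the statistics of embeddings of generated audio against those of reference audio, and lower is better. The following is a minimal sketch of the distance computation only, assuming the embeddings have already been extracted by some audio embedding model (the excerpt does not say which one the paper uses). The function name `frechet_distance` and the array shapes are illustrative assumptions, not part of the Stable Audio 3 release.

```python
import numpy as np


def frechet_distance(emb_ref: np.ndarray, emb_gen: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two embedding sets.

    Each input is assumed to have shape (n_clips, embedding_dim),
    with embedding_dim >= 2 and n_clips >= 2.
    """
    mu_r = emb_ref.mean(axis=0)
    mu_g = emb_gen.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(emb_ref, rowvar=False))
    cov_g = np.atleast_2d(np.cov(emb_gen, rowvar=False))

    # Tr((cov_r cov_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g. For covariance matrices these are real
    # and non-negative; small numerical noise is clipped away.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    mean_term = float(np.sum((mu_r - mu_g) ** 2))
    cov_term = float(np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)
    return mean_term + cov_term
```

Two identical embedding sets give a distance of approximately zero, and shifting one set by a constant vector adds the squared length of that shift, which is why a score such as 0.369 indicates generated audio whose embedding statistics sit close to the reference set.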
