StepFun Releases StepAudio 2.5 Realtime: An End-to-End Voice Model with Roleplay-Specific RLHF and Paralinguistic Comprehension

MarkTechPost·Michal Sutter

5/24/2026 · ~1 min read · en

Quick Take

StepFun launched StepAudio 2.5 Realtime, an end-to-end real-time speech model with customizable personas that ranked first on all five benchmark dimensions tested in April 2026.

Key Points

Supports real-time speech in Chinese and English.
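Ranked first across all five benchmark dimensions tested in April 2026, including 80.41 on human evaluation and 82.18 on paralinguistic comprehension.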
Features roleplay-specific RLHF and paralinguistic comprehension.

Article Excerpt

From source RSS / original summary

StepFun, the Shanghai-based AI lab, released StepAudio 2.5 Realtime in May 2026: an end-to-end real-time speech large language model with fully customizable persona capabilities. The model connects via a WebSocket API, supports Chinese and English, and ranked first across all five benchmark dimensions tested in April 2026, including an 80.41 human evaluation score and 82.18 on paralinguistic comprehension.
