Shanghai-based AI lab StepFun's StepAudio 2.5 Realtime model excelled in all five major voice AI benchmarks from April 2026, surpassing GPT Realtime 1.5 and Gemini Live and showing stronger understanding of tone, emotion, and speech rate. Key scores include 80.41 in human evaluation, 86.36 in general dialogue performance, and 84.80 in automotive scenario testing.
StepAudio 2.5 Realtime's architecture integrates Automatic Speech Recognition (ASR), Text-to-Speech (TTS), and real-time dialogue processing into a unified system, which reduces latency and preserves more vocal nuance. The model employs persona-specific Reinforcement Learning from Human Feedback (RLHF), allowing it to maintain consistent character traits. It supports both Chinese and English and is accessible via StepFun's platform API. Its paralinguistic comprehension score of 82.18 reflects its ability to detect emotional cues, a notable advance for voice assistant technology.
StepFun's StepAudio 2.5 Realtime Dominates April 2026 Voice AI Benchmarks
