Shanghai-based AI lab StepFun has outperformed major tech competitors with its StepAudio 2.5 Realtime model, which excelled across all five major voice AI benchmarks for April 2026. The model surpassed GPT Realtime 1.5 and Gemini Live, showing a stronger grasp of tone, emotion, and speech rate. Key scores include 80.41 in human evaluation, 86.36 in general dialogue performance, and 84.80 in automotive scenario testing.
StepAudio 2.5 Realtime's architecture integrates automatic speech recognition (ASR), text-to-speech (TTS), and real-time dialogue processing into a unified system, reducing latency and enhancing nuance. The model employs persona-specific reinforcement learning from human feedback (RLHF), which allows it to maintain consistent character traits. It supports both Chinese and English and is accessible via StepFun's platform API. Its paralinguistic comprehension score of 82.18 highlights its ability to detect emotional cues, a notable advance for voice assistant technology.
StepFun's StepAudio 2.5 Realtime Dominates April 2026 Voice AI Benchmarks
