Sierra has open-sourced μ-Bench, a multilingual automatic speech recognition (ASR) benchmark dataset, to improve evaluation in customer service settings. The dataset comprises 250 real customer service recordings and 4,270 annotated audio clips, addressing the limitations of existing English-focused benchmarks. μ-Bench covers five languages—English, Spanish, Turkish, Vietnamese, and Mandarin—and includes performance results from vendors such as Google and Microsoft. The benchmark introduces the Utterance Error Rate (UER) metric, which distinguishes errors that change an utterance's meaning from those that do not, offering a more nuanced evaluation than traditional Word Error Rate (WER). Google Chirp-3 leads in accuracy, while Deepgram Nova-3 excels in speed but lags in multilingual accuracy. The dataset and leaderboard are available on Hugging Face, and Sierra invites further vendor participation.
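To ground the WER/UER contrast, here is a minimal sketch of the standard word error rate, computed as word-level edit distance divided by reference length. The function name and structure are illustrative, not from μ-Bench; Sierra's UER would additionally classify each edit as meaning-changing or benign, a step whose definition is not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Classic WER: word-level Levenshtein distance over reference length.

    UER (as described by Sierra) would go further and count only edits
    that change the utterance's meaning; that classification step is
    not part of this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for edit distance over word tokens.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


# Example: one substituted word out of three gives WER of 1/3.
print(word_error_rate("the cat sat", "the bat sat"))
```

A meaning-aware metric like UER matters in customer service transcripts because mishearing "refund" as "re-fund" is harmless, while mishearing it as "fund" can change the action an agent takes; plain WER scores both mistakes identically.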