Grok 4.20 Beta Scores 97% on τ²-Bench, Secures Second Place

Grok 4.20 Beta scored 97% on the τ²-Bench evaluation, placing second. τ²-Bench extends the original τ-bench framework from Sierra and is regarded as a demanding test. The benchmark evaluates AI capabilities in answering questions and completing navigation tasks, and the result points to Grok 4.20 Beta's strong performance in these areas.

Disclaimer: The content provided on Phemex News is for informational purposes only. We do not guarantee the quality, accuracy, or completeness of the information sourced from third-party articles. The content on this page does not constitute financial or investment advice. We strongly encourage you to conduct your own research and consult with a qualified financial advisor before making any investment decisions.