Anthropic's Claude Mythos Solves 30% of Complex Bioinformatics Challenges

Anthropic has unveiled BioMysteryBench, a benchmark of 99 expert-designed bioinformatics questions built on real-world datasets, created to evaluate how well Claude, Anthropic's AI model, solves complex bioinformatics problems. Of the 99 questions, 76 were solved by human experts, while the remaining 23 resisted solution by teams of up to five domain experts. Claude Opus 4.6 reached 77.4% accuracy on the human-solvable questions, and the Mythos Preview model improved on that result. Notably, Mythos Preview solved 30% of the questions that human experts could not.

Claude's success is attributed to its ability to draw on knowledge across papers and to apply multiple analytical methods simultaneously. Reliability analysis, however, reveals a consistency gap: 86% of correct answers on human-solvable questions were stable across repeated attempts, versus only 44% on the human-difficult questions, marking the current boundary of the model's capabilities. Independently, Genentech and Roche's CompBioBench corroborated these findings, with Claude Opus 4.6 achieving 81% accuracy overall and 69% on the hardest questions.
