Anthropic's Claude Mythos Solves 30% of Complex Bioinformatics Challenges

Anthropic has unveiled BioMysteryBench, a new benchmark of 99 expert-designed bioinformatics questions built on real-world datasets. The benchmark evaluates how well Claude, Anthropic's AI model, solves complex bioinformatics problems. Of the 99 questions, 76 were solvable by human experts, while 23 went unsolved even after attempts by up to five domain experts each (the "human-difficult" set). Claude Opus 4.6 achieved 77.4% accuracy on the human-solvable questions, and the Mythos Preview model scored higher still. Notably, Mythos Preview solved 30% of the human-difficult questions.

Claude's success is attributed to its ability to draw on knowledge across papers and to apply multiple analytical methods in parallel. Reliability analysis, however, reveals a consistency gap: 86% of correct answers on human-solvable questions were stable across repeated attempts, versus only 44% on human-difficult questions, marking the current boundary of the model's dependable capabilities. A concurrent benchmark from Genentech and Roche, CompBioBench, corroborated these findings, with Claude Opus 4.6 achieving 81% accuracy overall and 69% on the hardest questions.
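The article does not specify how "stable across attempts" was defined. One plausible reading is that a correct answer counts as stable only if every repeated attempt reproduces it. A minimal Python sketch under that assumption (function and parameter names are hypothetical, not Anthropic's methodology):

```python
def stability(runs_by_question, correct_answers):
    """Among questions answered correctly at least once, return the
    fraction whose correct answer was reproduced on every attempt.

    runs_by_question: {question_id: [answer from each attempt]}
    correct_answers:  {question_id: ground-truth answer}
    """
    # Questions the model got right in at least one attempt.
    correct_qs = [q for q, runs in runs_by_question.items()
                  if correct_answers[q] in runs]
    # Of those, which were answered identically (and correctly) every time?
    stable = [q for q in correct_qs
              if all(a == correct_answers[q] for a in runs_by_question[q])]
    return len(stable) / len(correct_qs) if correct_qs else 0.0
```

Under this definition, a question answered correctly on two of three runs would count toward accuracy but not toward stability, which is how a model can score well on accuracy while showing a much lower consistency figure.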
Disclaimer: The content provided on Phemex News is for informational purposes only. We do not guarantee the quality, accuracy, or completeness of the information sourced from third-party articles. The content on this page does not constitute financial or investment advice. We strongly encourage you to conduct your own research and consult with a qualified financial advisor before making any investment decisions.
