Anthropic's Claude Mythos Solves 30% of Complex Bioinformatics Challenges

Anthropic has unveiled BioMysteryBench, a new benchmark of 99 bioinformatics questions designed by experts using real-world datasets. The benchmark evaluates how well Claude, Anthropic's AI model, solves complex bioinformatics problems. Of the 99 questions, 76 were solvable by human experts, while the remaining 23 went unsolved even by up to five domain experts. Claude Opus 4.6 achieved 77.4% accuracy on the human-solvable questions, and the Mythos Preview model scored higher still. Notably, Mythos Preview solved 30% of the questions that humans could not.

Claude's success is attributed to its ability to draw on knowledge across papers and to apply multiple analytical methods at once. Reliability analysis, however, reveals a gap in consistency: 86% of correct answers on human-solvable questions were stable across repeated attempts, compared with only 44% on the human-difficult questions, marking the model's current capability boundary. Separately, Genentech and Roche's CompBioBench corroborated these findings, with Claude Opus 4.6 achieving 81% accuracy overall and 69% on the hardest questions.
