Anthropic's Claude Mythos Solves 30% of Complex Bioinformatics Challenges

Anthropic has unveiled BioMysteryBench, a benchmark of 99 expert-designed bioinformatics questions built on real-world datasets, intended to evaluate how well its Claude models solve complex bioinformatics problems. Of the 99 questions, 76 were solved by human experts, while 23 remained unsolved even after attempts by up to five domain experts. Claude Opus 4.6 reached 77.4% accuracy on the human-solvable questions, and the Mythos Preview model improved on that result. Notably, Mythos Preview solved 30% of the questions that had stumped the human experts.

Claude's success is attributed to its ability to draw on knowledge across papers and to apply multiple analytical methods in parallel. A reliability analysis, however, reveals a consistency gap: 86% of correct answers on human-solvable questions were stable across repeated attempts, versus only 44% on the human-difficult questions, which marks the model's current capability boundary. Independently, Genentech and Roche's CompBioBench corroborated these findings, with Claude Opus 4.6 scoring 81% accuracy overall and 69% on the hardest questions.
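The stability figures above describe, roughly, how often a model reproduces the same correct answer across repeated attempts. The article does not specify the exact criterion used, so the following is only a minimal illustrative sketch of one plausible consistency metric: among questions answered correctly at least once, the fraction for which every attempt gives the correct answer. All function and variable names here are hypothetical.

```python
def answer_stability(attempts_by_question, correct_answers):
    """Illustrative consistency metric (not the benchmark's actual method):
    of the questions solved at least once, what fraction were answered
    correctly on *every* attempt?"""
    stable = 0
    eligible = 0
    for qid, attempts in attempts_by_question.items():
        correct = correct_answers[qid]
        if correct in attempts:  # solved at least once
            eligible += 1
            if all(a == correct for a in attempts):
                stable += 1  # correct answer reproduced on every attempt
    return stable / eligible if eligible else 0.0

# Toy data: three questions, three attempts each.
attempts = {
    "q1": ["A", "A", "A"],  # consistently correct
    "q2": ["B", "C", "B"],  # correct once, but flips between attempts
    "q3": ["D", "D", "D"],  # never correct
}
truth = {"q1": "A", "q2": "B", "q3": "X"}
print(answer_stability(attempts, truth))  # → 0.5
```

Under this definition, a low stability score (like the reported 44% on human-difficult questions) would mean the model often reaches a correct answer it cannot reliably reproduce.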
Disclaimer: The content provided by Phemex News is for informational purposes only. We do not guarantee the quality, accuracy, or completeness of information sourced from third-party articles. The content on this page does not constitute financial or investment advice. Always do your own research and consult a qualified financial professional before making investment decisions.
