Anthropic's Claude Mythos Solves 30% of Complex Bioinformatics Challenges

Anthropic has unveiled BioMysteryBench, a benchmark of 99 expert-designed bioinformatics questions built on real-world datasets, intended to evaluate how well its Claude models solve complex bioinformatics problems. Of the 99 questions, 76 were solved by human experts, while 23 remained unsolved even after attempts by up to five domain experts. Claude Opus 4.6 reached 77.4% accuracy on the human-solvable questions, and the Mythos Preview model improved on that result. Notably, Mythos Preview solved 30% of the questions that had stumped the human experts.

Claude's success is attributed to its ability to draw on knowledge across papers and to apply multiple analytical methods in parallel. A reliability analysis, however, reveals a consistency gap: 86% of correct answers on human-solvable questions were stable across repeated attempts, versus only 44% on the human-difficult questions, which marks the model's current capability boundary. Independently, Genentech and Roche's CompBioBench corroborated these findings, with Claude Opus 4.6 scoring 81% accuracy overall and 69% on the hardest questions.
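The stability figures above describe, roughly, how often a model reproduces the same correct answer across repeated attempts. The article does not specify the exact criterion used, so the following is only a minimal illustrative sketch of one plausible consistency metric: among questions answered correctly at least once, the fraction for which every attempt gives the correct answer. All function and variable names here are hypothetical.

```python
def answer_stability(attempts_by_question, correct_answers):
    """Illustrative consistency metric (not the benchmark's actual method):
    of the questions solved at least once, what fraction were answered
    correctly on *every* attempt?"""
    stable = 0
    eligible = 0
    for qid, attempts in attempts_by_question.items():
        correct = correct_answers[qid]
        if correct in attempts:  # solved at least once
            eligible += 1
            if all(a == correct for a in attempts):
                stable += 1  # correct answer reproduced on every attempt
    return stable / eligible if eligible else 0.0

# Toy data: three questions, three attempts each.
attempts = {
    "q1": ["A", "A", "A"],  # consistently correct
    "q2": ["B", "C", "B"],  # correct once, but flips between attempts
    "q3": ["D", "D", "D"],  # never correct
}
truth = {"q1": "A", "q2": "B", "q3": "X"}
print(answer_stability(attempts, truth))  # → 0.5
```

Under this definition, a low stability score (like the reported 44% on human-difficult questions) would mean the model often reaches a correct answer it cannot reliably reproduce.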
Disclaimer: The content provided by Phemex News is for informational purposes only. We do not guarantee the quality, accuracy, or completeness of information sourced from third-party articles. The content on this page does not constitute financial or investment advice. Always do your own research and consult a qualified financial professional before making investment decisions.
