OpenAI Unveils 'Monitorability-Evals' for AI Model Monitorin

OpenAI has released an open-source evaluation suite named "monitorability-evals" under the Apache-2.0 license, aimed at assessing the effectiveness of monitoring AI models' chain-of-thought (CoT) processes. This suite, detailed in the paper "Monitoring Monitorability" by Guan et al., includes 13 evaluations across 24 environments, focusing on intervention, process, and outcome-property prototypes. Key findings indicate that monitoring CoT is more effective than solely observing final outputs, with longer CoTs enhancing monitorability. The evaluations reveal that reinforcement learning (RL) training does not significantly diminish monitorability, even at advanced scales. Practical insights suggest that smaller models with higher reasoning intensity can match larger models' capabilities while improving monitorability, albeit with increased reasoning compute. The suite has been integrated into the GPT-5.4 Thinking system card, showing slightly lower overall CoT monitorability compared to GPT-5 Thinking, with specific declines in areas like health queries and memory bias. OpenAI notes some regressions are due to limitations in the evaluation framework, leading to the deprecation of certain evaluations.