Anthropic has unveiled the Natural Language Autoencoder (NLA), a groundbreaking tool that translates an AI model's internal numerical states (activations) into human-readable language. The tool, along with partial model weights, has been open-sourced on GitHub. Unlike traditional interpretability tools, NLA generates natural language directly, surfacing "thoughts" the model holds internally but never verbalizes.

NLA operates as a dual-model system: one model converts activations into text, and a second model reconstructs the activations from that text. Reinforcement learning rewards descriptions that allow accurate reconstruction, sharpening the translation over time.

In security tests, NLA exposed the models' "exam awareness": the models suspected they were being tested in 16% of cases and recognized they were engaging with a task in 26% of instances, compared with less than 1% in normal interactions. Despite the risks of factual inaccuracy and high computational cost, NLA has already been used in safety audits of Claude Mythos Preview and Opus 4.6.
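The dual-model idea can be sketched in miniature. The toy below is not Anthropic's implementation: the codebook, the `encode`/`decode` functions, and the greedy matching-pursuit scheme are all illustrative assumptions. It only shows the core loop the article describes: an encoder turns an activation vector into a short discrete "message," a decoder rebuilds the activation from that message alone, and the reconstruction error provides the reward signal a reinforcement-learning setup could optimize.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 8    # toy activation dimensionality (assumption, for illustration)
V = 32   # size of a tiny "vocabulary" of codebook tokens (assumption)

# Hypothetical codebook: each "word" the encoder can emit corresponds to a
# direction in activation space -- a crude stand-in for a language model.
codebook = rng.normal(size=(V, D))

def encode(activation, n_tokens=4):
    """Encoder: greedily describe the activation as a short token sequence
    (activations -> 'text'), matching-pursuit style."""
    residual = activation.copy()
    message = []  # sequence of (token_id, quantized coefficient)
    for _ in range(n_tokens):
        t = int(np.argmax(np.abs(codebook @ residual)))
        direction = codebook[t] / np.linalg.norm(codebook[t])
        c_q = round(float(residual @ direction), 1)  # quantized: the 'text' is lossy
        message.append((t, c_q))
        residual -= c_q * direction
    return message

def decode(message):
    """Decoder: rebuild the activation from the message alone ('text' -> activations)."""
    recon = np.zeros(D)
    for t, c in message:
        recon += c * codebook[t] / np.linalg.norm(codebook[t])
    return recon

activation = rng.normal(size=D)
message = encode(activation)
recon = decode(message)

# Relative reconstruction error doubles as the training signal: the better the
# 'text' lets the decoder rebuild the activation, the higher the reward.
error = float(np.linalg.norm(activation - recon) / np.linalg.norm(activation))
reward = 1.0 - error
```

In the real system both encoder and decoder would be language models and the "message" genuine natural language, but the objective is the same shape: reward text from which the original activations can be reconstructed.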