Google Research has introduced TurboQuant, a novel quantization algorithm that compresses the KV cache of large language models to 3 bits, cutting memory usage by at least 6x without compromising accuracy. In 4-bit mode, TurboQuant also delivers up to 8x faster attention computation on NVIDIA H100 GPUs compared to an unquantized 32-bit baseline.

TurboQuant was validated on long-context benchmarks such as LongBench and ZeroSCROLLS, where it maintained strong performance with models like Gemma and Mistral. The algorithm combines two components: PolarQuant, which transforms vectors into polar coordinates so they can be quantized without memory overhead, and QJL, a 1-bit quantized Johnson-Lindenstrauss transform that corrects the residual error using just 1 bit.

The research, led by Amir Zandieh and Vahab Mirrokni in collaboration with KAIST and NYU, will be presented at ICLR 2026. Google highlights its potential to alleviate KV cache bottlenecks in models like Gemini.
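To make the 1-bit residual idea concrete, below is a minimal sketch of sign quantization after a random Johnson-Lindenstrauss projection, the general trick behind QJL-style estimators. The function names, the Gaussian projection, and the dimensions are illustrative assumptions, not Google's implementation; the rescaling constant sqrt(pi/2) follows the standard identity for Gaussian projections.

```python
import numpy as np

rng = np.random.default_rng(0)

def qjl_encode(k: np.ndarray, proj: np.ndarray):
    """Quantize a key vector to 1 bit per projected dimension.

    Stores only the signs of the JL projection plus the key's norm,
    so an m-dimensional sketch costs m bits plus one scalar.
    """
    signs = np.sign(proj @ k)          # 1-bit codes in {-1, +1}
    return signs, np.linalg.norm(k)

def qjl_inner_product(q: np.ndarray, signs: np.ndarray,
                      k_norm: float, proj: np.ndarray) -> float:
    """Unbiased estimate of <q, k> from the 1-bit sketch.

    For Gaussian projections, E[<Sq, sign(Sk)> / m] = sqrt(2/pi) * <q, k> / ||k||,
    so rescaling by ||k|| * sqrt(pi/2) recovers the inner product.
    """
    m = proj.shape[0]
    return k_norm * np.sqrt(np.pi / 2) * ((proj @ q) @ signs) / m

# Toy demo: estimate an attention score from a 1-bit quantized key.
d, m = 128, 4096                        # head dim; sketch dim (larger = lower variance)
proj = rng.standard_normal((m, d))      # shared random JL projection
q = rng.standard_normal(d)
k = rng.standard_normal(d)

signs, k_norm = qjl_encode(k, proj)
print("exact   :", float(q @ k))
print("estimate:", qjl_inner_product(q, signs, k_norm, proj))
```

This only illustrates why a single sign bit per projected coordinate preserves inner products in expectation; in TurboQuant the 1-bit code corrects the residual left after PolarQuant's coarse quantization, and the projection is shared across all cached keys.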