Google has unveiled Multi-Token Prediction (MTP), a technique that speeds up AI inference by up to three times without requiring new hardware. Introduced with Google's Gemma 4 model family, MTP is a form of speculative decoding. A smaller, faster "predictor" model runs alongside the main model and proposes several upcoming tokens. The main model then checks all of those proposals in a single forward pass instead of generating each token one at a time, which cuts the time needed to produce a sequence. Because the main model verifies every proposed token and keeps only the ones it agrees with, output quality matches that of the large model, such as the 31-billion-parameter Gemma 4. In Google's benchmarks, enabling MTP for the Gemma 4 26B model on an Nvidia RTX Pro 6000 GPU nearly doubles token throughput, and Apple Silicon chips see a 2.2x speedup. The result promises more responsive low-latency applications, such as real-time chat and voice interfaces, on existing consumer hardware.
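The draft-then-verify loop behind speculative decoding can be illustrated with a toy example. This is a minimal sketch, not Google's MTP implementation: the article does not describe how Gemma 4's predictor is built or how many tokens it drafts. Here the "models" are hypothetical integer functions (`target_next`, `draft_next`), decoding is greedy, and the draft length `k = 4` is an arbitrary assumption. The drafter proposes `k` tokens, the target model's predictions for all `k + 1` positions are computed together (counted as one verification pass), and the longest agreeing prefix is kept, followed by the target's own token at the first disagreement. This acceptance rule is why the output is identical to plain decoding with the large model.

```python
from typing import Callable, List, Tuple

# A "model" here maps a token context to its greedy next token.
Model = Callable[[List[int]], int]


def target_next(ctx: List[int]) -> int:
    # Hypothetical stand-in for the large, slow model's greedy choice.
    return (sum(ctx) * 7 + len(ctx)) % 10


def draft_next(ctx: List[int]) -> int:
    # Hypothetical cheap predictor: usually agrees with the target, but
    # deliberately disagrees whenever the context length is a multiple of 4.
    guess = target_next(ctx)
    return (guess + 1) % 10 if len(ctx) % 4 == 0 else guess


def greedy_decode(model: Model, prompt: List[int], n: int) -> List[int]:
    # Baseline: one model call per generated token.
    out = list(prompt)
    for _ in range(n):
        out.append(model(out))
    return out[len(prompt):]


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n: int, k: int = 4) -> Tuple[List[int], int]:
    out = list(prompt)
    verify_passes = 0
    while len(out) - len(prompt) < n:
        # 1. Draft k tokens autoregressively with the cheap model.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. Verify: target predictions at all k + 1 positions. A real system
        #    computes these in one batched forward pass, so count it as one.
        verify_passes += 1
        checks = [target(out + proposal[:i]) for i in range(k + 1)]
        # 3. Keep the longest prefix where draft and target agree, then append
        #    the target's token at the first mismatch (or a bonus token if all
        #    k drafted tokens were accepted).
        accepted = 0
        while accepted < k and proposal[accepted] == checks[accepted]:
            accepted += 1
        out.extend(proposal[:accepted])
        out.append(checks[accepted])
    # The last pass may overshoot n, so trim to the requested length.
    return out[len(prompt):len(prompt) + n], verify_passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    tokens, passes = speculative_decode(target_next, draft_next, prompt, n=20)
    # Same tokens as decoding with the target model alone.
    assert tokens == greedy_decode(target_next, prompt, 20)
    print(f"{len(tokens)} tokens in {passes} verification passes")
```

With this toy drafter, 20 tokens come out of 6 verification passes instead of 20 sequential target-model calls. The real-world speedup depends on how often the predictor's guesses are accepted and on how cheap the predictor is relative to the main model, which is consistent with the article's range of roughly 2x to 3x rather than a fixed factor.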