Google has introduced Gemini 3.1 Flash-Lite, a new model in the Gemini 3 series positioned as the lineup's fastest and most cost-effective option. Built on a Mixture-of-Experts (MoE) architecture, it reduces inference cost by activating only a subset of its parameters for each request. API pricing is set at $0.25 per million input tokens and $1.50 per million output tokens, significantly cheaper than Gemini 3.1 Pro.
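The published per-token prices make per-request costs easy to estimate. Below is a minimal sketch of that arithmetic; the function name and the sample token counts are illustrative, not part of any official SDK.

```python
def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate a Gemini 3.1 Flash-Lite request cost in USD,
    using the API pricing quoted above."""
    INPUT_PRICE_PER_M = 0.25   # $ per million input tokens
    OUTPUT_PRICE_PER_M = 1.50  # $ per million output tokens
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: a 10,000-token prompt with a 2,000-token reply
print(f"${estimate_cost(10_000, 2_000):.4f}")  # → $0.0055
```

At these rates, even a fairly large prompt costs a fraction of a cent, which is the point of the "Lite" positioning.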
Gemini 3.1 Flash-Lite also posts improved performance metrics: a 2.5× reduction in time to first token and a 45% increase in output speed, reaching 363 tokens per second. It supports up to 1 million input tokens and 64,000 output tokens, and accepts text, image, audio, and video inputs. In internal benchmarks, it outperformed GPT-5 Mini and Claude 4.5 Haiku in six of eleven tests, including GPQA Diamond and MMMU-Pro. The model's "thinking level" feature lets developers tune inference depth, trading response quality against cost. Preview access is available through the Gemini API and Vertex AI.
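A request using the "thinking level" control might be shaped roughly as follows. This is a sketch only: the field names (`generationConfig`, `thinkingConfig`, `thinkingLevel`) follow the public Gemini REST API's conventions, but the exact knob exposed for this preview model is an assumption based on the feature described above, not confirmed documentation.

```python
import json

def build_request(prompt: str, thinking_level: str = "low") -> str:
    """Build a hypothetical Gemini API request body (JSON string)
    that sets the model's thinking level. Field names are assumed
    from existing Gemini REST API conventions."""
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            # Lower levels should mean shallower inference and lower cost;
            # higher levels trade latency and tokens for answer quality.
            "thinkingConfig": {"thinkingLevel": thinking_level},
        },
    }
    return json.dumps(body, indent=2)

print(build_request("Summarize this earnings report.", thinking_level="high"))
```

In practice a developer would pick a low level for bulk or latency-sensitive workloads and a higher one for harder reasoning tasks, which is the quality-versus-cost trade-off the feature is meant to expose.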
Google Unveils Gemini 3.1 Flash-Lite, Cutting Costs and Outperforming Rivals
Disclaimer: The content provided on Phemex News is for informational purposes only. We do not guarantee the quality, accuracy, or completeness of the information sourced from third-party articles. The content on this page does not constitute financial or investment advice. We strongly encourage you to conduct your own research and consult with a qualified financial advisor before making any investment decisions.
