Google has introduced Gemini 3.1 Flash-Lite, a new model in the Gemini 3 series positioned as the lineup's fastest and most cost-effective option. Built on a Mixture-of-Experts (MoE) architecture, it reduces inference cost by activating only a subset of its parameters for each request. API pricing is set at $0.25 per million input tokens and $1.50 per million output tokens, significantly cheaper than Gemini 3.1 Pro.
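The published per-token prices make per-request costs easy to estimate. Below is a minimal sketch of that arithmetic; the function name and the sample token counts are illustrative, not part of any official SDK.

```python
def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate a Gemini 3.1 Flash-Lite request cost in USD,
    using the API pricing quoted above."""
    INPUT_PRICE_PER_M = 0.25   # $ per million input tokens
    OUTPUT_PRICE_PER_M = 1.50  # $ per million output tokens
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: a 10,000-token prompt with a 2,000-token reply
print(f"${estimate_cost(10_000, 2_000):.4f}")  # → $0.0055
```

At these rates, even a fairly large prompt costs a fraction of a cent, which is the point of the "Lite" positioning.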
Gemini 3.1 Flash-Lite also posts improved performance metrics: a 2.5× reduction in time to first token and a 45% increase in output speed, reaching 363 tokens per second. It supports up to 1 million input tokens and 64,000 output tokens, and accepts text, image, audio, and video inputs. In internal benchmarks, it outperformed GPT-5 Mini and Claude 4.5 Haiku in six of eleven tests, including GPQA Diamond and MMMU-Pro. The model's "thinking level" feature lets developers tune inference depth, trading response quality against cost. Preview access is available through the Gemini API and Vertex AI.
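A request using the "thinking level" control might be shaped roughly as follows. This is a sketch only: the field names (`generationConfig`, `thinkingConfig`, `thinkingLevel`) follow the public Gemini REST API's conventions, but the exact knob exposed for this preview model is an assumption based on the feature described above, not confirmed documentation.

```python
import json

def build_request(prompt: str, thinking_level: str = "low") -> str:
    """Build a hypothetical Gemini API request body (JSON string)
    that sets the model's thinking level. Field names are assumed
    from existing Gemini REST API conventions."""
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            # Lower levels should mean shallower inference and lower cost;
            # higher levels trade latency and tokens for answer quality.
            "thinkingConfig": {"thinkingLevel": thinking_level},
        },
    }
    return json.dumps(body, indent=2)

print(build_request("Summarize this earnings report.", thinking_level="high"))
```

In practice a developer would pick a low level for bulk or latency-sensitive workloads and a higher one for harder reasoning tasks, which is the quality-versus-cost trade-off the feature is meant to expose.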
Google Unveils Gemini 3.1 Flash-Lite, Cutting Costs and Outperforming Rivals
Disclaimer: The content provided on Phemex News is for informational purposes only. We do not guarantee the quality, accuracy, or completeness of the information sourced from third-party articles. The content on this page does not constitute financial or investment advice. We strongly encourage you to conduct your own research and consult with a qualified financial advisor before making any investment decisions.
