Google has introduced a tiered pricing strategy for its Gemini API, offering five distinct service levels: Standard, Flexible, Priority, Batch, and Cache. The Flexible and Batch tiers provide a 50% discount off standard rates, targeting latency-tolerant applications and large-scale offline data processing, respectively. The Cache tier is designed for high-frequency calls that reuse long, complex instructions, with billing based on cached token count and storage duration. The Priority tier, priced 75% to 100% above the standard rate, ensures response times in the millisecond-to-second range, making it suitable for critical applications such as customer-service bots and real-time fraud detection. The tiered model aims to optimize resource allocation for AI inference, letting customers match service levels to their latency and cost requirements.
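The relative pricing described above can be sketched as a small cost comparison. This is an illustration only: the tier multipliers follow the percentages in the article (Flexible/Batch at 50% off, Priority at a 75%–100% surcharge), but the base rate is a hypothetical placeholder, not an actual Gemini price, and the Cache tier's storage-duration component is not modeled.

```python
# Hypothetical cost comparison across the Gemini API tiers described above.
# Multipliers reflect the article's percentages; the base rate is a placeholder.
# Cache-tier billing (token count x storage duration) is omitted for simplicity.

TIER_MULTIPLIERS = {
    "standard": 1.0,
    "flexible": 0.5,       # 50% discount for latency-tolerant workloads
    "batch": 0.5,          # 50% discount for large-scale offline jobs
    "priority_min": 1.75,  # lower bound of the 75%-100% surcharge
    "priority_max": 2.0,   # upper bound of the surcharge
}

def tier_cost(tokens: int, base_rate_per_million: float, tier: str) -> float:
    """Estimated cost for `tokens` tokens at the given tier's multiplier."""
    return tokens / 1_000_000 * base_rate_per_million * TIER_MULTIPLIERS[tier]

# Example: 2M tokens at a hypothetical $1.00-per-million base rate.
for tier in TIER_MULTIPLIERS:
    print(f"{tier:>12}: ${tier_cost(2_000_000, 1.00, tier):.2f}")
```

A latency-tolerant workload routed to the Flexible tier would thus cost half of what the same token volume costs at Standard, while a Priority call on the same volume costs roughly twice as much.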