Aliyun Introduces Implicit Caching for Qwen3.7-Max, Cutting Input Costs by Up to 80%

Alibaba's Qwen team has launched automatic implicit caching for its Qwen3.7-Max model on Alibaba Cloud's Bailian platform, reducing input costs by up to 80%. Developers get the savings without changing code or adding parameters: the system identifies repeated context prefixes across requests and bills matched input tokens at 20% of the standard rate. Implicit caching is most useful for long-text and Agent workloads, where Qwen3.7-Max frequently processes large codebases or documents. The launch comes amid competitive pricing pressure, notably from DeepSeek V4-Pro, which recently cut its cache-hit billing to $0.003625 per million tokens. Qwen3.7-Max also offers an explicit caching mode, which provides even lower costs but requires manual configuration.
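To illustrate the billing arithmetic, the sketch below estimates input cost when part of a request matches a cached prefix and is billed at 20% of the standard rate. The function names and the unit price used in the example are hypothetical placeholders, not published Qwen3.7-Max pricing or any Bailian API; only the 20% cached rate comes from the announcement.

```python
def input_cost(total_tokens: int, cached_tokens: int,
               price_per_million: float, cached_rate: float = 0.20) -> float:
    """Estimate input cost when cached tokens are billed at a reduced rate.

    price_per_million is a placeholder unit price (assumption, not a real quote).
    cached_rate is the fraction of the standard rate charged for matched tokens.
    """
    if not 0 <= cached_tokens <= total_tokens:
        raise ValueError("cached_tokens must be between 0 and total_tokens")
    uncached_tokens = total_tokens - cached_tokens
    # Matched tokens count at the reduced rate; the rest at the full rate.
    billable_tokens = uncached_tokens + cached_tokens * cached_rate
    return billable_tokens * price_per_million / 1_000_000


def savings_fraction(total_tokens: int, cached_tokens: int,
                     cached_rate: float = 0.20) -> float:
    """Fraction of the uncached input bill saved for a given cache-hit share."""
    if total_tokens == 0:
        return 0.0
    return (cached_tokens / total_tokens) * (1 - cached_rate)


if __name__ == "__main__":
    # Hypothetical request: 100,000 input tokens, 90,000 of them a repeated prefix.
    print(input_cost(100_000, 90_000, price_per_million=1.0))
    print(savings_fraction(100_000, 90_000))
```

Under these assumptions, the full 80% saving is reached only when every input token matches a cached prefix; a request with a 90% cache-hit share saves about 72% of its input cost.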


Disclaimer: The content provided on Phemex News is for informational purposes only. We do not guarantee the quality, accuracy, or completeness of the information sourced from third-party articles. The content on this page does not constitute financial or investment advice. We strongly encourage you to conduct your own research and consult with a qualified financial advisor before making any investment decisions.