DeepSeek V4 API Unveils V4-Pro and V4-Flash with Price Reductions and Expanded Context

DeepSeek has launched its V4 API models, V4-Pro and V4-Flash, introducing significant price reductions and an eightfold increase in context length. V4-Flash replaces the previous V3.2 model at no price increase: cached input is priced at 0.2 RMB per million tokens, uncached input drops from 2 RMB to 1 RMB, and output drops from 3 RMB to 2 RMB. The context length has expanded from 128K to 1M tokens.

V4-Pro, a new premium tier, is priced per million tokens at 1 RMB for cached input, 12 RMB for uncached input, and 24 RMB for output, a higher cost that reflects limited high-end compute capacity. Prices are expected to decrease once Ascend 950 super nodes ship later this year. Both models support non-reasoning and reasoning modes, with reasoning available at high and max intensity levels. The legacy models, deepseek-chat and deepseek-reasoner, will be discontinued by July 24, 2026.
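To put the per-million-token prices quoted above in concrete terms, here is a minimal sketch of a cost estimator. The prices are taken directly from the article; the function and variable names are illustrative, not part of any DeepSeek SDK, and actual billing may differ (e.g. rounding, tiering).

```python
# Illustrative cost estimator using the per-million-token RMB prices
# quoted in the article. Names here are hypothetical, not a real SDK.
PRICES_RMB_PER_MILLION = {
    "V4-Flash": {"cached_in": 0.2, "uncached_in": 1.0, "out": 2.0},
    "V4-Pro":   {"cached_in": 1.0, "uncached_in": 12.0, "out": 24.0},
}

def estimate_cost_rmb(model: str, cached_in: int, uncached_in: int, out: int) -> float:
    """Return the RMB cost for the given token counts under the quoted prices."""
    p = PRICES_RMB_PER_MILLION[model]
    total = (cached_in * p["cached_in"]
             + uncached_in * p["uncached_in"]
             + out * p["out"])
    return total / 1_000_000

# Example: a request with 100K uncached input tokens and 10K output tokens.
print(f'V4-Flash: {estimate_cost_rmb("V4-Flash", 0, 100_000, 10_000):.2f} RMB')  # 0.12 RMB
print(f'V4-Pro:   {estimate_cost_rmb("V4-Pro", 0, 100_000, 10_000):.2f} RMB')    # 1.44 RMB
```

The same request costs twelve times more on V4-Pro than on V4-Flash at these rates, which illustrates why the premium tier is positioned for workloads that justify the extra compute.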
Disclaimer: The content provided on Phemex News is for informational purposes only. We do not guarantee the quality, accuracy, or completeness of the information sourced from third-party articles. The content on this page does not constitute financial or investment advice. We strongly encourage you to conduct your own research and consult with a qualified financial advisor before making any investment decisions.
