DeepSeek V4 Model Card Confirms Key Components, Omits Engram

DeepSeek's V4 model card has confirmed three core components of its architecture, following the open-sourcing of the TileKernels library: Manifold-Constrained Hyper-Connections (mHC), a mixture-of-experts (MoE) architecture with Top-k expert routing, and FP4+FP8 mixed precision for weight storage. All three had been accurately inferred from the TileKernels code.
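The model card does not include reference code, but Top-k expert routing follows a well-known pattern: a learned router scores every expert per token, and only the k highest-scoring experts run. The sketch below is a minimal, generic illustration in PyTorch; the dimensions, the GELU expert MLPs, and the softmax-over-selected-experts gating are illustrative assumptions, not V4's published router design.

```python
# Generic sketch of Top-k expert routing in an MoE layer.
# All dimensions and gating details here are illustrative assumptions;
# DeepSeek has not published V4's actual router.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=64, k=8):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                         # x: (tokens, d_model)
        scores = self.router(x)                   # (tokens, n_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)  # normalize over chosen experts only
        out = torch.zeros_like(x)
        for slot in range(self.k):                # each token visits k experts
            for e in topk_idx[:, slot].unique().tolist():
                mask = topk_idx[:, slot] == e
                out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

layer = TopKMoE()
y = layer(torch.randn(16, 512))  # 16 tokens, each routed to 8 of 64 experts
```

Because only k of n_experts expert MLPs run per token, compute per token stays roughly constant as the total parameter count grows, which is the main appeal of the MoE design.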
However, the model card does not mention the Engram conditional memory module, which had been speculated about and remains unconfirmed. The card also introduces features not covered by TileKernels, notably a hybrid attention mechanism (CSA + HCA) aimed at long-context efficiency: at a 1M-token context, inference FLOPs are reportedly reduced to 27% of V3.2's and the KV cache to 10%. Training now uses the Muon optimizer.
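To make the reported 10% KV-cache figure concrete, the snippet below estimates the cache size of a hypothetical dense-attention baseline at 1M tokens and applies the claimed reduction. The layer count, head count, head dimension, and FP16 storage are illustrative assumptions, not published V4 or V3.2 specifications.

```python
# Back-of-the-envelope KV-cache estimate for a 1M-token context.
# All model dimensions below are illustrative assumptions; DeepSeek has
# not published V4's exact configuration.
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem):
    # 2x for the separate key and value tensors stored per layer
    return 2 * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_elem

baseline = kv_cache_bytes(
    seq_len=1_000_000, n_layers=60, n_kv_heads=8, head_dim=128,
    bytes_per_elem=2,  # FP16
)
print(f"baseline KV cache:  {baseline / 2**30:.1f} GiB")         # ~228.9 GiB
print(f"at 10% (reported):  {0.10 * baseline / 2**30:.1f} GiB")  # ~22.9 GiB
```

Under these assumptions, a dense cache at 1M tokens would occupy hundreds of gigabytes, which is why a 10x reduction matters far more at long contexts than at typical chat lengths.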
