Alibaba's Tongyi Lab launched VimRAG, a multimodal RAG framework, on April 10. VimRAG targets the "state blind spot" in existing systems by replacing the linear history record with a Multimodal Memory Graph. The graph is a dynamic directed acyclic graph (DAG) that tracks exploration paths in real time and eliminates redundant retrieval. For visually heavy inputs, Graph-Modulated Visual Memory Encoding allocates tokens adaptively, and the GGPO mechanism assigns credit more precisely, which improves the accuracy of reasoning attribution. VimRAG is reported to perform strongly on benchmarks including SlideVQA, MMLongBench, and LVBench, with the version built on Qwen3-VL-8B-Instruct achieving the top scores. The framework aims to move multimodal RAG from simple retrieval toward structured, reliable reasoning over complex documents and other multimodal inputs.
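To make the memory-graph idea concrete, here is a minimal Python sketch of how a DAG of exploration steps could skip repeated retrievals and record the path that led to each step. It is not VimRAG's implementation: `MemoryNode`, `MemoryGraph`, `explore`, `path_to`, and `fake_retriever` are hypothetical names, and the query normalization and the edge rule are assumptions chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryNode:
    """One exploration step: a query plus the evidence retrieved for it."""
    node_id: int
    query: str
    evidence: list
    parents: list = field(default_factory=list)


class MemoryGraph:
    """Hypothetical memory graph; an illustration, not VimRAG's actual API."""

    def __init__(self, retriever):
        self.retriever = retriever   # any callable: query -> list of evidence
        self.nodes = {}              # node_id -> MemoryNode
        self._by_query = {}          # normalized query -> node_id
        self.retrieval_calls = 0     # counts real retriever invocations

    @staticmethod
    def _key(query):
        # Assumed normalization: lowercase and collapse whitespace.
        return " ".join(query.lower().split())

    def explore(self, query, parent=None):
        """Return the node id for `query`, retrieving only if it is new."""
        key = self._key(query)
        node_id = self._by_query.get(key)
        if node_id is None:
            self.retrieval_calls += 1
            node_id = len(self.nodes)
            self.nodes[node_id] = MemoryNode(node_id, query, self.retriever(query))
            self._by_query[key] = node_id
        node = self.nodes[node_id]
        # Edges only run from an older node to a newer one, so no cycle can form.
        if parent is not None and 0 <= parent < node_id and parent not in node.parents:
            node.parents.append(parent)
        return node_id

    def path_to(self, node_id):
        """Follow first parents back to a root and return the queries in order."""
        path = [node_id]
        while self.nodes[path[-1]].parents:
            path.append(self.nodes[path[-1]].parents[0])
        return [self.nodes[i].query for i in reversed(path)]


def fake_retriever(query):
    # Stand-in for a real multimodal retriever (assumption for the demo).
    return [f"page for: {query}"]


graph = MemoryGraph(fake_retriever)
root = graph.explore("revenue chart 2023")
child = graph.explore("Q4 revenue slide", parent=root)
again = graph.explore("  Revenue   chart 2023 ", parent=child)  # already explored

print(graph.retrieval_calls)   # 2
print(graph.path_to(child))    # ['revenue chart 2023', 'Q4 revenue slide']
```

In this sketch the third call reuses the existing node instead of retrieving again, and the would-be back edge from the child to the root is dropped, which is what keeps the structure acyclic. A production system would presumably match queries semantically rather than by exact normalized text.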