
Due to US export restrictions, China finds it difficult to obtain key components such as HBM (high-bandwidth memory). Huawei aims to reduce the Chinese technology market's reliance on HBM: according to market reports, it has developed a new software tool called Unified Cache Manager (UCM) that can accelerate the training and inference of large language models (LLMs) without relying on HBM.
Huawei unveiled the UCM tool on Tuesday (the 12th) at the 2025 Financial AI Inference Application Implementation and Development Forum. The software allocates AI data across HBM, standard DRAM, and SSDs according to the latency characteristics of each memory type and the latency requirements of different AI workloads.
Zhou Yuefeng, head of Huawei's data storage product line, pointed out that AI inference currently faces three major problems: "can't infer" (inputs are too long for the system to process), "infers slowly" (responses are too slow), and "infers expensively" (compute costs are too high).
Through a "hierarchical memory" Key-Value cache architecture, UCM maximizes inference speed with the resources available to the system.
According to Huawei's introduction, the UCM released here is an inference-acceleration suite centered on the KV Cache. It integrates multiple cache-acceleration algorithms and tiers the KV cache data generated during inference, thereby expanding the inference context window, delivering a high-throughput, low-latency inference experience, and reducing the per-token inference cost.
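The tiering idea described above can be sketched in a few lines. The following is a toy model only, not Huawei's actual UCM API: the tier names, latency budgets, and the `TieredKVCache` class are all illustrative assumptions. It shows the core policy the article describes: placing each KV-cache entry in the slowest (and therefore largest and cheapest) memory tier that still meets that entry's latency requirement.

```python
from dataclasses import dataclass, field

# Illustrative latency budgets (microseconds) per tier; real figures depend
# on the hardware and are NOT taken from Huawei's announcement.
TIER_LATENCY_US = {"HBM": 1, "DRAM": 10, "SSD": 100}

@dataclass
class TieredKVCache:
    """Toy tiered KV cache: each entry lands in the slowest tier that
    still satisfies its latency budget, preserving fast memory for hot data."""
    capacity: dict                     # maximum number of entries per tier
    tiers: dict = field(default_factory=lambda: {t: {} for t in TIER_LATENCY_US})

    def put(self, key, value, max_latency_us):
        # Try tiers from slowest to fastest, so cheap capacity is used first.
        for tier in sorted(TIER_LATENCY_US, key=TIER_LATENCY_US.get, reverse=True):
            if (TIER_LATENCY_US[tier] <= max_latency_us
                    and len(self.tiers[tier]) < self.capacity[tier]):
                self.tiers[tier][key] = value
                return tier
        raise MemoryError("no tier satisfies the latency budget with free space")

cache = TieredKVCache(capacity={"HBM": 2, "DRAM": 4, "SSD": 1000})
print(cache.put("layer0_kv", b"...", max_latency_us=5))     # tight budget -> HBM
print(cache.put("old_prefix", b"...", max_latency_us=500))  # loose budget -> SSD
```

A latency-sensitive entry ends up in HBM, while an entry with a relaxed budget spills to SSD, which is the behavior the article attributes to UCM's latency-aware allocation across HBM, DRAM, and SSD.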
UCM underwent an extended period of real-world testing before its release. Zhou Yuefeng noted that UCM was trialed at China UnionPay, in application scenarios including customer-voice analysis, marketing planning, and an office assistant. According to the data, combining UCM with traditional caching and storage technology in AI inference tasks reduced latency by up to 90% and increased throughput by up to 22 times.
Huawei will officially open-source UCM in September, with an initial release in the ModelEngine community. It will then gradually be contributed to the industry's mainstream inference-engine communities and shared with Share-Everything-architecture storage vendors and ecosystem partners across the industry.
Since HBM chips are currently produced almost entirely by SK Hynix, Samsung, and Micron, and the United States has worked to prevent China from buying or making HBM, the emergence of UCM is expected to blunt the impact of US restrictions and accelerate the independence of China's chip technology.