An open-source tool from KVCache.ai visualizes how much memory large language models use for KV caching; DeepSeek V4 Flash needs only about 2.9 GiB to cache one million tokens.

KVCache.ai has launched a browser-based, open-source KV cache size calculator that supports popular models such as DeepSeek V4 Flash, Qwen3, GLM, Kimi, and MiniMax. It computes the required GPU memory in real time from the context length, the precision (FP16, INT4, and so on), and the batch size. The tool has drawn significant attention in the community since its release, and its headline comparison is striking: at a context length of 1 million tokens, DeepSeek V4 Flash requires roughly 2.893 GiB of total KV cache storage, whereas MiniMax needs about 236 GiB under identical conditions, a difference of roughly 82 times. The gap stems from DeepSeek’s Multi-Head Latent Attention (MLA) architecture, which compresses key-value pairs into low-dimensional latent vectors and so cuts storage overhead far more than simply reducing the number of attention heads would. The same architectural logic explains why DeepSeek can offer the industry's lowest API pricing for cache hits. User @teortaxesTex on X noted that “finally someone made this tool; it clearly shows why DeepSeek’s caching pricing is so competitive.” Other users have drawn on the data to argue that launching a model with a context window of up to 10 million tokens this year would still make economic sense for DeepSeek.
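
The arithmetic behind such a calculator can be sketched in a few lines. The Python below is a minimal illustration, not the tool's actual code: the function names and the example configuration (32 layers, 8 KV heads, head dimension 128, latent width 576) are hypothetical values chosen for illustration and are not the published settings of any model named above. It assumes a conventional cache stores one key and one value vector per KV head, per layer, per token, while an MLA-style cache stores a single compressed latent vector per layer, per token.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_elem, batch=1):
    """Conventional (MHA/GQA) cache: a K and a V vector per KV head, layer, and token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens * batch


def latent_cache_bytes(tokens, layers, latent_dim, bytes_per_elem, batch=1):
    """MLA-style cache: one compressed latent vector per layer and token."""
    return layers * latent_dim * bytes_per_elem * tokens * batch


def reported_ratio(larger_gib, smaller_gib):
    """Ratio between two cache sizes, e.g. the figures quoted in the article."""
    return larger_gib / smaller_gib


if __name__ == "__main__":
    tokens = 1_000_000
    fp16 = 2  # bytes per element at FP16

    # Hypothetical configuration, for illustration only.
    conventional = kv_cache_bytes(tokens, layers=32, kv_heads=8, head_dim=128,
                                  bytes_per_elem=fp16)
    latent = latent_cache_bytes(tokens, layers=32, latent_dim=576,
                                bytes_per_elem=fp16)

    print(f"conventional: {conventional / GIB:.2f} GiB")
    print(f"latent:       {latent / GIB:.2f} GiB")
    # Ratio of the two sizes reported in the article (236 GiB vs 2.893 GiB).
    print(f"reported ratio: {reported_ratio(236, 2.893):.1f}x")
```

Both formulas grow linearly with context length and batch size, so under these assumptions a 10-million-token context costs ten times the cache of a 1-million-token one. Lowering precision, for example from FP16 to INT4, shrinks `bytes_per_elem` and the total with it.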

X (@teortaxesTex) | KVCache.ai