KVCache.ai has launched an open-source, browser-based KV cache size calculator that supports popular models such as DeepSeek V4 Flash, Qwen3, GLM, Kimi, and MiniMax. It computes the required GPU memory in real time from context length, precision (FP16, INT4, and others), and batch size.

The tool has drawn significant attention in the community since its release, largely because of its comparison data. At a context length of 1 million tokens, DeepSeek V4 Flash needs roughly 2.893 GiB of KV cache in total, whereas MiniMax needs about 236 GiB under identical conditions, a gap of nearly 82 times. The disparity stems from DeepSeek's Multi-Head Latent Attention (MLA) architecture: rather than simply reducing the number of attention heads, MLA compresses keys and values into low-dimensional latent vectors, drastically cutting storage overhead. The same architectural choice also helps explain why DeepSeek can offer the lowest cache-hit pricing in the industry through its API.

User @teortaxesTex on X noted: "finally someone made this tool; it clearly shows why DeepSeek's caching pricing is so competitive." Other users have used the data to argue that launching a model with a context window of up to 10 million tokens this year would still make economic sense for DeepSeek.
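To make the arithmetic behind such a calculator concrete, here is a minimal Python sketch. It assumes the usual per-token accounting: a standard (grouped-query) attention layer caches one key vector and one value vector per KV head, while an MLA-style layer caches a single latent vector shared by keys and values, plus a small RoPE key. The function names and every model shape used below (layer count, KV heads, head, latent and RoPE dimensions) are hypothetical and chosen only for illustration. This is not the calculator's code, and it does not attempt to reproduce the 2.893 GiB and 236 GiB figures, which depend on each model's actual configuration.

```python
# Hypothetical sketch of a KV cache size estimate. The model shapes used here
# are illustrative assumptions, not values taken from the KVCache.ai calculator.

GIB = 1024 ** 3

# Bytes needed to store one cached element at a few common precisions.
BYTES_PER_ELEM = {"fp16": 2.0, "bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def gqa_kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch, precision="fp16"):
    """Standard multi-head / grouped-query attention: every layer stores a full
    key vector and a full value vector per KV head for every token."""
    per_token = 2 * num_layers * num_kv_heads * head_dim  # 2 = key + value
    return per_token * seq_len * batch * BYTES_PER_ELEM[precision]


def mla_kv_cache_bytes(num_layers, latent_dim, rope_dim, seq_len, batch, precision="fp16"):
    """MLA-style cache: every layer stores one compressed latent vector (shared
    by keys and values) plus a small decoupled RoPE key for every token."""
    per_token = num_layers * (latent_dim + rope_dim)
    return per_token * seq_len * batch * BYTES_PER_ELEM[precision]


if __name__ == "__main__":
    ctx = 1_000_000  # 1M-token context, batch of one
    # Hypothetical GQA model: 60 layers, 8 KV heads of dimension 128.
    gqa = gqa_kv_cache_bytes(60, 8, 128, ctx, batch=1)
    # Hypothetical MLA model: 60 layers, 512-dim latent plus 64-dim RoPE key.
    mla = mla_kv_cache_bytes(60, 512, 64, ctx, batch=1)
    print(f"GQA: {gqa / GIB:.1f} GiB, MLA: {mla / GIB:.1f} GiB, ratio {gqa / mla:.1f}x")
    # Ratio of the two totals quoted in the article (both in GiB).
    print(f"Reported gap (236 / 2.893 GiB): {236 / 2.893:.1f}x")
```

With these made-up shapes the MLA cache comes out about 3.6 times smaller; the much larger gap reported by the calculator reflects the real models' configurations. Because memory grows linearly with both context length and batch size, and scales with bytes per element, those three quantities are natural inputs for a tool like this.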