Community test: GPT-5.5 achieves a 99.46% accuracy rate in multiplying 20-digit numbers without any tools

On May 22, X user @cozyblaze265065 posted unofficial benchmark results for multi-digit multiplication. With no external tools, GPT-5.5 was run in ‘medium reasoning’ mode with 7 samples per calculation on 400 multiplication problems, each a product of two 20-digit numbers, and reached 99.46% accuracy; the few errors were confined to cases involving the largest numbers. The accompanying heatmap shows accuracy under ‘medium reasoning’ far exceeding that of lower reasoning settings, which suggests that giving the model more chain-of-thought steps substantially improves accuracy in multi-digit arithmetic. AI researcher Raphaël Millière reposted the results, commenting, ‘I still occasionally hear people claim LLMs can’t do arithmetic at all; this once again reminds us that we’re no longer in 2022.’ The test was run independently by a community member and is not an official OpenAI benchmark, but its clear methodology and noteworthy results have drawn widespread attention.
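For readers who want to see what such a test involves, here is a minimal Python sketch of how a benchmark of this shape could be generated and scored. It is not the poster's actual harness: the function names (`make_problems`, `parse_answer`, `score`), the fixed random seed, the answer-cleaning rules, and the choice to report accuracy per sample rather than per problem are all assumptions made for illustration. Model responses are passed in as plain strings, so no API calls are shown.

```python
import random


def make_problems(n_problems=400, digits=20, seed=0):
    """Generate pairs of random integers with exactly `digits` digits each.

    The seed and the uniform sampling are assumptions; the original post
    does not say how its operands were drawn.
    """
    rng = random.Random(seed)
    lo, hi = 10 ** (digits - 1), 10 ** digits - 1
    return [(rng.randint(lo, hi), rng.randint(lo, hi)) for _ in range(n_problems)]


def parse_answer(text):
    """Turn a model's reply into an int, or None if it is not a plain number.

    Commas, underscores, and spaces are stripped as a hypothetical
    normalization step.
    """
    cleaned = text.strip().replace(",", "").replace("_", "").replace(" ", "")
    if cleaned.isascii() and cleaned.isdigit():
        return int(cleaned)
    return None


def score(problems, answers_per_problem):
    """Return per-sample accuracy: correct samples divided by all samples.

    Python integers have arbitrary precision, so `a * b` is the exact
    ground truth even for 20-digit operands.
    """
    correct = 0
    total = 0
    for (a, b), answers in zip(problems, answers_per_problem):
        truth = a * b
        for text in answers:
            total += 1
            if parse_answer(text) == truth:
                correct += 1
    return correct / total
```

With 400 problems and 7 samples each, `score` would be averaging over 2,800 individual answers. Whether the reported 99.46% was computed per sample or per problem is not stated in the post, so the per-sample choice here is only one plausible reading.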

X (@cozyblaze265065)