Should you trust AI with your numbers?
Picture a budget meeting at a city agency or in a corporate boardroom. A generative AI system drafts a cash-flow projection, summarizes a long policy memo, and suggests a pricing tweak or a benefits change. The prose looks polished and the charts look clean, but a single percentage point buried in the model quietly pushes the decision in a different direction. The team moves on, unaware that the math underneath is wrong.
The team behind Omni Calculator built the ORCA Benchmark to test this risk and found that no leading model scored above 63 percent on real-world calculation tasks. As governments and companies embed generative AI deeper into their workflows, that gap matters more every quarter.
ORCA shows that models still miss a large share of everyday quantitative questions, often because of basic arithmetic and rounding errors. Its benchmark report notes that even state-of-the-art tools can describe the right formula in words while misapplying it step by step. One slightly wrong rate in a budget, procurement forecast, or loan estimate easily moves millions of dollars in the........
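To make the scale of a small rate error concrete, here is a minimal sketch in Python. Every figure in it is hypothetical and chosen only for illustration (a $50 million starting budget, a 10-year horizon, and a 3 percent growth rate mistakenly entered as 4 percent); none of them comes from the ORCA report, and the helper name `project_balance` is an assumption of this sketch.

```python
def project_balance(principal: float, annual_rate: float, years: int) -> float:
    """Compound a starting amount once per year at a fixed annual rate."""
    return principal * (1 + annual_rate) ** years


# Hypothetical inputs, for illustration only.
budget = 50_000_000      # starting amount in dollars
years = 10               # projection horizon
intended_rate = 0.03     # the rate the analyst meant to use
mistaken_rate = 0.04     # the same rate, off by one percentage point

intended = project_balance(budget, intended_rate, years)
mistaken = project_balance(budget, mistaken_rate, years)
gap = mistaken - intended

print(f"Intended projection: ${intended:,.0f}")
print(f"Mistaken projection: ${mistaken:,.0f}")
print(f"Gap from a one-point rate error: ${gap:,.0f}")
```

Under these assumptions, the two projections diverge by roughly $6.8 million after ten years, even though each step of the calculation looks reasonable on its own.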
