New tests with unpublished math problems suggest today’s AI is good at familiar tasks but often fails at original, multi-step research problems.
In short: Recent tests suggest today’s leading AI models can handle familiar math exercises but often fail when mathematicians give them brand new research problems.
Mathematicians have been testing large language models, or LLMs (AI systems that predict the next word, like a very advanced autocomplete). To keep the models from simply reproducing answers they may have seen online during training, the researchers used unpublished problems from their own work.
In these tests, the models often did poorly on a first attempt. They could solve many contest-style or textbook questions, but they struggled with problems that require exploration, careful logic, and making new connections. A February 2026 report described this as a lack of “intuition,” meaning the AI does not reliably find the right path when there is no familiar pattern to follow.
Other research summaries make a similar point. Some models score well on standardized math benchmarks, including parts of graduate-level algebra. But that does not translate into solving open research questions, where the steps are not obvious and the answer is not a known template.
There has been progress in narrow areas. For example, one benchmark reported better accuracy on basic conversions. Still, models often break down on multi-step problems because they predict the most likely next step rather than calculating in a strict, checkable way, as a calculator does.
Researchers are trying workarounds, such as pairing AI with external tools for arithmetic and formal proofs, and using “hybrid” systems that combine pattern spotting with rule-based checking (like drafting an essay, then having an accountant verify the numbers). For now, the evidence suggests AI is more useful as an assistant that helps humans explore ideas, not as a replacement for mathematicians.
Source: Ars Technica