AI can now solve some Olympiad problems and assist with narrow research questions, but it still makes basic proof mistakes. This may change how mathematicians are trained.
In short: AI is getting good at some very hard math problems, so it is starting to overlap with work often done by top students, but it still cannot reliably check its own proofs.
Some AI systems can now solve math problems at the level of major competitions. DeepMind said its AlphaProof and AlphaGeometry 2 systems solved 4 of 6 problems on the 2024 International Mathematical Olympiad, a score at the level of a silver medalist.
Researchers are also building larger test sets to see how consistent that performance really is. MIT’s MathNet collection has over 30,000 proof-based Olympiad-style problems. On a 6,400-problem benchmark from MathNet, GPT-5 scored about 69.3%, which means it still missed nearly 1 in 3 problems. Results got worse when problems used diagrams, which require the AI to read a picture, not just text.
AI is also beginning to help with narrow research tasks. A Caltech-led team used an AI method designed for very long chains of reasoning to solve two families of problems linked to the Andrews–Curtis conjecture. It did not solve the main conjecture, but it closed subproblems that had been open for 25 to 44 years.
At the same time, recent tests on fresh USAMO problems found that several top models scored under 5%. Reviewers said a common failure is that models produce confident-looking proofs that contain gaps, without noticing that the proofs are wrong.
For young mathematicians, this likely raises the bar for what counts as “impressive” in routine problem solving. Schools, contests, and hiring committees may put more weight on choosing good questions, explaining ideas clearly, and checking work carefully, especially because AI can sound sure even when it is mistaken.
Source: NYTimes