Public pages show Claude Opus 4.7 is Anthropic’s latest flagship, and benchmarks put it near the top for coding but not No. 1 everywhere.
In short: A report referenced “Claude Opus 4.8,” but Anthropic’s public information points to Claude Opus 4.7 as its latest flagship model. Benchmarks place it near the top for coding, not first on every test.
A New York Times article described a new Anthropic system called “Opus 4.8” and said it tops industry benchmarks for computer programming. However, Anthropic’s own public model list currently names Claude Opus 4.7 as its most capable generally available model.
Anthropic has also published details saying Opus 4.7 improves on Opus 4.6 for advanced software engineering work. In its internal testing, Anthropic says Opus 4.7 solved more of its hardest coding tasks than earlier versions.
On widely cited public leaderboards, Opus 4.7 scores extremely well but does not clearly lead every programming test. For example, on SWE-bench Verified (a test based on real bug reports from GitHub, like giving an assistant a stack of real repair tickets), one public leaderboard lists GPT-5.5 at 82.6% and Claude Opus 4.7 at 82.0%, a gap of 0.6 percentage points. Other tests have different leaders, depending on what is being measured.
Many people and companies use these models to help write and fix software, so the exact model name and how it performs both matter, especially when claims like “best” rest on small differences and on which test is chosen.
Source: NYTimes