A review found that general-purpose AI models made serious mistakes when reading stroke CT scans, while specialized medical AI can be far more accurate on narrow tasks.
In short: General-purpose AI chatbots can sound confident while making frequent mistakes when reading medical scans, and hospitals are increasingly relying on specialized AI tools instead.
Testing of several well-known large language models, including GPT-5 and Claude Opus, found a fundamental error rate of roughly 20% when they analyzed the same brain CT scans for stroke. A CT scan is a series of X-ray images that lets doctors see inside the body. The models often gave conflicting explanations, such as disagreeing about where the stroke was or when it happened, even when looking at identical images.
This matters because these chatbots are designed to write and talk fluently, not to act like a specialist physician. They can come across as a confident colleague, yet in this setting they can be wrong in basic ways. In the tests, the models were also poor at judging each other's answers, which makes mistakes harder to catch.
At the same time, the story highlights that more focused medical AI tools can work very well when built for a single job. For example, some AI systems have matched or beaten groups of radiologists on narrow tasks such as classifying lung nodules on CT scans. Other systems can cut reading time sharply in specific workflows and help hospitals flag urgent scans sooner.
Doctors quoted in the discussion argue that general chatbots should be used as helpers for writing notes, summarizing, or explaining results in plain language, not as standalone diagnostic tools. Watch for more studies that test AI in real hospitals rather than clean lab settings, and for clearer rules about responsibility when an AI-supported decision goes wrong.
Source: NYTimes