A review found that general-purpose AI models made serious mistakes when reading stroke CT scans, while specialized medical AI can be far more accurate on narrow tasks.
In short: General-purpose AI chatbots can sound confident while making frequent mistakes when reading medical scans, and hospitals are increasingly relying on specialized AI tools instead.
Testing of several well-known large language models, including GPT-5 and Claude Opus, found a fundamental error rate of roughly 20% when they analyzed the same brain CT scans for stroke. A CT scan is a series of X-ray images that lets doctors see inside the body. The models often gave conflicting explanations, such as disagreeing about where the stroke was or when it happened, even when looking at identical images.
This matters because these chatbots are designed to write and talk fluently, not to act like a specialist physician. They can come across as a confident colleague, yet in this setting they can be wrong in basic ways. In the tests, the models were also poor at judging each other's answers, which makes mistakes harder to catch.
At the same time, the story highlights that more focused medical AI tools can work very well when built for a single job. For example, some AI systems have matched or beaten groups of radiologists on narrow tasks such as classifying lung nodules on CT scans. Other systems can cut reading time sharply in specific workflows and help hospitals flag urgent scans sooner.
Doctors quoted in the discussion argue that general chatbots should be used as helpers for writing notes, summarizing, or explaining results in plain language, not as standalone diagnostic tools. Watch for more studies that test AI in real hospitals rather than clean lab settings, and for clearer rules about responsibility when an AI-supported decision goes wrong.
Source: NYTimes