AI can spot diagnoses—but doctors manage the fallout

Two new studies suggest AI models are becoming extremely accurate at diagnosing complex medical cases. But the hard part (choosing tests, treatments, and follow-up) still depends on the patient in front of a clinician, their preferences, and the uncertainties th
A father watching his toddler run a fever for two days and pull at one ear reaches for his phone. A 65-year-old woman, winded on her morning walks and more fatigued than usual, does the same. In both moments, an AI chatbot turns symptoms into something that sounds like medical judgment.
“Your child likely has an ear infection,” the father learns. “Your symptoms could indicate a cardiac condition,” the woman reads.
Those answers can be helpful—and they may even be right. But diagnosis is only one step in medicine. The next step is deciding what to do with that information, including how to act when the body refuses to behave like a clean textbook.
An April 2026 study found OpenAI’s o1 model achieved a 78% accuracy rate on complex diagnostic cases published in The New England Journal of Medicine. The study also reported that the model outperformed experienced doctors when diagnosing actual emergency room patients. A separate 2024 study found that ChatGPT, working on its own, outperformed physicians in diagnosing complex cases, even when those physicians were able to use ChatGPT themselves.
Yet getting a correct label is only “half a doctor’s job.” The other half is management: how to run tests, what treatments to try, what to monitor, and when to follow up.
For clear-cut problems, management can be straightforward: “a little numbing cream for a baby’s gums,” for example, or an appointment with a cardiologist. But uncertainty is common in clinical practice. Knowing what ails a patient is often necessary, but not sufficient, for deciding how to care for them.
The difference starts with how clinicians think.
Doctors don’t assess every patient from scratch. Over years of practice, they build illness scripts: mental shortcuts that capture what a disease typically looks like, who tends to get it, and how it most often progresses. When a patient describes symptoms, doctors match what they observe against those scripts, categorizing and recognizing patterns. That speed matters: it helps clinicians spot what doesn’t fit. A symptom that seems off, or a detail in the patient’s history (like “a recent trip abroad” or “an unusual exposure at work”), can redirect the diagnosis.
AI’s strength in diagnosis is tied to the same kind of pattern-matching. Large language models like ChatGPT predict the next word based on patterns learned from enormous amounts of text, including medical literature. In that literature, the word “pneumonia” often follows symptom patterns such as fever paired with a cloudy patch on a chest X-ray. At this level, the mechanics resemble the process of fitting symptoms to illness scripts.
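For readers who want to see the idea in miniature, here is a rough Python sketch of next-word prediction by counting. It is a deliberate simplification: real large language models use neural networks trained on vast amounts of text, not lookup tables. The four "sentences" in `CORPUS` and the helper names `train` and `predict_next` are made up for illustration and are not clinical data or anyone's actual method.

```python
from collections import Counter, defaultdict

# Hypothetical mini-corpus standing in for medical text (not real clinical data).
CORPUS = [
    "fever and cloudy chest x-ray suggest pneumonia",
    "fever and cloudy chest x-ray suggest pneumonia",
    "fever and cloudy chest x-ray suggest bronchitis",
    "fever and ear pulling suggest otitis",
]


def train(sentences, context_size=3):
    """Count which word follows each run of `context_size` words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(context_size, len(words)):
            context = tuple(words[i - context_size:i])
            counts[context][words[i]] += 1
    return counts


def predict_next(counts, prompt, context_size=3):
    """Return the most frequent next word for the prompt's last words, or None if unseen."""
    context = tuple(prompt.split()[-context_size:])
    followers = counts.get(context)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


model = train(CORPUS)
print(predict_next(model, "fever and cloudy chest x-ray suggest"))
print(predict_next(model, "fever and ear pulling suggest"))
```

The toy model picks whichever word most often followed the same pattern in its text. Faced with a pattern it has never seen, it has nothing to say, and it has no notion of what the patient in the room wants.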
But deciding what to do next is different. A doctor often confronts not one correct pathway, but several reasonable options.
Consider a prostate cancer example. Two men, Marcus and Tomás, both 68, have the same biopsy result: an early-stage, slow-growing tumor confined to the prostate. Both are offered the same two management options: treat now with surgery or radiation, accepting the risks of urinary incontinence and changes to sexual function; or monitor closely with regular tests and biopsies, treating only if the cancer grows.
A study tracking more than 82,000 men with early-stage prostate cancer for 15 years found fewer than 3 in 100 died of their prostate cancer regardless of which path they chose. Men who chose monitoring, however, were about twice as likely to see their cancer spread.
An AI system can lay out those choices and statistics. What it can’t do is sit across from a specific person and weigh what matters to them.
Marcus has no other significant health conditions. His doctor knows this, and knows that uncertainty “sits badly with him.” When the cancer is present but watched rather than treated, Marcus cannot live with the waiting; he chooses treatment.
Tomás has advanced heart failure, something his doctor has been managing for years. In Tomás’s case, his doctor knows that his heart condition poses a more immediate threat than this slow-growing tumor does. She also knows what happened to a friend who went through radiation and came out “diminished.” Treating aggressively would mean accepting real costs for a benefit that may never arrive. She recommends active surveillance, and Tomás experiences it as the relief of having the right priorities.
Different management decisions are the norm in medicine. The “right path” depends on who the patient is and what they value, along with a clinician’s judgment about where the evidence is solid and where genuine uncertainty remains.
That’s why management starts with evidence—but ends with the bedside.
To decide how to manage a patient’s condition, a doctor first considers evidence from the medical literature and then applies management options to the patient’s particular circumstances. The process requires honest communication and shared decision-making, with clinicians and patients jointly navigating risk and acknowledging uncertainty.
Some risk can be measured. For chest pain, doctors use scoring tools that estimate a patient’s short-term likelihood of a heart attack based on symptoms and test results, and AI could likely work through those numbers faster than most doctors. But at the bedside, risk and uncertainty are harder to reduce to a single score. Scoring systems and practice guidelines are built for the “average patient,” an idealized person who doesn’t exist in real rooms with real anxieties.
Risk perception also comes from experience. For many patients, that includes a long and justified history of mistrust in the healthcare system. AI “does not know what you have been through” and cannot acknowledge uncertainty in the way a good doctor can—returning to it with a patient as circumstances change.
That is where diagnosis and management part ways. The feverish toddler may be the kind of case where the father got a useful answer because AI has seen enough feverish toddlers in the medical literature to make a reasonable call. But what comes next (when to stop watching, when to start worrying) needs a conversation that’s tailored, not just predicted.
Andrew Parsons is an associate professor of medicine at the University of Virginia.
This article is republished from The Conversation under a Creative Commons license.
So it’s basically Google but for illnesses? Cool cool.
I don’t trust a chatbot telling me what’s wrong with me. Like yeah maybe it’s 78% but what about the other 22%? Also doctors already barely have time lol.
Wait, so the AI is diagnosing ear infections and cardiac stuff from symptoms, but then it says doctors manage the fallout… so basically it’s making a guess and then people pay the doctor to fix the guess? Doesn’t feel like a win.
Idk man, I saw someone say AI diagnoses are why they’re gonna cut healthcare jobs. Then this article is like “but clinicians still decide tests and treatment” which sounds nice but also like the AI is still gonna be used to route you anyway. If the body doesn’t follow the textbook… doesn’t that mean AI will be confident in the wrong thing more often? Seems scary for real, especially with kids ear stuff.