Technology

AI vs ER Doctors: Study Finds Higher Diagnostic Accuracy

AI diagnosis – A Misryoum review of a Harvard-led study reports that AI models can match or outperform ER doctors at certain diagnostic stages.

An AI model may have delivered more accurate emergency diagnoses than doctors in a tightly controlled test, according to a new study highlighted by Misryoum.

The research, conducted by a team of physicians and computer scientists at Harvard Medical School and Beth Israel Deaconess Medical Center, evaluated how large language models perform across multiple medical scenarios, including cases modeled on real emergency room workflows. In this context, the focus wasn’t on general health advice but on how AI handles diagnostic decisions using the same kind of information clinicians see in the electronic record.

The study compared outputs from OpenAI models, including o1 and GPT-4o, with diagnoses from attending physicians. In one experiment, Misryoum says, the team looked at 76 patients who came to the emergency department, then assessed whether the AI-generated diagnoses matched what the attending physicians concluded. Importantly, the review was carried out by other attending physicians who were unaware of whether each diagnosis came from a human or from the model.

A key detail from the findings is that performance differences were most noticeable early in the process, during initial emergency triage, where clinicians face high pressure and limited patient context. Misryoum reports that the o1 model showed especially strong results at that first touchpoint, including a higher rate of exact or very close diagnostic matches than at least one of the physician baselines.

This matters because early triage is where small errors can cascade, and it is one of the hardest moments for any diagnostic system. If AI can reliably help at that stage, the practical impact on emergency workflows could be significant, even as the industry continues to debate what “automation” should mean in healthcare.

The researchers also emphasized that they did not alter or pre-process the underlying data, presenting the models with the text available in electronic medical records at the same diagnostic points used in the clinical comparisons. Misryoum notes the team framed the results as performance evidence rather than proof that AI is ready to make life-or-death decisions independently.

In fact, the study explicitly called for prospective trials to test these technologies in real patient care settings. Misryoum adds that the researchers also limited the evaluation to text-based inputs, while acknowledging that other modalities may require different capabilities, since foundation models can be less robust when reasoning depends on non-text information.

The takeaway for hospitals and regulators is clear: better diagnostic accuracy is not the same as a safe, accountable system. Misryoum underscores that patients and clinicians still need clear human oversight, and that formal accountability frameworks must be in place before AI-based diagnosis can be treated as more than decision support.