An NHS Intern Found AI Hallucinations in Cancer Diagnoses—and Went Public
The familiar rhythm of public healthcare fills the waiting area of a Surrey hospital on a gloomy morning. Patients clutch appointment letters. Nurses move swiftly through the hallways. Somewhere behind the radiology department doors, dozens of scans, including CT, mammography, and X-rays, are being processed. Each scan silently passes through an algorithm before arriving at a doctor’s desk.
That routine now includes artificial intelligence. The NHS is in dire need of speed, which AI tools promise to deliver. The British health service processes millions of scans annually, and radiologists are overworked. In certain hospitals, physicians examine thousands of images a year, searching for tumors as tiny as a fingernail. Fatigue sets in. Distractions occur. In that setting, AI has been embraced as a virtual second pair of eyes.
| Category | Details |
|---|---|
| Organization | National Health Service (NHS) England |
| Sector | Public Healthcare |
| Headquarters | London, United Kingdom |
| AI Use Case | Radiology and cancer screening tools |
| Example AI System | AI-assisted mammogram and X-ray analysis |
| Major Trials | NHS AI breast cancer screening trial (700,000 scans) |
| Key Issue | Accuracy, AI hallucinations, and medical oversight |
| Estimated Workforce | 1.3 million staff |
| Official Website | https://www.nhs.uk |
However, one intern began to notice something strange late one evening while seated behind a hospital computer.
Still in the early stages of medical school, the intern had been assisting in the evaluation of results from a pilot AI diagnostic tool that was being used to identify questionable patterns in imaging scans. The program was already well-known for identifying small cancers that sometimes go undetected by medical professionals. Some patients even attribute their survival to these tools’ ability to detect tumors early.
However, the intern kept noticing odd entries in the system’s reports. In a number of instances, the AI confidently described anomalies that didn’t exist. Occasionally, the software identified tissue patterns that radiologists could not see. At times, it even cited diagnostic reasoning or medical literature that appeared to be fabricated. Not maliciously wrong, exactly, but oddly certain about details that weren’t supported by the scans themselves.
Hallucination is the term used by AI researchers to describe this phenomenon.
It occurs when an artificial intelligence model produces information that sounds plausible but isn’t backed by evidence. In text-based AI systems, hallucinations may take the form of false statistics or made-up citations. In medical imaging systems, the repercussions are more severe. A hallucinated tumor may lead a patient down a terrifying diagnostic path. Or worse, divert attention away from a genuine problem.
According to reports, the intern initially assumed it was a straightforward technical issue. Medical software can be disorganized, particularly in pilot programs. But the pattern persisted, dispersed across various scans and medical records: subtle discrepancies, and strangely thorough descriptions of tissue changes that human reviewers could not confirm.
The intern started recording instances and contrasting them with the results of radiologists. A tiny file of anomalies accumulated over time.
As this develops, contemporary hospitals seem to be quietly turning into technology labs. AI systems analyze scans, rank urgent cases, and flag questionable patterns long before physicians examine them. In NHS trials, these systems have occasionally detected early-stage cancers that humans would have missed.
An AI system that analyzed mammograms in one well-known instance detected a tumor that was only six millimeters wide, so tiny that doctors had initially missed it. Later on, the patient claimed that the technology might have saved her life. Such moments explain why AI diagnostics are being rapidly adopted by healthcare systems worldwide.
However, outside of the lab, technology seldom operates flawlessly. When the intern finally voiced concerns within the hospital, reactions reportedly differed. Some medical professionals were intrigued, even appreciative of the scrutiny. Others were dubious. Because AI systems are trained on massive datasets, inconsistencies are common. The question was not whether the algorithm made mistakes, but whether those errors presented actual clinical risk.
Perhaps that discussion would have stayed in hospital meetings. The intern, however, made the decision to go public.
Researchers and journalists picked up the findings, sparking a wider debate about the reliability of AI in healthcare. Suddenly, the conversation was no longer only about quicker diagnoses. It was about supervision, transparency, and the peculiar confidence with which machine learning systems sometimes deliver answers the evidence does not support.
Medicine seems to be venturing into uncharted territory. Doctors have always used tools such as microscopes, MRI scanners, and laboratory tests. But those tools hardly ever generate interpretations on their own. AI programs do. They produce probabilities, recommendations, and explanations, and they can sound remarkably convincing while doing so. That persuasiveness is part of the problem.
A machine’s suggestion can subtly influence the human reviewing a diagnosis. Radiologists know this effect well. If software highlights a suspicious area, doctors may focus on it more closely, sometimes overlooking other cues. Technology shapes attention.
In medicine, attention is crucial.
This does not mean hospitals should stop using AI; few medical professionals argue that they should. The demands of the workload alone make digital assistance nearly inevitable. The NHS has already begun extensive trials involving hundreds of thousands of mammograms to determine whether AI can safely serve as a “second reader” for cancer screening.
However, those trials rest on a crucial assumption: that people stay firmly in the loop. The intern’s discovery, however small it may seem, is a reminder that algorithms are still learning. They are not perfect diagnosticians but powerful pattern-recognition systems. Even the most sophisticated models sometimes yield confident answers built on shaky foundations.
Today, as one passes the radiology department, the machines are still quietly processing images, identifying suspicious shapes, and directing medical professionals toward potential diagnoses. It’s amazing technology. Astonishing at times.
However, it’s difficult not to picture the intern staring at those odd reports late at night, spotting something that others had missed. Medical breakthroughs often begin exactly that way.