The AI Safety Expert Who Leaked OpenAI’s Darkest Internal Report
In the AI sector, a pattern is emerging that is difficult to ignore. Researchers and safety experts who chose this line of work specifically because they believed it mattered to do it responsibly are departing. They are leaving Anthropic and OpenAI, sometimes quietly and sometimes loudly, sometimes with cryptic public letters and other times with nothing at all. And when they leave, they frequently say something that sounds more like a warning than a resignation.
Perhaps the most extreme example of this is the story of Suchir Balaji. Balaji, a 26-year-old OpenAI researcher who had worked on ChatGPT, became a whistleblower in 2024 after publicly arguing that the company’s use of copyrighted data to train its models violated the law. Shortly after, he was found dead in his San Francisco apartment. Those who knew him or followed his case found the circumstances difficult to understand. He was young, technically gifted, and had only recently voiced his concerns in public. Although his death was ruled a suicide, questions raised by those close to him and by online observers remain unanswered. It is the kind of story that clashes with everything else the industry was doing at the time: the product launches, the billion-dollar funding rounds, the promises of revolutionary good.
| Category | Details |
|---|---|
| Subject | AI safety whistleblowers and internal dissent at OpenAI and Anthropic |
| Key Figure — OpenAI | Suchir Balaji, 26-year-old OpenAI researcher-turned-whistleblower; found dead in San Francisco apartment, November 2024 |
| Key Figure — Anthropic | Mrinank Sharma, led Anthropic’s Safeguards Research Team; resigned February 2026 with warning “world is in peril” |
| Key Figure — OpenAI Researcher | Zoe Hitzig, resigned February 2026 citing psychosocial risks of AI advertising in ChatGPT |
| Key Figure — Economics | Tom Cunningham, former OpenAI economics researcher; accused company of turning his team into a “propaganda arm” |
| OpenAI Firing Incident | Two researchers fired in April 2024 for allegedly leaking information (reported by The Information) |
| OpenAI Security Breach | Hacker accessed internal AI discussions in 2023 breach (Fox Business, July 2024) |
| Anthropic Legal Issue | Agreed to pay $1.5 billion to settle class action by authors claiming their work was used without consent |
| Broader Context | OpenAI sued over allegations its technology contributed to psychosis, self-harm, and suicide |
| Reference Website | Futurism — Anthropic Researcher Quits in Cryptic Public Letter |
Then there is Mrinank Sharma, who oversaw Anthropic’s Safeguards Research Team from its founding in early 2025 until his resignation in February 2026. During his time at Anthropic, Sharma conducted research on why AI systems sycophantically tell users what they want to hear, developed defenses against AI-assisted bioterrorism, and wrote what he called “one of the first AI safety cases.” These are tasks that most people in the AI industry consider essential and unglamorous. He posted a resignation letter on X that was vague and full of prophecies. He wrote, “The world is in peril,” and that the threat was “not just from AI, or bioweapons, but from a whole series of interconnected crises unfolding in this very moment.” He stated that he intended to return to the UK, pursue a degree in poetry, and “become invisible for a period of time.”
It’s an odd experience to read that letter. It might be sincere distress from someone who has witnessed terrifying things. As some observers pointed out, it might be a well-planned exit that makes headlines without putting the writer in danger. It’s most likely both, and the ambiguity is most likely intentional. The particular statement he made about his time at Anthropic—”I’ve repeatedly seen how hard it is to truly let our values govern our actions”—is what stands out, not the poetry or the ambiguity. He was describing a discrepancy between the company’s actions and its statements. That’s not a theoretical issue. Someone who worked on AI safeguards is quietly stating that the safeguards are more difficult to uphold under duress than the press releases imply.
In the same week, Zoe Hitzig departed OpenAI for reasons that were similar to Sharma’s, if not exactly the same. She expressed concern about the company’s decision to add advertising to ChatGPT, a product that hundreds of millions of people use daily and that she thought was already having unsettling psychological effects on some users. In an interview with the BBC, she discussed early warning signs that reliance on AI tools could exacerbate some types of delusional thinking and harm mental health in ways the industry has not begun to adequately investigate. “We saw what happened with social media,” she said. The analogy was intentional. For years, social media companies maintained that their products were safe despite mounting evidence to the contrary. Hitzig appeared to be implying that AI was heading the same way, and she was unwilling to remain while it did.
To put it mildly, the institutional response to all of this dissent has been inconsistent. According to The Information, OpenAI fired two researchers in April 2024 for allegedly leaking information, an action that sent a clear message about how the company responds to internal criticism. Tom Cunningham, a former OpenAI economics researcher, left more quietly: before departing, he sent an internal message accusing the company of discouraging the publication of research that reflected poorly on AI’s social effects and of turning his research team into what he called a propaganda arm. He sent no dramatic public letter. He simply walked away.
It is worth noticing what all of these departures have in common. The people leaving are not skeptics who never trusted the technology. They were, or are, sincere believers. They joined these companies because they genuinely believed that AI could do enormous good if it was developed carefully and responsibly. For most of them, that faith in the technology does not seem to have changed. What has changed is their faith that the companies building it take the “carefully and responsibly” part seriously when it clashes with revenue, product deadlines, or competitive pressure.
From the outside, something seems to have shifted inside these companies over the past year or two. OpenAI’s Superalignment team, created specifically to work on long-term AI safety, was effectively dissolved after its founders departed. OpenAI is currently facing lawsuits alleging that its technology contributed to cases of psychosis, self-harm, and suicide; the company denies these claims and points to its stated goal of helping people. Anthropic agreed to pay $1.5 billion to settle a class action brought by authors who said their work was used to train AI models without their consent. The companies that presented themselves as the industry’s responsible players are now subject to the same scrutiny they once applied to others.
The overall impact of all these warnings and resignations is still unknown. Individual departures rarely slow the AI industry’s rapid growth. But these are not random departures. The people leaving are the ones whose job was to think carefully about what might go wrong, and they are the ones saying that something already is. Even if no one is quite sure what to do about it yet, that seems worth taking seriously.