Pillar
AI and mental health.
Testing AI mental-health tools against the actual research.
“Fluency is not truth. Detection is not care.”
AI mental-health tools have moved from the fringe to weekly use in about 30% of clinical practices, and the marketing is far ahead of the evidence. Episodes in this pillar test specific claims about therapy replacement, crisis response, empathy, alliance, and outcomes against what the research actually shows. The voice here is built-and-tested, not anti-technology: I work in psychology and I also build software in this space, a rare combination for evaluating these tools honestly.
How to read claims about AI therapy effectiveness — a viewer's checklist
About one in three clinicians now uses AI at least monthly. The marketing for AI mental-health tools runs several generations ahead of the evidence, and most viewers cannot tell where the evidence ends and the marketing begins.
The checklist this pillar applies to every AI mental-health claim is short. First: what is being claimed? Symptom reduction, behaviour change, treatment equivalence, prevention, or scaffolding? These are not the same kind of claim, and they require different kinds of evidence. Symptom reduction in a self-selected user base is the easiest to demonstrate and the least clinically meaningful. Treatment equivalence is the hardest to demonstrate and the most often implied without being said.
Second: what is the comparator? An AI tool that beats no intervention is useful in the way a workbook is useful. An AI tool that claims effectiveness comparable to therapy without a head-to-head trial against active treatment is making a claim it has not earned.
Third: what does the failure mode look like? Stanford HAI work in 2026 showed AI chatbots responding to crisis-adjacent language with fluent, compliant-sounding replies while showing inadequate safety behaviour in the same exchange. A Frontiers analysis of the MIT-OpenAI RCT raised the same concern from a different angle. Fluency of refusal is not the same as quality of response.
Fourth: who was the tool tested on, and who is actually using it? Most published AI mental-health work is conducted on mild-to-moderate symptom samples under research conditions. The actual user is often more distressed, less monitored, and more reliant on the tool than the trial participants were. That gap is the difference between research-valid and clinically useful.
I work in psychology. I also build software in this space. The combination is rare and it is the only reason this pillar exists. The position is not anti-technology. The position is: the marketing is far ahead of the evidence, and the people using these tools deserve someone who can read both.
What you’ll hear
- AI confidence is not AI competence.
- Crisis chatbots are not crisis lines.
- AI cannot rupture and repair.
What this is not
- Anti-technology sentiment
- Endorsement of specific products
- Recommendations for or against any particular AI tool
Episodes in this pillar are in the queue. Subscribe to the newsletter to hear when they ship.