What's Inside This Story
- Introduction
- My journey: from a ZX80 to ChatGPT
- The promise—and the walls we hit
- When your AI guide wants to be your therapist
- Passing the Turing Test isn't enough for medicine
- The internet is messy—and AI learns from it
- When AI gets cheaper, stakes get higher
- Social media is fading; AI companions are rising
- Do these models actually understand vitiligo?
- AI as co-pilot, not autopilot
When we launched vitiligo.ai in September 2023, the idea was simple: help people get reliable vitiligo information—wherever they live, in 50+ languages, with or without access to a specialist.
Two years later, we've been right about some things and wrong about others. A recent study has validated some of our early instincts while exposing critical vulnerabilities we’ve witnessed firsthand. The research evaluated three leading AI models — OpenAI o1, DeepSeek-R1, and Grok 3 — on their ability to answer vitiligo patient questions, and the findings are both promising and sobering.
All three models achieved 100% accuracy on basic descriptive questions about vitiligo. But when asked to provide treatment recommendations for specific populations (pregnant women, children, or patients with darker skin tones), accuracy dropped to 75–87.5%. One model incorrectly recommended mid-potency topical corticosteroids when high-potency formulations are the standard of care. Two others deemed narrowband UVB phototherapy unsafe during pregnancy, despite established safety evidence.
These aren’t just academic errors. They’re the same misconceptions we see among dermatologists who don’t specialize in vitiligo — now encoded at scale, delivered with algorithmic confidence, and accessible to millions.
My journey: from a ZX80 to ChatGPT
I'm a computer engineer by training. My first machine was a Sinclair ZX80, a birthday gift in the early '80s. As a teen, I was completely mesmerized. That curiosity stuck with me.
So when ChatGPT launched, I was there the first week. Playing around, testing limits, having fun. But honestly? It wasn’t anywhere near ready for serious medical work. That’s why we built vitiligo.ai as a curated, carefully controlled resource long before ChatGPT could handle anything clinical.

Today, mainstream models have outgrown our first “AI professor.” Which is actually great news — it forces us to rethink what vitiligo.ai should be. Less know-it-all lecturer, more thoughtful guide. Clear answers in your language, smart referrals, safety built in, and a real human standing by when things get complicated.
Here’s where I land: I’m optimistic about large language models for patient education. They’re already pretty good teachers. But AI agents — systems that make decisions on their own? That makes me nervous. Autonomy without tight oversight is a terrible fit for healthcare. Give me assistive, supervised, transparent AI any day. Fully autonomous decision-makers in medicine? Hard pass.
The promise — and the walls we hit
In 2023, vitiligo.ai felt like a breakthrough. We grounded it in evidence — my book, years of VRF research, and global expert input. It wasn't meant to replace dermatologists. It filled the gaps: the 2 a.m. search, the six-month wait, the question you didn't get to ask during a rushed appointment.
The metrics backed that up: 67% engagement, 97% (!) return rate, about 7-minute average sessions. In Iraq, sessions averaged 27 (!) minutes. People in Nepal, Algeria, Indonesia, and Chile — where specialists are scarce — used it as a lifeline.
Then we ran into a hard edge.
When your AI guide wants to be your therapist
People began asking for emotional support. The need was real, and the technology looked ready. But when we tested it, we saw too many odd replies, such as unhelpful reassurance and subtle validation of wrong ideas, to feel comfortable moving forward. We paused.
Turns out we weren’t being paranoid. A Stanford–Carnegie Mellon study found that even the newest language models fumble mental health scenarios. They can accidentally validate delusions, show bias, or give dangerous advice when someone hints at self-harm. Human therapists get it right 93% of the time. AI? More like 70–80%, sometimes much worse.
And we’re not the only ones pumping the brakes. There have been cases, actual hospitalizations, of people developing what researchers are calling “AI psychosis”: extended conversations with chatbots that spiral into paranoid delusions. One high-profile case involved a venture capitalist who ended up posting conspiracy theories he’d “workshopped” with ChatGPT. Another involved a man hospitalized after ChatGPT convinced him he could “bend time.”
Empathy without accountability isn’t therapy. It’s just dangerous. So vitiligo.ai stays in its lane: clarity, guidance, connections. The real therapy? That’s still a job for humans.
Passing the Turing Test isn't enough for medicine
In 2024, headlines announced that a large language model had “passed” the Turing Test — meaning it fooled people into thinking they were talking to a human more than half the time. Big moment. Alan Turing predicted it back in 1950. But he never said what we should do once it actually happened.
Here’s the thing: for medicine, passing the Turing Test is just the entry ticket. What we really need is something closer to a Hippocratic Test: Can this AI avoid causing harm? Does it know when it doesn’t know? Will it back off and defer to a human when the stakes get high?
At vitiligo.ai, we’ve built in disclaimers, designed guardrails, made sure the AI knows when to say, “You should really talk to your doctor about this.” But even with all that, we’re constantly watching for edge cases, hallucinations, and the creeping overconfidence that comes so naturally to machines.
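For the curious, here is what one of those guardrails looks like in spirit. This is a minimal sketch, not our production code: the topic list, the wording, and the apply_guardrail function are invented for illustration, and a real system layers trained classifiers and human review on top of anything this simple.

```python
# Minimal sketch of one kind of guardrail: flag high-stakes topics and
# append a "talk to your doctor" deferral. Keywords and wording are
# hypothetical; real systems use trained classifiers, not substring checks.
HIGH_STAKES_TOPICS = (
    "pregnancy", "pregnant", "child", "infant",
    "dosage", "stop my medication", "side effect",
)

REFERRAL_NOTE = (
    "\n\nThis touches on treatment decisions that depend on your personal "
    "situation. You should really talk to your doctor about this."
)

def apply_guardrail(user_question: str, model_answer: str) -> str:
    """Append a deferral note when the question hits a high-stakes topic."""
    text = user_question.lower()
    if any(topic in text for topic in HIGH_STAKES_TOPICS):
        return model_answer + REFERRAL_NOTE
    return model_answer

# Example: a pregnancy question always gets the human-referral nudge.
print(apply_guardrail(
    "Is phototherapy safe while I'm pregnant?",
    "Narrowband UVB is generally considered low-risk.",
))
```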
It’s a bit like raising a brilliant teenager. Tons of potential, zero self-doubt, absolutely needs supervision.
The internet is messy — and AI learns from it
Models learn from the web. The web is noisy. YouTube alone gets 720,000 hours of new content uploaded every day — over 4.3 petabytes of data, which is about 1,000 times what most large language models train on. Video isn’t just bigger, it’s richer. It captures tone, body language, facial expressions, context. It’s the closest thing we have to bottled human experience.
But what happens when the loudest voices on YouTube are influencers peddling “How I Cured My Skin Condition Naturally” nonsense?
TikTok shows the pattern: the most popular vitiligo videos often lack accuracy; clinician content is higher quality but gets less engagement. This is why we're consolidating our video library and building a coherent YouTube presence ahead of World Vitiligo Day 2026 in India, YouTube's largest market. Not just for people — also for the models scraping tomorrow's training data. If we don't feed AI reliable, inclusive, human-centered content, the web will feed it with junk.
When AI gets cheaper, stakes get higher
In 2025, new models cut training costs by an order of magnitude or more. The moat vanished. More players can ship more systems, faster. That doesn't erase the need for domain experts, data governance, and compliance—talent that's rare and expensive. Meanwhile, major public and private investments are pouring into AI for clinical workflows. The pace is up; so are the consequences of getting it wrong.
For groups like VRF, this means we can't just publish content. We have to deliberately train the AI on trustworthy, plain-language material people actually need. How exactly? Not sure yet, but we'll figure it out soon.
Social media is fading; AI companions are rising
Here’s another shift I didn’t see coming: social media as we knew it is fading. Feeds have become passive TV. Mark Zuckerberg basically admitted it: only 17% of Facebook activity and 7% of Instagram activity involves posts from friends anymore. These platforms are now passive entertainment. Netflix, not neighborhood hangouts.
Into that gap steps the chatbot: conversational, available, and increasingly “empathetic.”
Generative AI has evolved from novelty toy to surprisingly capable companion. And get this: in forum settings, it delivered high-quality medical responses 3.6 times more often than physicians (78.5% vs 22.1%), and patients rated its responses as empathetic nearly 10 times more often (45.1% vs 4.6%). That’s a bedside-manner lesson, not a reason to replace clinicians.
Read that again and let it sink in. We’re shifting from search engines and newsfeeds to real-time, personalized AI conversations. Not about what’s trending, but about what matters to you, and not always in your best interest.
Do these models actually understand vitiligo?
This brings up a deeper question: Do these language models truly understand vitiligo, or are they just incredibly sophisticated parrots?
The optimists talk about “world models” — the idea that AI, by processing massive amounts of text, builds an internal representation of how the world works. Not just memorizing facts, but understanding relationships, context, causality. The skeptics, including heavy hitters like Yann LeCun, say that’s wishful thinking. LLMs are doing “approximate retrieval” from enormous datasets — regurgitating patterns without genuine comprehension. They warn against mistaking statistical correlation for real intelligence.
There’s a famous example that illustrates the problem: an AI trained to classify skin lesions as benign or malignant. Sounds impressive, right? Except it learned to associate images with rulers as cancerous—because in the training data, doctors tended to photograph serious lesions with measuring tools next to them. The AI didn’t understand skin cancer. It understood that rulers = bad news. That’s not a glitch. That’s a warning.
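You can reproduce the spirit of that failure in a few lines. The sketch below uses synthetic data, with every feature name and probability invented for illustration, to show how a plain logistic regression latches onto a spurious “ruler” feature once it correlates with the label:

```python
# Toy demonstration of shortcut learning: when a spurious feature
# ("ruler in photo") correlates with the label, the model leans on it.
# All data here is synthetic and invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
malignant = rng.integers(0, 2, n)

# Spurious feature: serious lesions get photographed with a ruler 90%
# of the time, benign ones only 10% of the time.
has_ruler = np.where(malignant == 1,
                     rng.random(n) < 0.9,
                     rng.random(n) < 0.1).astype(float)

# Weak, noisy "real" signal about the lesion itself.
severity = malignant + rng.normal(0, 2.0, n)

X = np.column_stack([severity, has_ruler])
model = LogisticRegression().fit(X, malignant)

print("weight on lesion severity:", round(model.coef_[0][0], 2))
print("weight on ruler presence: ", round(model.coef_[0][1], 2))
# The ruler weight dominates: the model "understands" rulers, not cancer.
```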
For vitiligo, this matters enormously. If AI models train on datasets that underrepresent patients with darker skin, they’ll perpetuate healthcare inequalities. The Fitzpatrick scale we use for skin classification doesn’t capture the full spectrum of human skin tones, leaving AI unprepared for real-world diversity.
AI as co-pilot, not autopilot
After two years in the AI trenches, here's the bottom line: AI can democratize access to medical knowledge. It can show up at 2 a.m., in your language, and help you ask better questions. But it is strong on basics and shaky on nuance; on complex, population-specific questions, it still makes confident mistakes.
Readability matters as much as accuracy. Context prompts help: asking for an explanation "for a 10-year-old" can pull answers down to an 8th-grade reading level. Multimodal and real-time features are useful too, yet none of it replaces clinical judgment. Human oversight remains non-negotiable.
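For readers who want to see the mechanics, here is a minimal sketch of how such a context prompt can be wired into an API call. It assumes the OpenAI Python SDK; the model name, system wording, and sample question are illustrative, not our production setup.

```python
# Minimal sketch: steering an LLM's reading level with a context prompt.
# Assumes the OpenAI Python SDK (pip install openai) and an API key in
# the OPENAI_API_KEY environment variable; model name is illustrative.
from openai import OpenAI

client = OpenAI()

question = "Is narrowband UVB phototherapy safe during pregnancy?"

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat-capable model works here
    messages=[
        {
            "role": "system",
            "content": (
                "You are a patient-education assistant for vitiligo. "
                "Explain for a 10-year-old: short sentences, plain words, "
                "no jargon. Always remind the reader to confirm treatment "
                "decisions with their own doctor."
            ),
        },
        {"role": "user", "content": question},
    ],
)

print(response.choices[0].message.content)
```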
We've accepted that. Vitiligo.ai is a guide, not a guru. A co-pilot, not an autopilot. A bridge between people and care — built with technology, grounded in human judgment.
What's next? We'll keep building the guide. We'll keep training the models with better content. And we'll keep a human in the loop.

— Prof. Yan Valle, CEO
Vitiligo Research Foundation
Further Reading:
- We Hit Pause on Vitiligo.ai as a Self‑Help Therapist — Here’s Why
- The Chatbot Will See You Now? Why Passing the Turing Test Isn’t Enough in Medicine
- How AI Is Replacing Social Media — and What It Means for Healthcare Communications
- Do AI Models Really Understand the World of Vitiligo?
- We’re Taking Vitiligo to YouTube — Before AI Chatbots Get It Wrong, Forever
Listen to Deep Dive in Vitiligo:
- Episode 26: “The Great AI Debate: Do AI Models Truly ‘Get’ It?”
- Episode 21: “ChatGPT vs. Doctors: The Promise and Pitfalls of AI in Medical Diagnosis”
- Episode 17: “Pharma, Startups, and AI: From 2024 to 2025”
