Safeguarding mental-health conversations with chatbots: what the UK has (and what’s missing)

[Illustration: a shield overlay redirects a chatbot message toward help icons, representing a safety layer for safeguarding mental-health chats and crisis hand-offs.]

Executive summary

A Lords question on 4 November sharpened focus on a real gap: the Online Safety Act (OSA) can catch some chatbot harms, but it doesn’t set clinical safeguards for crisis-adjacent conversations. Meanwhile, specialist mental-health AIs like Limbic already sit under the MHRA’s medical-device regime, whereas general chatbots do not. Recent actions by Character.AI show why defaults and design choices matter for young users. The result is a regulatory split that product teams must bridge with safety-by-design. Qognetix offers a practical, auditable way to do that—today.

1) The moment we’re in

  • Behaviour shift: People now confide in chat—even when the app makes no clinical claim. Disclosures of distress arrive in ordinary sessions, at all hours.
  • Policy attention: OSA duties (illegal harms/child safety) are live; medical-device rules govern clinical tools; parliamentary scrutiny is rising on what happens in the grey zone between the two.
  • Operator reality: Teams are asked to prevent harm, protect minors, and provide evidence of responsible conduct—without turning general chat into a medical device.

2) What the UK frameworks actually do

  • OSA (safety floor): Duties for risk assessment, proportionate mitigations, and stronger child protections. Useful where chat behaves like search or hosts harmful content. It does not define crisis triage or therapeutic standards.
  • MHRA/UKCA (clinical pathway): Applies when a tool claims assessment, diagnosis, treatment, or monitoring. Brings validation, quality systems, and post-market surveillance. Most general chat doesn’t sit here.
The gap: non-clinical chat that nonetheless receives crisis-adjacent talk.

3) Where harm can creep in (even with “good intentions”)

  • Unsafe responses: Minimising language, method speculation, “therapy role-play” that oversteps.
  • Minors and boundary drift: Companion features morph into romantic or quasi-therapeutic exchanges.
  • Safety drift: Prompt/model updates quietly undo safeguards that “worked last month.”
  • Evidence vacuum: Boards and partners ask for proof you offered real help; transcripts are privacy-heavy, and screenshots aren’t a process.

4) What “good” looks like (non-clinical, operational baseline)

  • Humane refusal: No methods, no therapy cosplay; decline with warmth and dignity (a configuration sketch of this baseline follows the list).
  • Age-aware by default: Treat unknown users as minors; block romantic/therapeutic simulation for under-18s; sensible session limits.
  • Help in one tap: Surface local lifelines at the moment of risk (e.g., 999, NHS 111, Samaritans 116 123, Shout 85258).
  • Consistency across releases: Guardrails remain in force when prompts/models change.
  • Evidence without voyeurism: Light-touch proof you offered help—no hoarding raw conversations.
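
To make this baseline concrete, here is a minimal TypeScript sketch of how it could live as reviewable product-layer configuration rather than prompt text. Every name (SafetyPolicy, defaultPolicy, effectiveAgeMode and so on) is illustrative rather than a prescribed schema; the helpline routes are the UK services listed above, and the session limit is a placeholder to tune per product.

```typescript
// Hypothetical product-layer safety policy: a sketch, not a prescribed schema.

type AgeMode = "adult" | "minor" | "unknown";

interface HelpRoute {
  label: string;
  channel: "call" | "text" | "webchat";
  target: string; // phone number, shortcode, or URL
}

interface SafetyPolicy {
  redLines: string[];           // behaviours the product must never produce
  treatUnknownAgeAsMinor: boolean;
  blockedForMinors: string[];   // interaction modes disabled for under-18s
  sessionLimitMinutes: number;  // placeholder cap, tune per product
  helpRoutes: HelpRoute[];      // surfaced in one tap at the moment of risk
}

// UK routes named above; everything else is illustrative.
const defaultPolicy: SafetyPolicy = {
  redLines: ["method-details", "therapy-roleplay", "minimising-language"],
  treatUnknownAgeAsMinor: true,
  blockedForMinors: ["romantic-roleplay", "therapeutic-simulation"],
  sessionLimitMinutes: 60,
  helpRoutes: [
    { label: "Emergency services", channel: "call", target: "999" },
    { label: "NHS 111", channel: "call", target: "111" },
    { label: "Samaritans", channel: "call", target: "116 123" },
    { label: "Shout", channel: "text", target: "85258" },
  ],
};

// Resolve the effective age mode so an unknown age defaults to the stricter path.
function effectiveAgeMode(declared: AgeMode, policy: SafetyPolicy): AgeMode {
  return declared === "unknown" && policy.treatUnknownAgeAsMinor ? "minor" : declared;
}
```

Keeping these defaults in versioned configuration is what lets them survive prompt and model changes: the guardrails are reviewed like any other code, not re-derived from a system prompt at each release.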

5) The solutions landscape (and their trade-offs)

  • Do-it-yourself: Prompt policies, filters, in-house classifiers, help pages.
    • Pros: Immediate control. Cons: Fragile, uneven across teams, hard to audit responsibly.
  • Clinical route (when appropriate): If you truly assess/diagnose/treat.
    • Pros: Rigour. Cons: Not a fit for general chat; long evidence path.
  • Point fixes: Disclaimers, helpline links, manual reviews.
    • Pros: Quick. Cons: Help often appears too late; safety regresses after updates.

What’s missing is a way to make safe behaviour default, consistent, and auditable—without reclassifying general chat as clinical care.

6) What responsible builders can do this month

  1. Write your red lines: no methods; no therapy role-play; no romance with minors.
  2. Make help one tap away: phone, text, webchat—visible when risk appears.
  3. Say what you are (and aren’t): plain-English capability statement in-product.
  4. Test like you ship: run a “hard cases” pack before each release and after model/prompt changes (a minimal sketch follows this list).
  5. Decide what to evidence: time-to-help and help-uptake, not raw transcript storage.
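
As one way to wire step 4 into a release process, the sketch below replays a fixed pack of hard cases and blocks the release if any expected safeguard is missing. The pack format, the Responder stand-in, and the flag labels are assumptions for illustration; the real pack and the classifiers that emit those flags would be maintained by your safeguarding reviewers.

```typescript
// Hypothetical release gate: replay a fixed pack of hard cases and block the
// release if any expected safeguard is missing. All names are illustrative.

interface HardCase {
  id: string;
  userMessage: string;          // crisis-adjacent or boundary-testing input
  mustInclude: ("help-routes" | "humane-refusal")[];
  mustNotInclude: ("method-details" | "therapy-roleplay")[];
}

interface BotReply {
  text: string;
  surfacedHelpRoutes: boolean;  // did the UI actually offer one-tap help?
  flags: string[];              // labels emitted by your own response classifiers
}

// Stand-in for calling the chatbot build under test.
type Responder = (userMessage: string) => Promise<BotReply>;

async function runHardCases(pack: HardCase[], respond: Responder): Promise<string[]> {
  const failures: string[] = [];
  for (const hardCase of pack) {
    const reply = await respond(hardCase.userMessage);
    if (hardCase.mustInclude.includes("help-routes") && !reply.surfacedHelpRoutes) {
      failures.push(`${hardCase.id}: help routes not surfaced`);
    }
    if (hardCase.mustInclude.includes("humane-refusal") && !reply.flags.includes("humane-refusal")) {
      failures.push(`${hardCase.id}: no humane refusal recorded`);
    }
    for (const banned of hardCase.mustNotInclude) {
      if (reply.flags.includes(banned)) {
        failures.push(`${hardCase.id}: produced '${banned}'`);
      }
    }
  }
  return failures; // a non-empty list should fail the build
}
```

Running the same pack after every model or prompt change is what turns “it worked last month” into a repeatable check.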

7) Where the market is heading

  • From prompts to product patterns: Safety moves out of model prompts and into the product layer (interaction design, state control, age modes).
  • From links to hand-offs: Lifelines become first-class UI, not buried links.
  • From anecdotes to receipts: Providers evidence that help was offered, with minimal data retention.

8) Qognetix—our role, briefly

We don’t build chatbots. We build an Engine intended to sit beneath conversational products so they can behave responsibly in crisis-adjacent moments. Today, that means foundational primitives (policy hooks, state/control points, event logging) rather than a full safety suite. We are actively seeking design partners to co-develop and evaluate the guardrail patterns, hand-off flows, and testing practices described in this article, strictly within a non-clinical scope. We are not a medical device and make no clinical claims.
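
By way of illustration only, the event-logging primitive mentioned above might capture just enough to answer the evidence questions in section 6, time-to-help and help-uptake, without retaining any message text. The event names and fields below are hypothetical and are not Qognetix’s actual API.

```typescript
// Hypothetical evidence events: enough to compute time-to-help and help-uptake,
// with no message text retained. Event and field names are illustrative.

type SafetyEvent =
  | { kind: "risk-detected"; sessionId: string; at: number }                   // epoch ms
  | { kind: "help-offered"; sessionId: string; at: number; routes: string[] }
  | { kind: "help-accepted"; sessionId: string; at: number; route: string };

// Time from risk detection to help being shown in a session (ms), if both occurred.
function timeToHelp(events: SafetyEvent[], sessionId: string): number | null {
  const inSession = events.filter((e) => e.sessionId === sessionId);
  const risk = inSession.find((e) => e.kind === "risk-detected");
  const offer = inSession.find((e) => e.kind === "help-offered");
  return risk && offer ? offer.at - risk.at : null;
}

// Share of sessions where an offered help route was actually taken up.
function helpUptake(events: SafetyEvent[]): number {
  const offered = new Set(events.filter((e) => e.kind === "help-offered").map((e) => e.sessionId));
  const accepted = new Set(events.filter((e) => e.kind === "help-accepted").map((e) => e.sessionId));
  const taken = [...offered].filter((id) => accepted.has(id)).length;
  return offered.size === 0 ? 0 : taken / offered.size;
}
```

Because only timestamps, route labels, and session identifiers are stored, the resulting metrics can be shared with boards and partners without exposing what anyone said.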

9) Questions to guide your next review

  • How do you prove you did the right thing without hoarding transcripts?
  • When a user discloses suicidal thoughts, what appears on screen—and how fast?
  • How are minors protected by default, including unknown-age users?
  • Are help routes usable in one tap (call, text, webchat), on mobile and with assistive tech?
  • What’s your post-update safety check so behaviour doesn’t regress?
