AI Privacy Report

The Privacy Risks of AI Chat Assistants: Retention, Review, Training

Consumer AI assistants increasingly default to using your conversations for training, human review, and multi-year retention. The privacy and legal analysis behind the 2025 policy shifts.

By AI Privacy Report Editorial · 8 min read

The privacy posture of consumer AI assistants shifted materially in 2025, and most users never noticed. Across the major providers, the default for consumer chat moved toward using your conversations to train models, retaining them for years, and routing some of them to human reviewers. None of this is inherently unlawful — but it changes the privacy calculus of typing into a chatbot, and it raises real questions under GDPR and US privacy law that the convenience of these tools tends to obscure.

This is an analysis of what changed, why it matters legally, and where the genuine risks sit. It describes publicly reported policy positions as of late 2025. Specific terms vary by provider and tier and change frequently, so treat the named examples as illustrative of a trend rather than a current spec sheet.

What Changed in 2025

Through 2025, the three most prominent assistant providers each adjusted consumer terms in the same direction, making conversations available for training and human review by default, with opt-out rather than opt-in:

  • Anthropic moved to using consumer Claude conversations for model training by default unless users opted out, and extended data retention for non-opted-out consumer data from 30 days to as long as five years. (Enterprise and certain business tiers were treated differently.)
  • OpenAI uses Free and Plus users’ chats to train its models unless the user opts out via Data Controls, and has described monitoring systems that can escalate concerning content to human reviewers.
  • Google similarly moved toward using consumer Gemini chats and uploaded data for model improvement unless disabled, with human reviewers able to see conversations routed for review.

The common pattern: the privacy-protective choice exists, but it is a setting you must find and change, and the default favors data use.

The Three Distinct Risks

It helps to separate three things that get blurred together, because they carry different consequences.

Risk 1: Training on your conversations

When chats become training data, the content you typed can influence a model that millions of people use. The acute concern is memorization and regurgitation — the possibility, demonstrated in research, that models can reproduce fragments of training data. Anything sensitive you type (a client’s name, a medical detail, proprietary code) is, in the default configuration, potentially incorporated into a system you no longer control.

Risk 2: Human review

Conversations escalated for safety or quality review can be read by people. This is a different exposure than automated training: a specific human, under the provider’s access controls, may read what you wrote. Content flagged by automated monitoring is the typical trigger, and reviewed content can be retained longer.

Risk 3: Extended retention

Longer retention windows (years rather than days) widen the breach and legal-process surface. Data that no longer exists cannot be exfiltrated in a breach or produced under subpoena. Extending retention from 30 days to five years makes that window roughly sixty times longer, and your conversations stay exposed to both breach and legal process for the whole of it.
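
The scale of that change is easier to see as arithmetic. The sketch below uses only the 30-day and five-year figures from the text; the 365-day year and the function names are illustrative assumptions, not any provider's definition of its retention window.

```python
# Illustrative arithmetic only. The 30-day and five-year figures come from the
# article; the 365-day year is a simplifying assumption (leap days ignored).
DAYS_PER_YEAR = 365


def retention_days(years: int = 0, days: int = 0) -> int:
    """Convert a retention window expressed in years and days to whole days."""
    return years * DAYS_PER_YEAR + days


def expansion_factor(old_days: int, new_days: int) -> float:
    """Return how many times longer the new window is than the old one."""
    if old_days <= 0:
        raise ValueError("old_days must be positive")
    return new_days / old_days


old_window = retention_days(days=30)
new_window = retention_days(years=5)
factor = expansion_factor(old_window, new_window)

print(f"{old_window} days -> {new_window} days: about {factor:.0f}x longer")
```

Under these assumptions a conversation that would have aged out after a month instead remains available to a breach or a subpoena for about sixty times as long.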

Why It Matters Legally

The shift to opt-out defaults is where the legal questions concentrate.

Under GDPR, processing personal data needs a lawful basis (Article 6). Using conversations to train a model is a distinct processing purpose from delivering the assistant’s answer. An opt-out default sits uneasily with consent, which must be freely given, specific, informed, and unambiguous (in practice, opt-in). That pushes providers toward legitimate interests as the basis, which in turn requires a balancing test against users’ reasonable expectations. A reasonable user’s expectation that a private chat would not silently become training data is exactly the kind of factor that test weighs. The EDPB’s AI-models opinion and the CNIL’s training-data guidance (covered in the companion piece on training-data rights) frame how regulators will scrutinize this for EU users.

Under US state law, the relevant hooks are different but real. Where an assistant’s output drives a “significant decision,” California’s automated decisionmaking technology (ADMT) rules attach, and the conversation content itself is personal information subject to access and deletion rights under the CCPA and similar state statutes. The retention extension expands the volume of personal information a provider must be able to locate, disclose, and delete on request.

For organizational users, the sharpest issue is confidentiality: employees pasting client data, source code, or regulated information into a consumer-tier assistant may be exporting it into a training and human-review pipeline, with downstream contractual and regulatory consequences. Most providers’ enterprise tiers contractually exclude training on customer data — which is precisely why the consumer/enterprise distinction is a governance control, not a billing detail.

What Reduces the Risk

For individuals:

  • Change the default. The training opt-out and data-control settings exist; the protective posture is one settings page away. Find it and set it.
  • Treat consumer assistants as non-confidential. The Stanford HAI guidance reduces to a simple heuristic: be careful what you tell your AI chatbot. Don’t type anything into a consumer tier you’d be unwilling to see retained, reviewed, or learned from.
  • Use temporary/incognito modes where offered for sensitive sessions, understanding their limits.

For organizations:

  • Mandate enterprise tiers with contractual no-training terms for any work involving client, regulated, or proprietary data — and back it with policy.
  • Treat consumer-assistant use as a data-export event in your data-governance program, with the same scrutiny as any third-party data sharing.
  • Run a data protection impact assessment (DPIA) where assistant use processes personal data at scale or feeds significant decisions.
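
The first organizational rule above can be expressed as a simple policy check. This is a minimal sketch under stated assumptions: the data classes, tier names, and the `is_permitted` function are hypothetical illustrations of the rule, not any provider's API or a complete governance control.

```python
from dataclasses import dataclass
from enum import Enum


class DataClass(Enum):
    """Hypothetical data classification labels for illustration."""
    PUBLIC = "public"
    CLIENT = "client"
    REGULATED = "regulated"
    PROPRIETARY = "proprietary"


class Tier(Enum):
    """Hypothetical assistant tiers for illustration."""
    CONSUMER = "consumer"
    ENTERPRISE = "enterprise"


# Data classes the article says should never go to a consumer tier.
SENSITIVE = {DataClass.CLIENT, DataClass.REGULATED, DataClass.PROPRIETARY}


@dataclass(frozen=True)
class AssistantUse:
    tier: Tier
    data_class: DataClass
    no_training_contract: bool = False  # assumed flag: contract excludes training


def is_permitted(use: AssistantUse) -> bool:
    """Allow sensitive data only on an enterprise tier with no-training terms."""
    if use.data_class not in SENSITIVE:
        return True
    return use.tier is Tier.ENTERPRISE and use.no_training_contract


# Example: client data on a consumer tier is rejected by this rule.
request = AssistantUse(tier=Tier.CONSUMER, data_class=DataClass.CLIENT)
print("permitted" if is_permitted(request) else "blocked")
```

A check like this is only as good as the classification feeding it, which is why the text pairs the technical control with written policy.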

The Underlying Tension

These products improve by learning from use, and the providers’ shift to data-hungry defaults is the commercial expression of that. The privacy cost is borne by users who never see the toggle. The legal systems are catching up — GDPR’s lawful-basis discipline and the US states’ access/deletion and ADMT rules both bite on exactly these practices — but enforcement lags the product changes. Until it catches up, the protective move is individual: assume the default is data-maximizing, change the setting, and never confide in a consumer assistant anything you couldn’t tolerate being trained on, reviewed, and retained for years.

Cross-references

For the GDPR analysis of training data and erasure rights this depends on, see training-data privacy and data-subject rights. For when an assistant’s output triggers automated-decision rules, see GDPR Article 22 and LLM automated decision-making and the companion piece on CCPA ADMT obligations. For the assessment that organizational use should trigger, see the DPIA template for LLM deployment.

For ongoing coverage of provider policy changes and regulator responses, AI policy watch follows the space.

Sources

  1. Anthropic — consumer data and training (TechCrunch reporting)
  2. Stanford HAI — Be Careful What You Tell Your AI Chatbot
  3. GDPR Article 6 — Lawfulness of processing