Privacy & data training — does it train on what you type?

Status: 🟩 COMPLETE 🟦 LIVING
Last updated: 2026-06-25
Plain-English tagline: Every major AI tool has a different default about whether it uses your conversations to train future models. Here’s the actual answer for each one, and how to opt out.

The short answer

Tool	Free / consumer tier	Paid consumer tier	Business / Team / Enterprise	API
Anthropic Claude (claude.ai)	⚠️ Trains by default (since Sep 2025); opt-out available	⚠️ Same — opt-out available	✅ Does not train (Team/Enterprise)	✅ Does not train by default
OpenAI ChatGPT	⚠️ Trains by default; opt-out available	⚠️ Same — opt-out available	✅ Does not train (Team/Enterprise/Edu)	✅ Does not train by default (since March 2023)
Google Gemini	⚠️ Trains by default; opt-out available (turn off “Gemini Apps Activity”)	⚠️ Same — opt-out available (Google AI Pro/Ultra)	✅ Does not train (Workspace)	✅ Paid tier does not train; free AI Studio does
Microsoft Copilot	⚠️ Trains by default for Copilot consumer; opt-out via Settings	✅ Copilot Pro can opt out	✅ M365 Copilot does not train; data stays in tenant	✅ Azure OpenAI Service does not train by default
xAI Grok	⚠️ Trains on your posts and chats by default	⚠️ Same — opt-out via Settings	(no separate tier)	⚠️ API trains by default — check current policy
Meta AI	⚠️ Trains on your interactions (in regions where allowed)	(no paid tier)	(no business tier)	(Llama is open-weight; you host)
Perplexity	⚠️ Trains by default; opt-out available	✅ Pro can opt out	✅ Enterprise does not train	✅ Enterprise API does not train
GitHub Copilot	⚠️ Free trains by default	✅ Pro: can opt out of code suggestions training	✅ Business + Enterprise: never trains	(no separate API)
Cursor	⚠️ Trains by default in “Privacy Mode OFF”	✅ Pro: enable Privacy Mode to opt out	✅ Business: Privacy Mode default ON	(uses third-party model APIs)

⚠️ = trains by default · ✅ = does not train by default, or the tier gives you a clear opt-out (see the provider notes below)

The two-line rule of thumb:

Free consumer apps train on your inputs by default. They have an opt-out switch.
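API and Business/Enterprise tiers generally do not train by default at the major Western providers. The exceptions in the table are Google’s free AI Studio tier and xAI’s API, whose current terms you should check. For everyone else, commercial reputation depends on this, so the no-training commitment is worth trusting.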

What “training on your input” actually means

When you type into an AI chat, three different things happen to your text. People confuse them.

1. The model reads it to answer you (always happens)

Your prompt is sent to the model, the model produces a response, the response comes back to you. This has to happen — it’s how the AI works. There’s no way to “use AI without your prompt being processed by the AI.”

2. Your prompt and response are stored as chat history (usually happens, often controllable)

So you can come back tomorrow and see what you talked about. This is convenience, not training. The data sits on the provider’s servers, encrypted, accessible only to your account (and provider staff under controlled circumstances — see the “who at the provider can see your chats” section).

3. Your prompt and response are sampled to train the next model (the controversial one)

When the provider builds the next version of their model, they have a choice: use only public-internet data, OR use public-internet data PLUS a sample of real user conversations (after filtering for personal information, profanity, etc.). User conversations are gold for training because they reflect real-world usage; public internet data is much more “essay-shaped” and doesn’t teach the model how people actually want help.

Most consumer chat services default to “yes, sample for training” because the upside (better future models) is significant and most users don’t care. Most API and Enterprise services default to “no” because business customers care a lot.

Important: “trains on your input” does not mean your specific words appear verbatim in the next model. It means your conversations are part of the training dataset: sampled, filtered, combined with hundreds of millions of other conversations, and used to fine-tune model behavior at the statistical level. Recovering a specific conversation from a trained model is very difficult in practice, though not strictly impossible: models can memorise text that appears many times in their training data.

Opting out — provider by provider

Anthropic Claude.ai

Recent change you need to know: since late 2025, Anthropic’s consumer terms have allowed Claude.ai conversations (Free, Pro, and Max plans) to be used for training unless you opt out. Previously, the default was “no training.”

To opt out:

Open claude.ai.
Click your profile (bottom-left).
Settings → Privacy → Help improve Claude.
Toggle OFF.

This stops Anthropic using your future conversations for training. It does NOT delete past conversations from the training set (those that were already used).

Anthropic API: does not train on your inputs by default. Confirmed in the Commercial Terms of Service.

Claude Team / Enterprise: does not train on inputs. Data is tenant-isolated.

OpenAI ChatGPT

Free + Plus + Pro: trains by default. To opt out:

Open ChatGPT → click your profile → Settings.
Data Controls → Improve the model for everyone → toggle OFF.

This stops future conversations being used. In earlier versions of the UI, this switch also turned off chat history; the current UI lets you keep history with training off, but check the wording on your screen.

Temporary Chat mode — the speech-bubble icon at the top of a conversation marked “Temporary.” Conversations in this mode are never used for training and disappear after 30 days. Good for sensitive questions.

ChatGPT Team / Enterprise / Edu: does not train on inputs. By policy, contractually committed.

OpenAI API: does not train on inputs since March 2023. Contractually committed. Inputs stored for 30 days for abuse-monitoring, then deleted (unless you’re flagged for abuse review).

Google Gemini

Free + AI Pro + AI Ultra: Gemini Apps Activity is on by default. To opt out:

Go to myactivity.google.com/product/gemini.
Turn Gemini Apps Activity OFF.
This stops Google using your future conversations for training AND for personalising the assistant.

You can also delete past Gemini activity from this page.

Google Workspace (Business / Enterprise): Gemini does not train on your data. Workspace data is tenant-isolated.

Google AI Studio (free tier of the Gemini API): trains on your inputs by default — this is unusual. The free AI Studio is genuinely a “we’re learning what people want to do” testing surface. Don’t paste sensitive data into the free AI Studio playground.

Google AI Studio (paid tier) and Vertex AI: do not train on inputs. Tied to Google Cloud’s enterprise contracts.

Microsoft Copilot

Consumer Copilot (copilot.microsoft.com): uses your conversations to improve the service unless you opt out. Open Settings → Privacy → toggle “Model training on text” OFF.

Microsoft 365 Copilot (the version embedded in Word, Excel, Outlook, Teams for work): does not train on your tenant’s data. Microsoft contractually guarantees this in the M365 Copilot terms.

Azure OpenAI Service: does not train on your inputs. Data you upload for fine-tuning is used only to build your own fine-tuned model, which only your organisation can use.

xAI Grok

Default behaviour (as of 2025–26): Grok trains on your X (Twitter) posts and Grok chat conversations by default. xAI has been more permissive than other providers about training scope.

To opt out of training on X posts: X → Settings → Privacy and Safety → Data Sharing → “Allow your posts as well as your interactions, inputs and results with Grok to be used for training and fine-tuning” → toggle OFF.

Grok API: check current xAI terms; the policy has been less consistent than other providers.

Meta AI

Default behaviour: trains on your interactions with Meta AI inside WhatsApp, Instagram, Messenger, Facebook, and meta.ai. In the EU and UK, Meta’s training on user data is constrained by GDPR. In Australia, Meta’s use of public Facebook/Instagram posts for training has drawn regulatory and parliamentary scrutiny, and Australian users have not been offered the opt-out available in the EU.

Hard truth: opt-out for Meta AI is limited and varies by region. The simplest privacy answer is “don’t use Meta AI for anything sensitive.”

Perplexity

Free: trains by default. Settings → Account → Data Controls → “AI Data Retention” → toggle OFF.

Pro: the same opt-out is available. Enterprise: excluded from training by default, under contract.

GitHub Copilot

Free: trains on your code suggestions by default. Settings → Copilot → “Allow GitHub to use my code snippets from the code editor for product improvements” → toggle OFF.

Pro, Business, Enterprise: never trains on your private code. Public code in public repos is fair game for training (under the standard GitHub TOS).

Cursor

Default: “Privacy Mode” is OFF on Free, optional on Pro (off until you turn it on), and ON by default on Business.

When Privacy Mode is OFF, your code, prompts, and AI responses can be retained by Cursor and may be used to improve the product. When Privacy Mode is ON, your code is not retained beyond the time needed to answer your request.

To enable: Cursor → Settings → General → Privacy Mode → ON.

Data retention vs training — different things

These get confused.

Retention = how long the provider keeps your conversations on their servers (so you can scroll back to them).
Training = whether your conversations are sampled into the next model’s training dataset.

You can have:

Retention ON, Training OFF → conversations stored for your reference, not used for training. (Any consumer tool after you opt out of training; Business/Enterprise tiers by default.)
Retention OFF, Training OFF → conversations not kept in your history and deleted after any short safety-retention window. (ChatGPT Temporary Chat, which keeps copies for up to 30 days; Cursor Privacy Mode.)
Retention ON, Training ON → conversations stored AND used for training. (Free consumer default for most tools.)

For sensitive content, you want both OFF.
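The two switches are independent, so it can help to write the combinations down as a tiny lookup that makes the “both OFF” rule explicit. This is a hypothetical Python sketch: `DataSettings`, `describe` and `fit_for_sensitive` are illustrative names, not any provider’s API, and the retention-OFF/training-ON branch exists only so every combination is covered (it is not one of the setups listed above).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSettings:
    """Hypothetical model of the two independent switches described above."""
    retention: bool  # provider keeps the conversation so you can scroll back to it
    training: bool   # conversation may be sampled into a future training dataset


def describe(s: DataSettings) -> str:
    """Plain-English summary of what happens to a conversation."""
    if s.retention and s.training:
        return "stored and may be used for training"
    if s.retention:
        return "stored for your reference, not used for training"
    if s.training:
        return "not kept in your history, but may be used for training"
    return "not kept in your history and not used for training"


def fit_for_sensitive(s: DataSettings) -> bool:
    # The rule of thumb from this entry: both switches OFF.
    return not s.retention and not s.training


free_default = DataSettings(retention=True, training=True)
print(describe(free_default), "| sensitive OK:", fit_for_sensitive(free_default))
```

A free consumer default corresponds to `DataSettings(retention=True, training=True)`, which `fit_for_sensitive` rejects; only the all-OFF setting passes.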

What about voice conversations?

When you use voice mode (ChatGPT Advanced Voice, Gemini Live, Claude voice when available), your audio is:

Sent to the provider.
Transcribed to text (server-side).
Processed by the model.
Response generated as text, then synthesized to speech.

The transcribed text is treated the same as a typed chat for training purposes, so your training opt-out applies to it. The audio file itself may or may not be stored separately:

OpenAI Advanced Voice: audio recordings may be retained for 30 days for abuse-monitoring; not used for training if you’ve opted out of training.
Google Gemini Live: audio activity is part of Gemini Apps Activity — turning that off covers voice too.
Anthropic Claude voice: new product; check current docs at anthropic.com/legal.

Best practice: don’t say things into voice AI you wouldn’t type.

What about image / file uploads?

When you upload an image, PDF, spreadsheet, or other file to an AI tool, the file is:

Uploaded to the provider’s storage.
Read by the model (sometimes after conversion).
Subject to the same retention + training policies as text.

Exceptions worth knowing:

Files uploaded to ChatGPT Projects / Claude Projects / Notion AI are retained as part of the project workspace until you delete them, regardless of the chat-history setting.
Files uploaded to free AI Studio (Google) are subject to AI Studio’s training policy by default — don’t upload sensitive PDFs.
Voice notes and images sent to Meta AI inside WhatsApp / Messenger / Instagram carry the same caveats as Meta AI generally.

Who at the provider can see your chats?

Even with training fully off, there are circumstances under which provider employees might see your conversations:

Abuse / safety review. If your conversation is flagged by automated systems for harmful content, a human reviewer may look at it.
Legal compliance. If the provider receives a valid legal demand (Australian or US court order, search warrant), they may produce stored conversations.
Support tickets. If you open a support ticket and reference a specific chat, the support agent has access to read it.
Bug investigation. Engineering may sample conversations to debug a reported issue, anonymised where possible.

Western providers publish transparency reports (typically annually) listing how many legal demands they received and how many they complied with. Chinese providers do not publish meaningful transparency reports.

What about training on the public internet?

Separately from your private conversations, all the major frontier models are trained on enormous slices of the public internet — websites, books, code on public GitHub, articles, social media (where permitted). This is sometimes confused with “training on your input.”

If you’re a content creator and you don’t want your public website used for training, you can:

Add a robots.txt rule to block specific AI crawlers (User-agent: GPTBot, User-agent: ClaudeBot, User-agent: Google-Extended, etc. — each provider publishes its crawler name).
Use Cloudflare’s “Block AI Bots” feature (one click, blocks the major crawlers).
For images, use Glaze (masks an artist’s style from models) or Nightshade (subtly alters images to poison training).
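If you take the robots.txt route, you can check that your rules say what you intended with Python’s standard-library `urllib.robotparser`, without fetching anything. A minimal sketch, assuming the three user-agent tokens named above; confirm current names against each provider’s crawler documentation. Keep in mind that robots.txt is a request that well-behaved crawlers honour, not an enforcement mechanism.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: refuse three AI-training agents site-wide, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of downloading it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for agent in ["GPTBot", "ClaudeBot", "Google-Extended", "Mozilla/5.0"]:
        print(f"{agent:16} may fetch: {rp.can_fetch(agent, 'https://example.com/blog/post')}")
```

Run as a script, it reports whether each agent may fetch the sample URL: the three AI agents should be refused, and an ordinary browser user agent falls through to the `*` rule and is allowed.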

This is a separate concern from chat-input training and beyond the scope of this entry — there’s a future entry on it.

Australian-specific considerations

The Australian Privacy Act 1988 applies to AI providers handling Australian users’ personal information. The Office of the Australian Information Commissioner (OAIC) has issued guidance on generative AI in two parts (2024–25):

Guidance for businesses deploying AI (don’t feed customers’ personal info into AI tools without their consent and a privacy-impact assessment)
Guidance for businesses developing AI (training-data practices have privacy obligations)

For individual users, the practical implications:

Don’t paste other people’s personal information (medical records, financial details, full names + addresses) into AI tools without their permission. If you handle that information for a business covered by the Privacy Act, your organisation remains responsible for it when you disclose it to an AI provider.
For business use, prefer enterprise tiers (Claude Team/Enterprise, ChatGPT Team/Enterprise, Microsoft 365 Copilot, Google Workspace + Gemini, Azure OpenAI Service) where contractual no-training is the default.
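Notifiable Data Breaches scheme: if an AI tool you use has a breach affecting your customers’ data, you (the business) may have obligations too, not just the AI vendor. You have 30 days to assess a suspected breach, and once it is an eligible data breach you must notify the OAIC and affected individuals as soon as practicable.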
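Cross-border disclosure: most AI processing happens in the US. If you collect Australian users’ data and route it through US AI services, your privacy policy should say that you disclose personal information overseas and to which countries (APP 1), and under Australian Privacy Principle 8 you generally remain accountable for how the overseas recipient handles it.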
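For the first point, a crude pre-paste scrubber can catch the most obvious identifiers before text leaves your machine. This is a hypothetical Python sketch, not a compliance tool: the two regular expressions (email addresses and Australian mobile numbers) are assumptions chosen for illustration, they miss names, addresses, medical details and many other formats, and redacting does not replace getting permission.

```python
import re

# Hypothetical patterns, for illustration only. They catch two easy cases and
# miss names, street addresses, medical details and many other formats.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Australian mobiles written as 04xx xxx xxx or +61 4xx xxx xxx (spaces optional),
# not embedded in a longer run of digits.
AU_MOBILE = re.compile(r"(?<!\d)(?:\+61\s?|0)4\d{2}\s?\d{3}\s?\d{3}(?!\d)")


def redact(text: str) -> str:
    """Replace matched email addresses and mobile numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return AU_MOBILE.sub("[PHONE]", text)


print(redact("Follow up with jane.doe@example.com.au or on 0412 345 678."))
```

Anything the patterns don’t recognise passes through unchanged, so read the output before pasting it.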

See australian-privacy-considerations.md for the deeper dive.

Practical recommendations

For personal everyday use: opt out of training in your consumer chat tool of choice (one minute, once, done). Use Temporary Chat or equivalent for sensitive one-offs.
For paid coding tools: turn Privacy Mode ON if available (Cursor, Windsurf). Use GitHub Copilot Pro+ rather than Free.
For business / regulated work: use enterprise tiers — Claude Team/Enterprise, ChatGPT Team/Enterprise, M365 Copilot, Google Workspace Gemini, AWS Bedrock, Azure OpenAI Service, Google Vertex AI. These are contractually no-training by default.
For genuinely sensitive content (legal, medical, executive strategy, personal mental health): in addition to opting out, prefer:
- Enterprise-tier tools (above)
- Tools that offer Australian data residency (AWS Bedrock Sydney, Azure OpenAI Australia East, Vertex AI australia-southeast1)
- Local AI on your own machine (Ollama, LM Studio) when the use case allows
For client work where the client is the data subject: put your AI usage policy in writing in your contract — what tools, what data, what tier. Many enterprise clients will require this.
Don’t put into AI anything you wouldn’t write in an email. “Email-grade” is a good threshold for what’s casually appropriate.
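To see what “local AI” means in practice, here is a minimal Python sketch that sends a prompt to Ollama’s HTTP API on your own machine, so the text goes to `localhost` rather than a provider’s servers. It assumes Ollama is running on its default port (11434) and that a model called `llama3.2` (an illustrative choice) has already been pulled; only the standard library is used.

```python
import json
import urllib.request

# Assumptions: Ollama is running locally on its default port and the model
# below has already been pulled. Swap in whichever model you actually use.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3.2"  # illustrative choice


def build_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's local generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local(prompt: str) -> str:
    """Send the prompt to the local model and return its text answer."""
    with urllib.request.urlopen(build_request(prompt), timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

Call `ask_local("Summarise this draft in three bullet points.")` once Ollama is running. `build_request` is split out so you can inspect exactly what will be sent, and where, before anything leaves the process.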

Common gotchas

Turning off training does not retroactively remove past data from training sets. Anything you typed before opting out may already be in a training run.
Opting out in the website does not always opt out the mobile app — check both UIs.
Browser extensions and third-party Claude/ChatGPT clients have their own data policies; opting out on the main site doesn’t cover them.
“Temporary Chat” still goes through the provider’s servers; it’s not local AI. It just isn’t saved to your chat history, and the provider deletes it after its retention window (up to 30 days for ChatGPT).
Memory features (ChatGPT Memory, Claude Memory) store data separately from chat history — they have their own toggle and their own deletion process.
Custom instructions / system prompts can be retained even with chat history off — they’re treated as account-level settings, not as conversation data.
Read enterprise contracts before signing. The standard enterprise agreement is usually contractually no-training, but check the data-handling appendix for any clause that opts you in to “model improvement” use.

Tech & AI, Explained
