
AI Tool Privacy Tracker

What every major AI tool actually does with your data. Training behavior, retention, and on-device support, verified against primary sources, with separate rows for consumer and business tiers because the answer is different.

Last updated April 27, 2026 · 15 tools tracked · Next review May 27, 2026

Recent Changes

Dated policy shifts that changed what a tool does with your data. Each entry is linked to a primary source.

  1. GitHub Copilot

    GitHub began using Free / Pro / Pro+ user interaction data, including code snippets, to train AI models by default. Existing opt-outs are honored. Business and Enterprise are unaffected.

    Source: github.blog/news-insights
  2. OpenAI / ChatGPT

    OpenAI's obligation to indefinitely retain consumer ChatGPT and API content, imposed under the NYT litigation order, ended. Standard 30-day retention practices resumed. Limited April–September 2025 data is still preserved under the order.

    Source: openai.com
  3. Anthropic / Claude

    Anthropic shifted consumer Claude (Free, Pro, Max) from "not used for training" to a user-choice model. Users who opt in have data retained up to 5 years (vs. previous 30 days). Existing-user choice deadline: October 8, 2025.

    Source: anthropic.com/news

How this tracker is maintained

  • Each cell is verified against the vendor's own privacy policy, terms of service, or technical documentation. We don't paraphrase what the policy "probably" says, only what it actually states.
  • We separate consumer and business / API tiers because they operate under different contracts. The tier filter on each table reflects this; conflating them is the most common error in third-party comparisons.
  • "Trains on your data?" answers what happens by default at sign-up. The note under each chip describes the toggle, if one exists.
  • Each tier carries four independent scores on a 0–25 scale: Training, Retention, On-device, and Track record. The four axes sum to a 0–100 composite. We publish per-axis scores because the dimensions trade off differently for different use cases, and the composite for readers who need a single answer.
  • Track record reflects documented incidents, breaches, and unfavorable policy changes, sourced from the vendor's own announcements, court filings, and reporting that we link from this page or from related Voibe Resources articles. We don't penalize a vendor for a past mistake they have visibly fixed; we do penalize opaque reversals and silent expansions of data collection.
  • The matrix is reviewed on a roughly monthly cadence and updated immediately whenever a vendor announces a policy change. Each row carries its own Last verified date.
  • Errors get fixed fast: email hi@getvoibe.com with a primary-source link and we'll update the row.

Spotted an error? hi@getvoibe.com. Include the cell and a primary-source link and we'll update on the next pass.

Legend:

  • Yes (default): Trains by default with no opt-out path
  • Yes (opt-out): Trains by default; user can disable in settings
  • User choice: User must actively choose during signup or in settings
  • No: Does not train on user data, period

Score legend: per axis (0–25)

  • 22–25 Strong: Architectural or contractual guarantee
  • 17–21 Solid: Default-off training, short retention, or partial on-device
  • 11–16 Mixed: User must take action, or policy has caveats
  • 6–10 Weak: Unfavorable default, opt-out only
  • 1–5 Poor: No opt-out path, or indefinite retention
  • 0 Unclear: Policy does not address this dimension

Total score (0–100, sum of four axes)

  • 85–100: Excellent
  • 70–84: Strong
  • 55–69: Adequate
  • 40–54: Weak
  • 0–39: Poor
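To make the arithmetic concrete, here is a minimal sketch (in Python; the function names are ours, not part of the tracker) of how the four per-axis scores roll up into the composite and its band. The example values are the ChatGPT consumer row from the scoreboard below.

```python
# Illustrative sketch only: sum four 0-25 axis scores into the 0-100
# composite and map it onto the bands published in the legend above.

def composite(training: int, retention: int, on_device: int, track_record: int) -> int:
    """Sum the four independent 0-25 axes into a 0-100 total."""
    axes = (training, retention, on_device, track_record)
    assert all(0 <= a <= 25 for a in axes), "each axis is scored 0-25"
    return sum(axes)

def band(total: int) -> str:
    """Map a 0-100 composite onto the legend's named bands."""
    if total >= 85:
        return "Excellent"
    if total >= 70:
        return "Strong"
    if total >= 55:
        return "Adequate"
    if total >= 40:
        return "Weak"
    return "Poor"

# ChatGPT consumer (Free / Plus / Pro) per the scoreboard: 10 + 18 + 3 + 15
total = composite(training=10, retention=18, on_device=3, track_record=15)
print(total, band(total))  # 46 Weak
```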

Privacy scoreboard

One row per tool, showing its best-scoring tier in the selected tier group. Each axis is scored 0–25; the total adds the four axes for a 0–100 composite. Sorted by total by default. Tools with multiple sub-tiers in the same group (e.g., Superwhisper's on-device vs. cloud modes) are detailed in the matrix tables below.

  1. Voibe · Voice & Dictation · All plans · Training 25 · Retention 25 · On-device 25 · Track record 23 · Total 98/100 (Excellent)
  2. Superwhisper · Voice & Dictation · On-device modes (Fast / Nano / Standard / Parakeet; Free + Pro) · Training 25 · Retention 23 · On-device 25 · Track record 18 · Total 91/100 (Excellent)
  3. MacWhisper · Voice & Dictation · Pro (Gumroad) / Whisper Transcription (App Store) · Training 23 · Retention 20 · On-device 23 · Track record 22 · Total 88/100 (Excellent)
  4. Cline (open-source agent) · Coding Tools · Open-source extension (BYOK) · Training 23 · Retention 25 · On-device 13 · Track record 23 · Total 84/100 (Strong)
  5. Apple Dictation · Voice & Dictation · macOS / iOS (Apple Silicon, supported languages) · Training 20 · Retention 13 · On-device 18 · Track record 18 · Total 69/100 (Adequate)
  6. Claude (Anthropic) · AI Assistants · Free / Pro / Max · Training 15 · Retention 18 · On-device 3 · Track record 15 · Total 51/100 (Weak)
  7. ChatGPT (OpenAI) · AI Assistants · Free / Plus / Pro · Training 10 · Retention 18 · On-device 3 · Track record 15 · Total 46/100 (Weak)
  8. Typeless · Voice & Dictation · Free / Pro · Training 15 · Retention 23 · On-device 3 · Track record 5 · Total 46/100 (Weak)
  9. Cursor · Coding Tools · Individual default (Privacy Mode OFF) · Training 8 · Retention 3 · On-device 13 · Track record 13 · Total 37/100 (Poor)
  10. Wispr Flow · Voice & Dictation · Free (Privacy Mode OFF, default) · Training 20 · Retention 3 · On-device 3 · Track record 10 · Total 36/100 (Poor)
  11. Gemini (Google) · AI Assistants · Free / Gemini Advanced · Training 10 · Retention 10 · On-device 3 · Track record 10 · Total 33/100 (Poor)
  12. Windsurf (formerly Codeium) · Coding Tools · Individual (no ZDR, default) · Training 10 · Retention 5 · On-device 3 · Track record 13 · Total 31/100 (Poor)
  13. GitHub Copilot · Coding Tools · Free / Pro / Pro+ · Training 10 · Retention 5 · On-device 3 · Track record 8 · Total 26/100 (Poor)
  14. Aqua Voice · Voice & Dictation · Free / Pro / iOS Pro · Training 8 · Retention 0 · On-device 3 · Track record 15 · Total 26/100 (Poor)
  15. Perplexity · AI Assistants · Free / Pro / Max · Training 8 · Retention 3 · On-device 3 · Track record 8 · Total 22/100 (Poor)

Use case fit: which score is right for what?

A composite score is only useful if you know what threshold to look for. The breakdown below maps four common buyer scenarios to the minimum score we recommend, the reasoning, and the tools that currently meet that bar; a short code sketch at the end of this section applies the same floors mechanically.

Healthcare, legal, financial: regulated work

85+

Patient records, attorney-client privileged content, financial data: anywhere a leak triggers regulator notification or contract liability.

Why this floor: Below 85, the vendor either retains data longer than 30 days, lacks a contractual training exclusion, or has a track-record incident in the past 24 months. Regulated work has no margin for any of those.

Watch out for: BAA / DPA availability - the score doesn't reflect HIPAA contracts. A tool can score 85+ and still be unusable for PHI without a signed BAA. Always verify the contract before using a tool for regulated content.

Tools that meet the bar (business tier)

  • Voibe · Voice & Dictation · All plans · 98/100 (Excellent)
  • MacWhisper · Voice & Dictation · Same posture (no separate enterprise tier) · 88/100 (Excellent)

Proprietary code, internal docs, M&A drafts

70+

Sensitive but not strictly regulated. Strategy docs, source code, contract drafts, customer data without compliance overlay.

Why this floor: 70+ means contractual ZDR or short retention with a vendor that has a clean recent track record - you trust them not to retain or train, even if you'd still avoid pasting raw secrets.

Watch out for: Consumer-tier mistakes. Most consumer tiers score below 50 - make sure your team is on the business plan, not pasting internal docs into ChatGPT Free. The tier filter on each table makes this visible.

Tools that meet the bar (business tier)

Day-to-day drafting, research, light coding

55+

Public-facing content, general knowledge work, code that isn't a trade secret, prompts you wouldn't mind appearing in a leak.

Why this floor: 55+ means the tool is reasonably well-behaved by default or has a clear, working opt-out path that most users will actually flip.

Watch out for: Default settings. Many tools score 55+ only after the privacy toggle is on. Verify each user has actually flipped it - onboarding teams to a private-by-default workflow is more reliable than chasing settings.

Tools that meet the bar

  • Voibe · Voice & Dictation · All plans · 98/100 (Excellent)
  • Superwhisper · Voice & Dictation · On-device modes (Fast / Nano / Standard / Parakeet; Free + Pro) · 91/100 (Excellent)
  • MacWhisper · Voice & Dictation · Pro (Gumroad) / Whisper Transcription (App Store) · 88/100 (Excellent)
  • Cline (open-source agent) · Coding Tools · Open-source extension (BYOK) · 84/100 (Strong)
  • Windsurf (formerly Codeium) · Coding Tools · Teams / Enterprise (ZDR default ON) · 79/100 (Strong)
  • Apple Dictation · Voice & Dictation · macOS / iOS (Apple Silicon, supported languages) · 69/100 (Adequate)
  • Cursor · Coding Tools · Team / Enterprise (Privacy Mode ON, default) · 64/100 (Adequate)
  • ChatGPT (OpenAI) · AI Assistants · Team / Enterprise / API · 61/100 (Adequate)

Personal, low-stakes use

Any

Notes to self, brainstorming, creative writing - content you wouldn't mind seeing in a leak.

Why this floor: Any tool works if you understand the tradeoff. Default consumer tiers of major assistants land in the 30–50 range; that's fine for non-sensitive prompts.

Watch out for: Voice input. Audio is uniquely sensitive - even casual dictation may capture identity-revealing details, ambient conversations, or addresses. The On-device score matters more for voice than for text.

Tools that meet the bar

  • Voibe · Voice & Dictation · All plans · 98/100 (Excellent)
  • Superwhisper · Voice & Dictation · On-device modes (Fast / Nano / Standard / Parakeet; Free + Pro) · 91/100 (Excellent)
  • MacWhisper · Voice & Dictation · Pro (Gumroad) / Whisper Transcription (App Store) · 88/100 (Excellent)
  • Cline (open-source agent) · Coding Tools · Open-source extension (BYOK) · 84/100 (Strong)
  • Windsurf (formerly Codeium) · Coding Tools · Teams / Enterprise (ZDR default ON) · 79/100 (Strong)
  • Apple Dictation · Voice & Dictation · macOS / iOS (Apple Silicon, supported languages) · 69/100 (Adequate)
  • Cursor · Coding Tools · Team / Enterprise (Privacy Mode ON, default) · 64/100 (Adequate)
  • ChatGPT (OpenAI) · AI Assistants · Team / Enterprise / API · 61/100 (Adequate)
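To show how the floors above translate into a mechanical filter, here is an illustrative sketch (Python; the data structure and names are ours, and it ignores the consumer-vs-business tier filter that the real tables apply). It reuses a handful of composites from the scoreboard.

```python
# Illustrative sketch only: apply the recommended use-case floors to a few
# composite scores copied from the scoreboard on this page.

SCOREBOARD = [
    ("Voibe", "All plans", 98),
    ("Superwhisper", "On-device modes (Free + Pro)", 91),
    ("MacWhisper", "Pro (Gumroad) / Whisper Transcription", 88),
    ("Cline (open-source agent)", "Open-source extension (BYOK)", 84),
    ("Apple Dictation", "macOS / iOS (Apple Silicon)", 69),
    ("Claude (Anthropic)", "Free / Pro / Max", 51),
    ("ChatGPT (OpenAI)", "Free / Plus / Pro", 46),
]

USE_CASE_FLOOR = {
    "regulated work": 85,
    "proprietary code / internal docs": 70,
    "day-to-day drafting": 55,
    "personal, low-stakes": 0,
}

def tools_meeting_bar(use_case: str) -> list[str]:
    """Return tool names whose composite meets the floor for a use case."""
    floor = USE_CASE_FLOOR[use_case]
    return [name for name, _tier, total in SCOREBOARD if total >= floor]

print(tools_meeting_bar("proprietary code / internal docs"))
# ['Voibe', 'Superwhisper', 'MacWhisper', 'Cline (open-source agent)']
```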

AI Assistants

Chatbots and search assistants. Consumer tiers vary the most - check whether your account is logged-in vs. logged-out, and whether you've reviewed your data settings since the last policy change.

Each entry lists: plan tier, data collected, training behavior, retention, on-device support, track record, the four axis scores, the total, and the last-verified date.

ChatGPT (OpenAI) · Free / Plus / Pro
  • Data collected: Prompts, outputs, uploaded files, usage, IP, device info, account info
  • Trains on your data? Yes (opt-out) · Training 10/25. Off via Settings → Data Controls → "Improve the model for everyone." Temporary Chat is never used for training.
  • Retention · 18/25: 30 days after deletion. April–September 2025 data preserved due to NYT order; standard practice resumed Sept 26, 2025.
  • On-device · 3/25: No
  • Track record · 15/25: March 2023 chat-history bug; April–September 2025 NYT-mandated indefinite retention.
  • Total: 46/100 (Weak) · Last verified Apr 27, 2026

Claude (Anthropic) · Free / Pro / Max
  • Data collected: Chats, coding sessions (when using Claude Code with consumer accounts), feedback (thumbs)
  • Trains on your data? User choice · Training 15/25. Active choice required during signup or in Privacy Settings ("You can help improve Claude"). Off by default for users who decline. Policy changed August 2025.
  • Retention · 18/25: 30 days if declined. 5 years if enabled. Flagged conversations: 2–7 years for trust & safety.
  • On-device · 3/25: No
  • Track record · 15/25: August 2025 reversal: consumer Claude moved from 'never used for training' to user choice.
  • Total: 51/100 (Weak) · Last verified Apr 27, 2026

Gemini (Google) · Free / Gemini Advanced
  • Data collected: Chats, files, photos, videos, screen content, account info, IP, device info
  • Trains on your data? Yes (opt-out) · Training 10/25. Off via "Gemini Apps Activity" → Off. Even when off, future chats are kept for 72 hours so Gemini can respond and process feedback.
  • Retention · 10/25: 18 months default (adjustable to 3 months / 36 months / never). Human-reviewed chats retained up to 3 years (disconnected from account).
  • On-device · 3/25: No
  • Track record · 10/25: Apps Activity 'off' still keeps chats 72h; human-reviewed conversations retained up to 3 years.
  • Total: 33/100 (Poor) · Last verified Apr 27, 2026

Perplexity · Free / Pro / Max
  • Data collected: Queries, prompts, AI responses, usage, device info
  • Trains on your data? Yes (opt-out) · Training 8/25. Off via Account Settings → Preferences → "AI Data Retention." Logged-out users are trained on by default with no opt-out path.
  • Retention · 3/25: Threads kept until manually deleted. Account deletion processed within 30 days.
  • On-device · 3/25: No (Comet browser stores some data locally, under a separate policy)
  • Track record · 8/25: 2024 reporting documented robots.txt evasion via undisclosed user-agent; logged-out users still trained on.
  • Total: 22/100 (Poor) · Last verified Apr 27, 2026

AI Coding Tools

IDE assistants and agents. Consumer defaults shifted in April 2026 (GitHub Copilot now trains on consumer interaction data by default). Most tools offer a Privacy / Zero-Data-Retention mode that flips the answer; check whether yours is on.

Each entry lists: plan tier, data collected, training behavior, retention, on-device support, track record, the four axis scores, the total, and the last-verified date.

Cursor · Individual default (Privacy Mode OFF)
  • Data collected: Code, prompts, editor actions, code snippets
  • Trains on your data? Yes (default) · Training 8/25. Default for individual accounts ("Share Data" on). Used to improve Cursor's models. Toggle Privacy Mode ON to opt out: code is never trained on and plaintext is discarded after each request.
  • Retention · 3/25: Stored indefinitely (Share Data ON). Privacy Mode ON: plaintext discarded after request; cached files encrypted with client-generated keys.
  • On-device · 13/25: No (configurable to use local Ollama / LM Studio models, which bypass Privacy Mode entirely)
  • Track record · 13/25: Default 'Share Data' on for individuals; no documented incidents; transparent data-use page.
  • Total: 37/100 (Poor) · Last verified Apr 27, 2026

GitHub Copilot · Free / Pro / Pro+
  • Data collected: Inputs, outputs, code snippets, associated context
  • Trains on your data? Yes (opt-out) · Training 10/25. Policy changed April 24, 2026: GitHub now trains on consumer interaction data by default. Existing opt-outs honored. Toggle in Settings → Privacy.
  • Retention · 5/25: User Engagement Data: 2 years. Coding Agent session logs: lifetime of account. Private repo code at rest is NOT used for training; in-flight interaction data IS.
  • On-device · 3/25: No
  • Track record · 8/25: April 2026 reversal trains consumer interaction data by default; pending class action over training data.
  • Total: 26/100 (Poor) · Last verified Apr 27, 2026

Windsurf (formerly Codeium) · Individual (no ZDR, default)
  • Data collected: Logs may contain code snippets and user trajectories
  • Trains on your data? Yes (opt-out) · Training 10/25. ZDR is opt-in for individuals: toggle in profile to enable. With ZDR ON, submitted code is never trained on.
  • Retention · 5/25: With ZDR ON: in-memory for the request lifetime, plus minutes-to-hours for prompt caching. Without ZDR: logs may persist.
  • On-device · 3/25: No
  • Track record · 13/25: 2024 Codeium → Windsurf rebrand; default-off ZDR for individuals; no documented incidents.
  • Total: 31/100 (Poor) · Last verified Apr 27, 2026

Cline (open-source agent) · Open-source extension (BYOK)
  • Data collected: Cline operates no model server. Code goes only to your configured API provider (Anthropic, OpenAI, Bedrock, Gemini, etc.) and is governed by that provider's terms.
  • Trains on your data? No (by Cline) · Training 23/25. Cline's stated principle: "Code never leaves your machine" (as far as Cline's own servers are concerned). Anonymous telemetry (features used, task completion) is opt-out via the "Cline Telemetry" setting. Code, file contents, command arguments, and conversation content are not collected by telemetry.
  • Retention · 25/25: Cline retains nothing about your code. Provider retention applies (e.g., Anthropic API ZDR, OpenAI API 30 days).
  • On-device · 13/25: Partial. The extension runs locally; inference happens at your chosen provider, or fully on-device if you configure Ollama / LM Studio.
  • Track record · 23/25: Open-source; no model server; no documented incidents.
  • Total: 84/100 (Strong) · Last verified Apr 27, 2026

Voice & Dictation

Speech-to-text and dictation tools. Voice input is uniquely sensitive (audio carries identity, biometric data, and ambient context), so the on-device column matters more than for text-only tools.

Each entry lists: plan tier, data collected, training behavior, retention, on-device support, track record, the four axis scores, the total, and the last-verified date.

Voibe · All plans
  • Data collected: No audio or transcription leaves the device. Account holders: email (account auth) plus non-identifying usage analytics; crash reports exclude dictated content.
  • Trains on your data? No · Training 25/25. "The Voibe application processes your voice entirely on your device. No audio is transmitted to our servers at any point."
  • Retention · 25/25: Audio: not transmitted, not retained. Account email: kept while account is active.
  • On-device · 25/25: Yes (only mode). Whisper models running on the Apple Silicon Neural Engine.
  • Track record · 23/25: New entrant; on-device-only architecture removes the surface for retention or training incidents.
  • Total: 98/100 (Excellent) · Last verified Apr 27, 2026

Wispr Flow · Free (Privacy Mode OFF, default)
  • Data collected: Audio, transcripts, edits, optional Context Awareness (screen content from active app)
  • Trains on your data? Opt-in only · Training 20/25. After 2024 community backlash, training is now off by default and requires opt-in. Audio retained indefinitely; 30 days for data passed to third-party LLMs (OpenAI, Meta).
  • Retention · 3/25: Indefinite for retained dictation data; 30 days for third-party LLM passthrough.
  • On-device · 3/25: No. Transcription always happens in the cloud, even in Privacy Mode (zero-retention cloud, not local).
  • Track record · 10/25: 2024 community backlash forced opt-in training and Privacy Mode; Free-tier retention is still indefinite by default.
  • Total: 36/100 (Poor) · Last verified Apr 27, 2026

Superwhisper · On-device modes (Fast / Nano / Standard / Parakeet; Free + Pro)
  • Data collected: None. Audio is processed locally and never transmitted.
  • Trains on your data? No · Training 25/25. "Your data is not retained on Superwhisper servers" and "not used for training AI models or any other machine learning purposes." Audio recordings are saved to local disk by default; opt out in settings.
  • Retention · 23/25: N/A on servers. Local recordings persist until the user deletes them.
  • On-device · 25/25: Yes
  • Track record · 18/25: Stable privacy-first stance; cloud modes added without separate disclosure in the public privacy policy.
  • Total: 91/100 (Excellent) · Last verified Apr 27, 2026

Superwhisper · Cloud modes (Ultra transcription / Super Mode LLMs; Pro)
  • Data collected: Audio sent to Superwhisper's proxy infrastructure
  • Trains on your data? No (per vendor) · Training 18/25. Superwhisper says cloud audio is proxied through their infrastructure, third-party providers don't see user account or content, and there is no training or retention. Cloud-mode handling is not currently distinguished from on-device modes in the public privacy policy; verify the latest with the vendor before sensitive use.
  • Retention · 13/25: Stated as not retained on servers; not separately documented for cloud modes.
  • On-device · 3/25: No
  • Track record · 18/25: Stable privacy-first stance; cloud modes added without separate disclosure in the public privacy policy.
  • Total: 52/100 (Weak) · Last verified Apr 27, 2026

MacWhisper · Pro (Gumroad) / Whisper Transcription (App Store)
  • Data collected: On-device modes: none transmitted. App Store version discloses "Usage Data" and "Product Interaction" as Data Not Linked to You. Cloud Assistant or BYOK (OpenAI / ElevenLabs) features send audio to those providers under their terms.
  • Trains on your data? No (by MacWhisper) · Training 23/25. MacWhisper does not train its own models on user audio. Cloud Assistant and BYOK integrations inherit the chosen provider's terms (e.g., OpenAI Whisper API, Anthropic / ElevenLabs).
  • Retention · 20/25: On-device transcription: not retained. Cloud Assistant / BYOK: per the third-party provider's terms.
  • On-device · 23/25: Yes (primary mode). Local Whisper models plus Apple Foundation Models for AI features. Cloud Assistant is opt-in for higher-quality transcription.
  • Track record · 22/25: Long-running indie tool; on-device by default; no documented incidents.
  • Total: 88/100 (Excellent) · Last verified Apr 27, 2026

Aqua Voice · Free / Pro / iOS Pro
  • Data collected: Audio inputs, technical data (IP, browser, OS, performance metrics), session metadata. With Privacy Mode disabled, "we may securely store transcript data on our servers."
  • Trains on your data? Yes (opt-out) · Training 8/25. The Privacy Mode toggle stops transcript storage on Aqua Voice servers; with it enabled, "transcript data is not collected," though session metadata may still be. The privacy policy does not explicitly state whether stored transcript data is used for AI training. SOC 2 Type II certified by Advantage Partners. No HIPAA BAA publicly advertised.
  • Retention · 0/25: With Privacy Mode disabled: not specified in policy. With Privacy Mode enabled: transcripts not stored; session metadata (timestamps, device type, performance metrics) may be retained.
  • On-device · 3/25: No. Cloud transcription only.
  • Track record · 15/25: SOC 2 Type II via Advantage Partners; privacy policy ambiguous on whether stored transcripts are used for AI training.
  • Total: 26/100 (Poor) · Last verified Apr 27, 2026

Typeless · Free / Pro
  • Data collected: Audio plus limited contextual information, processed on Typeless's cloud servers. Subprocessors include third-party LLM providers, analytics, and cloud infrastructure.
  • Trains on your data? No (per vendor) · Training 15/25. Privacy policy: "Your data is never used to train these services and is configured for zero retention by the providers." Note: the November 2025 reverse-engineering analysis documented in our Typeless privacy issues investigation reported collection beyond what the public policy describes; verify against the current policy and subprocessor list before sensitive use.
  • Retention · 23/25: Per the privacy policy, audio and contextual information are "processed in real time on our cloud servers and immediately discarded once the result is returned to your device."
  • On-device · 3/25: No. Cloud-processed in real time.
  • Track record · 5/25: A November 2025 reverse-engineering analysis documented collection (URLs, window-title metadata, broad permissions) beyond what the public policy describes.
  • Total: 46/100 (Weak) · Last verified Apr 27, 2026

Apple Dictation · macOS / iOS (Apple Silicon, supported languages)
  • Data collected: Audio inputs, plus contextual data (contacts, app names, etc.) when sent to servers
  • Trains on your data? Opt-in only · Training 20/25. "Improve Siri & Dictation" must be enabled; the default at setup is to be asked.
  • Retention · 13/25: If opted in: audio and transcripts kept under a rotating random ID for up to 6 months, then dissociated and kept up to 2 years for improvement; a reviewed subset is retained beyond 2 years. If opted out: not retained for improvement.
  • On-device · 18/25: Yes (partially). Most languages on Apple Silicon process locally for general text fields (Notes, Mail, Messages). Server fallback applies to unsupported languages, search-box dictation, and some third-party Speech Recognition API uses.
  • Track record · 18/25: 2019 Siri grading scandal led to opt-in for human review; otherwise privacy-forward.
  • Total: 69/100 (Adequate) · Last verified Apr 27, 2026

Privacy Policy Quick Read: Does Each AI Tool Train on Your Data?

For each of the 15 tools in the matrix above, here is what the vendor's own privacy policy says about training, retention, and on-device support, quoted verbatim where the policy text supports a clean citation. Each entry links to the primary source we verified against (see the Last verified date on each row above).

AI Assistants

Does ChatGPT train on my data?

Yes, by default - opt-out available.

ChatGPT's consumer plans (Free, Plus, Pro) train on user prompts, outputs, and uploaded files by default. To opt out, navigate to Settings → Data Controls and disable "Improve the model for everyone." Conversations are retained for 30 days after deletion. Temporary Chat is never used for training. ChatGPT Team, Enterprise, and API plans are explicitly excluded from training under OpenAI's enterprise terms; API users can optionally opt in via Playground feedback. Limited April–September 2025 data is preserved due to the NYT litigation order; OpenAI's standard 30-day retention practices resumed September 26, 2025.

Primary source: OpenAI privacy policy

Does Claude train on my data?

User choice required (since Aug 2025).

As of August 28, 2025, Anthropic shifted Claude's consumer plans (Free, Pro, Max) from "not used for training" to a user-choice model. New users must actively choose during signup whether to share data for training; existing users had until October 8, 2025. Users who opt in have their data retained for up to 5 years; users who decline keep the previous 30-day retention window. Flagged conversations are retained 2–7 years for trust & safety review. Claude for Work, the Claude API, Amazon Bedrock, and Google Vertex AI are all contractually excluded from training under Anthropic's Commercial Terms.

Primary source: Anthropic Aug 2025 update

Does Gemini train on my data?

Yes, by default - opt-out via "Gemini Apps Activity."

Free Gemini and Gemini Advanced (consumer) train on user conversations by default. Per Google's documentation, when Gemini Apps Activity is on, "Google uses your activity to provide, develop, and improve its services (including training generative AI models)." To opt out, set Apps Activity to OFF - but even when off, future chats are saved for 72 hours so Gemini can respond and process feedback. Default retention is 18 months, adjustable to 3 months, 36 months, or never. Human-reviewed conversations are kept up to 3 years (disconnected from your Google Account). Vertex AI customer data is contractually excluded from training: "Google won't use your data to train or fine-tune any AI/ML models without your prior permission or instruction."

Primary source: Gemini Apps Activity controls

Does Perplexity train on my data?

Yes, by default - opt-out for logged-in users only.

Perplexity trains on user queries, prompts, and AI responses by default for Free, Pro, and Max plans. The "AI Data Retention" toggle in Account Settings → Preferences disables this. Logged-out users are trained on by default with no opt-out path - sign in to gain control. Threads are retained until manually deleted; account deletion is processed within 30 days. The Sonar API offers Zero Data Retention with prompts and responses never stored. Third-party providers (OpenAI, Anthropic) are contractually prohibited from training on Perplexity's API data. Enterprise file uploads are deleted after 7 days.

Primary source: Perplexity data collection policy

AI Coding Tools

Does Cursor train on my code?

Yes, by default for individuals - Privacy Mode opt-out.

For individual accounts, Cursor's "Share Data" mode is enabled by default, sending code, prompts, editor actions, and code snippets to Cursor for model improvement. Toggling Privacy Mode ON prevents training and discards plaintext after each request; cached files are encrypted with client-generated keys, with the encryption keys existing on Cursor's servers only for the duration of each request. Team and Enterprise accounts default to Privacy Mode ON, with zero-data-retention agreements with OpenAI, Anthropic, Google, xAI, Fireworks, Baseten, and Together. The strictest tier, Privacy Mode (Legacy), guarantees no code is stored at all, by Cursor or any third party. Cursor can also be configured to use local Ollama or LM Studio models, which bypass Privacy Mode entirely.

Primary source: Cursor data use

Does GitHub Copilot train on my code?

Yes, by default for consumer plans - opt-out as of April 24, 2026.

On April 24, 2026, GitHub began using Free, Pro, and Pro+ user interaction data, including code snippets, to train AI models by default. Existing opt-outs are honored. To disable training going forward, go to Settings → Privacy. User Engagement Data is retained for 2 years; Coding Agent session logs persist for the lifetime of the account. Private repository code at rest is NOT used for training, but in-flight interaction data IS. Business and Enterprise data is explicitly excluded from training under GitHub's agreements: subscription Prompts and Suggestions are retained 28 days, and User Engagement Data 2 years.

Primary source: April 2026 policy change

Does Windsurf (Codeium) train on my code?

Yes for individuals by default - Zero Data Retention opt-in available.

Windsurf (formerly Codeium) trains on individual user code by default; without zero-data-retention enabled, logs may contain code snippets and user trajectories. Individuals can toggle ZDR on in their profile to prevent training; with ZDR on, "the code data submitted by zero-data retention mode users will never be trained on," code is never serialized in plaintext on Windsurf's servers, and is held only in-memory for the request lifetime (plus minutes-to-hours for prompt caching). Teams and Enterprise plans default to ZDR ON. The Enterprise Self-hosted tier deploys via Docker Compose or Helm Charts inside the customer's firewall - no traffic leaves customer infrastructure.

Primary source: Windsurf security

Does Cline train on my code?

No - Cline operates no model server. Privacy depends on your chosen API provider.

Cline is an open-source VS Code extension that operates no model server of its own. User code is sent only to whichever API provider you configure (Anthropic, OpenAI, AWS Bedrock, Google Gemini, Cerebras, Groq, etc.) and is governed by that provider's terms. Cline's stated principle: "Code never leaves your machine" (as far as Cline's own servers are concerned). Anonymous telemetry (features used, task completion rates) is collected but can be disabled via the Cline Telemetry setting. Code, file contents, command arguments, and conversation content are explicitly NOT collected by telemetry. For fully on-device use, configure Cline with a local Ollama or LM Studio model.

Primary source: Cline telemetry docs

Voice & Dictation

Does Voibe train on my voice data?

No - audio never leaves the device.

Voibe processes audio entirely on your Mac using OpenAI Whisper models running on Apple Silicon's Neural Engine. Per Voibe's privacy policy: "The Voibe application processes your voice entirely on your device. No audio is transmitted to our servers at any point" and "Your dictated content never leaves your Mac and we have no access to it." Because audio never crosses the network, there is no training to opt out of. Account holders provide an email (for authentication) and non-identifying usage analytics; crash reports exclude dictated content. The Free plan does not require an account at all.

Primary source: Voibe privacy policy

Does Wispr Flow train on my voice data?

Off by default since 2024 backlash - opt-in for training.

After 2024 community backlash, Wispr Flow shifted training to opt-in. Privacy Mode is OFF by default for Free users, meaning audio, transcripts, edits, and optional Context Awareness (screenshots of the active app's screen) are retained indefinitely. Data passed to third-party LLM providers (OpenAI, Meta) is retained for 30 days. Enterprise plans default to Privacy Mode ON with zero data retention by Wispr or any third party - audio is processed and immediately discarded after transcription. A Business Associate Agreement is available for Enterprise; once signed, Privacy Mode locks irreversibly. Transcription always happens in the cloud; even Privacy Mode is "zero-retention cloud," not local processing.

Primary source: Wispr Flow privacy policy

Does Superwhisper train on my voice data?

No - verbatim from policy.

Superwhisper's privacy policy states explicitly: "Your data is not retained on Superwhisper servers" and "not used for training AI models or any other machine learning purposes." On-device modes (Fast, Nano, Standard Whisper, Parakeet - available on the Free plan and within Pro) process audio entirely locally; nothing is transmitted. Cloud modes (Ultra transcription, Super Mode LLMs - Pro tier) proxy audio through Superwhisper's infrastructure with no retention. One caveat: audio recordings are saved to local disk by default. Opt out in settings if local audio retention is a concern. Note: the privacy policy does not currently distinguish between on-device and cloud modes - verify cloud-mode specifics with the vendor before sensitive use.

Primary source: Superwhisper privacy

Does MacWhisper train on my voice data?

No - primarily on-device, with optional cloud + BYOK paths.

MacWhisper does not train its own models on user audio. The on-device transcription path uses local Whisper models that you can download for offline use; Apple Foundation Models also run on-device for AI features. MacWhisper's optional "Assistant" cloud transcription service and BYOK integrations (OpenAI Whisper API, ElevenLabs) inherit those providers' terms when used. The App Store version's privacy disclosure shows only "Usage Data" and "Product Interaction" as Data Not Linked to You. There is no separate enterprise tier; the data-handling architecture is identical for individuals and bulk-licensing customers.

Primary source: App Store listing

Does Aqua Voice train on my voice data?

Privacy policy does not explicitly address training - opt-out via Privacy Mode.

Aqua Voice's privacy policy does not explicitly state whether stored data is used for AI training. With Privacy Mode disabled, "we may securely store transcript data on our servers"; with Privacy Mode enabled, "transcript data is not collected" though session metadata (timestamps, device type, performance metrics) may still be. Aqua Voice is SOC 2 Type II certified by Advantage Partners. Teams and Enterprise plans support an org-wide Privacy Mode that applies the same protections across an entire organization. No HIPAA Business Associate Agreement is publicly advertised. Audio is cloud-processed; there is no on-device option.

Primary source: Aqua Voice privacy policy

Does Typeless train on my voice data?

No, per the published privacy policy - but verify the architecture.

Typeless's privacy policy states: "Your data is never used to train these services and is configured for zero retention by the providers." Audio plus contextual information is "processed in real time on our cloud servers and immediately discarded once the result is returned to your device." Free and Pro tiers receive the same data-handling treatment. However, a November 2025 reverse-engineering analysis (covered in our Typeless privacy issues investigation) reported collection beyond what the published policy describes - including URL capture, window-title metadata via the macOS accessibility API, and broad permission requests. Verify the current subprocessor list at trust.typeless.com/subprocessors before relying on Typeless for sensitive content.

Primary source: Typeless privacy policy

Does Apple Dictation train on my voice data?

Only if you opt in via "Improve Siri & Dictation."

Apple Dictation only uses your audio to improve its models if you have explicitly enabled "Improve Siri & Dictation" - the default at setup is to be asked. If opted in, audio and transcripts are retained under a rotating random ID for up to 6 months, then dissociated and kept for up to 2 years for improvement; a reviewed subset is retained beyond 2 years. If opted out, recordings are not retained for improvement. On Apple Silicon Macs running modern macOS or iOS, most languages process locally for general text fields (Notes, Mail, Messages). Server-side fallback applies to unsupported languages, search-box dictation, and some third-party Speech Recognition API uses. Apple does not sign a Business Associate Agreement for consumer Dictation, so it is not HIPAA-compliant.

Primary source: Ask Siri & Dictation policy

Frequently Asked Questions

What does "on-device" actually mean?
On-device means a tool can complete its core workflow without sending your input to a vendor's servers. For dictation, that means audio is captured, transcribed, and discarded entirely on your computer - nothing leaves the machine. Most AI assistants and coding tools are not on-device by default: they transmit your prompts and code to a cloud model, even if the vendor doesn't retain or train on it. "Partially on-device" means parts of the workflow are local but specific cases (unsupported languages, agentic operations, large models) fall back to the cloud. Apple Dictation and Cline (when paired with a local Ollama or LM Studio model) are examples of partial on-device.
Why does the same tool show different answers for Free vs Business?
Consumer and business tiers operate under separate contracts. Most major AI vendors train on consumer data (or did until very recently) and explicitly exclude business / API / enterprise data from training under their commercial agreements. The two tiers can use the same underlying model but with different data-handling guarantees. Conflating the two is the most common error in third-party comparison articles. The tier filter at the top of each table separates them so you can answer either question independently.
Can a tool "unlearn" my data after training?
Practically, no. Once a model has been trained on a piece of data, the parameters reflect that training and cannot be cleanly reverted on a per-record basis. Vendors offering deletion typically delete the conversation record but cannot remove its influence on the model that has already absorbed it. This is why the relevant question is "will it be used for training in the first place," not "can I delete it later." Pages like this one focus on the training question because the deletion question rarely changes the outcome for already-trained models.
How often is this updated?
Each row carries a Last verified date. We re-check every cell against its primary source on a roughly monthly cadence, plus immediately whenever a vendor announces a policy change. The Recent Changes timeline at the top of the page lists every dated change we have logged. If you find an outdated cell or a missing change, email hi@getvoibe.com - we'll update and credit the report.
How is each tool scored?
Every tier carries four independent scores on a 0–25 scale that sum to a 0–100 composite: Training (does the vendor train on your data by default), Retention (how long data is kept; shorter retention scores higher), On-device (can the workflow run without sending data to the vendor), and Track record (the vendor's documented incidents, breaches, and unfavorable policy changes). We separate the axes because a tool with a strong contract but a weak track record isn't strictly better or worse than the inverse, but we also publish the composite total because most readers ultimately want a single answer. Per-axis buckets: 22–25 architectural or contractual guarantee, 17–21 default-off or short retention, 11–16 user must take action or policy has caveats, 6–10 unfavorable default with opt-out, 1–5 no opt-out or indefinite retention, 0 the policy does not address that dimension. Composite buckets: 85+ Excellent, 70–84 Strong, 55–69 Adequate, 40–54 Weak, below 40 Poor.
What score should I look for?
The right floor depends on the work. For regulated industries (healthcare, legal, financial) or anything that triggers regulator notification on leak, look for 85+ in the business tier and verify the vendor signs a BAA or DPA. For sensitive but unregulated business work (proprietary code, internal docs, M&A drafts), 70+ is the floor; below that, you are relying on opt-out toggles that team members may not have flipped. For day-to-day drafting, research, and light coding on non-secret content, 55+ is fine. For personal low-stakes use (notes to self, brainstorming) any score works as long as you understand what the tool retains. The Use case fit section on this page lists the tools that meet each threshold.
How do I report an error or a missing tool?
Email hi@getvoibe.com with the tool name, the cell you think is wrong, and a primary-source link. We'll verify and update on the next pass. Tool requests are welcome, but to be added we need a vendor-published privacy or data-handling page that we can cite - marketing claims aren't enough.

This tracker is maintained by the team at Voibe. We built it because privacy is the central design constraint of our product, and we kept being asked these questions. Voibe is one of the tools listed - the methodology is the same for every row.

Related reading on this site: Is Wispr Flow safe? · Typeless privacy issues · Apple Dictation privacy · Voice data privacy · Cloud vs local dictation · HIPAA dictation.