
AI Tool Privacy Tracker

What every major AI tool actually does with your data. Training behavior, retention, and on-device support, verified against primary sources, with separate rows for consumer and business tiers because the answer is different.

Last updated April 27, 2026 · 15 tools tracked · Next review May 27, 2026

Recent Changes

Dated policy shifts that changed what a tool does with your data. Each entry is linked to a primary source.

  1. GitHub Copilot

    GitHub began using Free / Pro / Pro+ user interaction data, including code snippets, to train AI models by default. Existing opt-outs are honored. Business and Enterprise are unaffected.

    Source: github.blog/news-insights
  2. OpenAI / ChatGPT

    OpenAI's obligation to indefinitely retain consumer ChatGPT and API content, imposed under the NYT litigation order, ended. Standard 30-day retention practices resumed. Limited April–September 2025 data is still preserved under the order.

    Source: openai.com
  3. Anthropic / Claude

    Anthropic shifted consumer Claude (Free, Pro, Max) from "not used for training" to a user-choice model. Users who opt in have data retained up to 5 years (vs. previous 30 days). Existing-user choice deadline: October 8, 2025.

    Source: anthropic.com/news

How this tracker is maintained

  • Each cell is verified against the vendor's own privacy policy, terms of service, or technical documentation. We don't paraphrase what the policy "probably" says, only what it actually states.
  • We separate consumer and business / API tiers because they operate under different contracts. The tier filter on each table reflects this; conflating them is the most common error in third-party comparisons.
  • "Trains on your data?" answers what happens by default at sign-up. The note under each chip describes the toggle, if one exists.
  • Each tier carries four independent scores on a 0–25 scale: Training, Retention, On-device, and Track record. The four axes sum to a 0–100 composite. We publish per-axis scores because the dimensions trade off differently for different use cases, and the composite for readers who need a single answer.
  • Track record reflects documented incidents, breaches, and unfavorable policy changes, sourced from the vendor's own announcements, court filings, and reporting that we link from this page or from related Voibe Resources articles. We don't penalize a vendor for a past mistake they have visibly fixed; we do penalize opaque reversals and silent expansions of data collection.
  • The matrix is reviewed on a roughly monthly cadence and updated immediately whenever a vendor announces a policy change. Each row carries its own Last verified date.
  • Errors get fixed fast: email hi@getvoibe.com with a primary-source link and we'll update the row.

Spotted an error? hi@getvoibe.com. Include the cell and a primary-source link and we'll update on the next pass.

Legend:

  • Yes (default): Trains by default with no opt-out path
  • Yes (opt-out): Trains by default; user can disable in settings
  • User choice: User must actively choose during signup or in settings
  • No: Does not train on user data, period

Score legend: per axis (0–25)

  • 22–25 Strong: Architectural or contractual guarantee
  • 17–21 Solid: Default-off training, short retention, or partial on-device
  • 11–16 Mixed: User must take action, or policy has caveats
  • 6–10 Weak: Unfavorable default, opt-out only
  • 1–5 Poor: No opt-out path, or indefinite retention
  • 0 Unclear: Policy does not address this dimension

Total score (0–100, sum of four axes)

  • 85–100: Excellent
  • 70–84: Strong
  • 55–69: Adequate
  • 40–54: Weak
  • 0–39: Poor
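To make the arithmetic concrete, here is a minimal sketch (in Python; the function names are ours, not part of the tracker) of how the four per-axis scores roll up into the composite and its band. The example values are the ChatGPT consumer row from the scoreboard below.

```python
# Illustrative sketch only: sum four 0-25 axis scores into the 0-100
# composite and map it onto the bands published in the legend above.

def composite(training: int, retention: int, on_device: int, track_record: int) -> int:
    """Sum the four independent 0-25 axes into a 0-100 total."""
    axes = (training, retention, on_device, track_record)
    assert all(0 <= a <= 25 for a in axes), "each axis is scored 0-25"
    return sum(axes)

def band(total: int) -> str:
    """Map a 0-100 composite onto the legend's named bands."""
    if total >= 85:
        return "Excellent"
    if total >= 70:
        return "Strong"
    if total >= 55:
        return "Adequate"
    if total >= 40:
        return "Weak"
    return "Poor"

# ChatGPT consumer (Free / Plus / Pro) per the scoreboard: 10 + 18 + 3 + 15
total = composite(training=10, retention=18, on_device=3, track_record=15)
print(total, band(total))  # 46 Weak
```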

Privacy scoreboard

One row per tool, showing its best-scoring tier in the selected tier group. Each axis is scored 0–25; the total adds the four axes for a 0–100 composite. Sorted by total by default. Tools with multiple sub-tiers in the same group (e.g., Superwhisper's on-device vs. cloud modes) are detailed in the matrix tables below.

  1. Voibe · Voice & Dictation · All plans · Training 25 · Retention 25 · On-device 25 · Track record 23 · Total 98/100 (Excellent)
  2. Superwhisper · Voice & Dictation · On-device modes (Fast / Nano / Standard / Parakeet; Free + Pro) · Training 25 · Retention 23 · On-device 25 · Track record 18 · Total 91/100 (Excellent)
  3. MacWhisper · Voice & Dictation · Pro (Gumroad) / Whisper Transcription (App Store) · Training 23 · Retention 20 · On-device 23 · Track record 22 · Total 88/100 (Excellent)
  4. Cline (open-source agent) · Coding Tools · Open-source extension (BYOK) · Training 23 · Retention 25 · On-device 13 · Track record 23 · Total 84/100 (Strong)
  5. Apple Dictation · Voice & Dictation · macOS / iOS (Apple Silicon, supported languages) · Training 20 · Retention 13 · On-device 18 · Track record 18 · Total 69/100 (Adequate)
  6. Claude (Anthropic) · AI Assistants · Free / Pro / Max · Training 15 · Retention 18 · On-device 3 · Track record 15 · Total 51/100 (Weak)
  7. ChatGPT (OpenAI) · AI Assistants · Free / Plus / Pro · Training 10 · Retention 18 · On-device 3 · Track record 15 · Total 46/100 (Weak)
  8. Typeless · Voice & Dictation · Free / Pro · Training 15 · Retention 23 · On-device 3 · Track record 5 · Total 46/100 (Weak)
  9. Cursor · Coding Tools · Individual default (Privacy Mode OFF) · Training 8 · Retention 3 · On-device 13 · Track record 13 · Total 37/100 (Poor)
  10. Wispr Flow · Voice & Dictation · Free (Privacy Mode OFF, default) · Training 20 · Retention 3 · On-device 3 · Track record 10 · Total 36/100 (Poor)
  11. Gemini (Google) · AI Assistants · Free / Gemini Advanced · Training 10 · Retention 10 · On-device 3 · Track record 10 · Total 33/100 (Poor)
  12. Windsurf (formerly Codeium) · Coding Tools · Individual (no ZDR, default) · Training 10 · Retention 5 · On-device 3 · Track record 13 · Total 31/100 (Poor)
  13. GitHub Copilot · Coding Tools · Free / Pro / Pro+ · Training 10 · Retention 5 · On-device 3 · Track record 8 · Total 26/100 (Poor)
  14. Aqua Voice · Voice & Dictation · Free / Pro / iOS Pro · Training 8 · Retention 0 · On-device 3 · Track record 15 · Total 26/100 (Poor)
  15. Perplexity · AI Assistants · Free / Pro / Max · Training 8 · Retention 3 · On-device 3 · Track record 8 · Total 22/100 (Poor)

Use case fit: which score is right for what?

A composite score is only useful if you know what threshold to look for. The breakdown below maps four common buyer scenarios to the minimum score we recommend, the reasoning, and the tools that currently meet that bar; a short code sketch at the end of this section applies the same floors mechanically.

Healthcare, legal, financial: regulated work

85+

Patient records, attorney-client privileged content, financial data: anywhere a leak triggers regulator notification or contract liability.

Why this floor: Below 85, the vendor either retains data longer than 30 days, lacks a contractual training exclusion, or has a track-record incident in the past 24 months. Regulated work has no margin for any of those.

Watch out for: BAA / DPA availability - the score doesn't reflect HIPAA contracts. A tool can score 85+ and still be unusable for PHI without a signed BAA. Always verify the contract before using a tool for regulated content.

Tools that meet the bar (business tier)

  • Voibe · Voice & Dictation · All plans · 98/100 (Excellent)
  • MacWhisper · Voice & Dictation · Same posture (no separate enterprise tier) · 88/100 (Excellent)

Proprietary code, internal docs, M&A drafts

70+

Sensitive but not strictly regulated. Strategy docs, source code, contract drafts, customer data without compliance overlay.

Why this floor: 70+ means contractual ZDR or short retention with a vendor that has a clean recent track record - you trust them not to retain or train, even if you'd still avoid pasting raw secrets.

Watch out for: Consumer-tier mistakes. Most consumer tiers score below 50 - make sure your team is on the business plan, not pasting internal docs into ChatGPT Free. The tier filter on each table makes this visible.

Tools that meet the bar (business tier)

Day-to-day drafting, research, light coding

55+

Public-facing content, general knowledge work, code that isn't a trade secret, prompts you wouldn't mind appearing in a leak.

Why this floor: 55+ means the tool is reasonably well-behaved by default or has a clear, working opt-out path that most users will actually flip.

Watch out for: Default settings. Many tools score 55+ only after the privacy toggle is on. Verify each user has actually flipped it - onboarding teams to a private-by-default workflow is more reliable than chasing settings.

Tools that meet the bar

  • Voibe · Voice & Dictation · All plans · 98/100 (Excellent)
  • Superwhisper · Voice & Dictation · On-device modes (Fast / Nano / Standard / Parakeet; Free + Pro) · 91/100 (Excellent)
  • MacWhisper · Voice & Dictation · Pro (Gumroad) / Whisper Transcription (App Store) · 88/100 (Excellent)
  • Cline (open-source agent) · Coding Tools · Open-source extension (BYOK) · 84/100 (Strong)
  • Windsurf (formerly Codeium) · Coding Tools · Teams / Enterprise (ZDR default ON) · 79/100 (Strong)
  • Apple Dictation · Voice & Dictation · macOS / iOS (Apple Silicon, supported languages) · 69/100 (Adequate)
  • Cursor · Coding Tools · Team / Enterprise (Privacy Mode ON, default) · 64/100 (Adequate)
  • ChatGPT (OpenAI) · AI Assistants · Team / Enterprise / API · 61/100 (Adequate)

Personal, low-stakes use

Any

Notes to self, brainstorming, creative writing - content you wouldn't mind seeing in a leak.

Why this floor: Any tool works if you understand the tradeoff. Default consumer tiers of major assistants land in the 30–50 range; that's fine for non-sensitive prompts.

Watch out for: Voice input. Audio is uniquely sensitive - even casual dictation may capture identity-revealing details, ambient conversations, or addresses. The On-device score matters more for voice than for text.

Tools that meet the bar

  • Voibe · Voice & Dictation · All plans · 98/100 (Excellent)
  • Superwhisper · Voice & Dictation · On-device modes (Fast / Nano / Standard / Parakeet; Free + Pro) · 91/100 (Excellent)
  • MacWhisper · Voice & Dictation · Pro (Gumroad) / Whisper Transcription (App Store) · 88/100 (Excellent)
  • Cline (open-source agent) · Coding Tools · Open-source extension (BYOK) · 84/100 (Strong)
  • Windsurf (formerly Codeium) · Coding Tools · Teams / Enterprise (ZDR default ON) · 79/100 (Strong)
  • Apple Dictation · Voice & Dictation · macOS / iOS (Apple Silicon, supported languages) · 69/100 (Adequate)
  • Cursor · Coding Tools · Team / Enterprise (Privacy Mode ON, default) · 64/100 (Adequate)
  • ChatGPT (OpenAI) · AI Assistants · Team / Enterprise / API · 61/100 (Adequate)
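To show how the floors above translate into a mechanical filter, here is an illustrative sketch (Python; the data structure and names are ours, and it ignores the consumer-vs-business tier filter that the real tables apply). It reuses a handful of composites from the scoreboard.

```python
# Illustrative sketch only: apply the recommended use-case floors to a few
# composite scores copied from the scoreboard on this page.

SCOREBOARD = [
    ("Voibe", "All plans", 98),
    ("Superwhisper", "On-device modes (Free + Pro)", 91),
    ("MacWhisper", "Pro (Gumroad) / Whisper Transcription", 88),
    ("Cline (open-source agent)", "Open-source extension (BYOK)", 84),
    ("Apple Dictation", "macOS / iOS (Apple Silicon)", 69),
    ("Claude (Anthropic)", "Free / Pro / Max", 51),
    ("ChatGPT (OpenAI)", "Free / Plus / Pro", 46),
]

USE_CASE_FLOOR = {
    "regulated work": 85,
    "proprietary code / internal docs": 70,
    "day-to-day drafting": 55,
    "personal, low-stakes": 0,
}

def tools_meeting_bar(use_case: str) -> list[str]:
    """Return tool names whose composite meets the floor for a use case."""
    floor = USE_CASE_FLOOR[use_case]
    return [name for name, _tier, total in SCOREBOARD if total >= floor]

print(tools_meeting_bar("proprietary code / internal docs"))
# ['Voibe', 'Superwhisper', 'MacWhisper', 'Cline (open-source agent)']
```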

AI Assistants

Chatbots and search assistants. Consumer tiers vary the most - check whether your account is logged-in vs. logged-out, and whether you've reviewed your data settings since the last policy change.

Each entry lists: plan tier, data collected, training behavior, retention, on-device support, track record, the four axis scores, the total, and the last-verified date.

ChatGPT (OpenAI) · Free / Plus / Pro
  • Data collected: Prompts, outputs, uploaded files, usage, IP, device info, account info
  • Trains on your data? Yes (opt-out) · Training 10/25. Off via Settings → Data Controls → "Improve the model for everyone." Temporary Chat is never used for training.
  • Retention · 18/25: 30 days after deletion. April–September 2025 data preserved due to NYT order; standard practice resumed Sept 26, 2025.
  • On-device · 3/25: No
  • Track record · 15/25: March 2023 chat-history bug; April–September 2025 NYT-mandated indefinite retention.
  • Total: 46/100 (Weak) · Last verified Apr 27, 2026

Claude (Anthropic) · Free / Pro / Max
  • Data collected: Chats, coding sessions (when using Claude Code with consumer accounts), feedback (thumbs)
  • Trains on your data? User choice · Training 15/25. Active choice required during signup or in Privacy Settings ("You can help improve Claude"). Off by default for users who decline. Policy changed August 2025.
  • Retention · 18/25: 30 days if declined. 5 years if enabled. Flagged conversations: 2–7 years for trust & safety.
  • On-device · 3/25: No
  • Track record · 15/25: August 2025 reversal: consumer Claude moved from 'never used for training' to user choice.
  • Total: 51/100 (Weak) · Last verified Apr 27, 2026

Gemini (Google) · Free / Gemini Advanced
  • Data collected: Chats, files, photos, videos, screen content, account info, IP, device info
  • Trains on your data? Yes (opt-out) · Training 10/25. Off via "Gemini Apps Activity" → Off. Even when off, future chats are kept for 72 hours so Gemini can respond and process feedback.
  • Retention · 10/25: 18 months default (adjustable to 3 months / 36 months / never). Human-reviewed chats retained up to 3 years (disconnected from account).
  • On-device · 3/25: No
  • Track record · 10/25: Apps Activity 'off' still keeps chats 72h; human-reviewed conversations retained up to 3 years.
  • Total: 33/100 (Poor) · Last verified Apr 27, 2026

Perplexity · Free / Pro / Max
  • Data collected: Queries, prompts, AI responses, usage, device info
  • Trains on your data? Yes (opt-out) · Training 8/25. Off via Account Settings → Preferences → "AI Data Retention." Logged-out users are trained on by default with no opt-out path.
  • Retention · 3/25: Threads kept until manually deleted. Account deletion processed within 30 days.
  • On-device · 3/25: No (Comet browser stores some data locally, under a separate policy)
  • Track record · 8/25: 2024 reporting documented robots.txt evasion via undisclosed user-agent; logged-out users still trained on.
  • Total: 22/100 (Poor) · Last verified Apr 27, 2026

AI Coding Tools

IDE assistants and agents. Consumer defaults shifted in April 2026 (GitHub Copilot now trains on consumer interaction data by default). Most tools offer a Privacy / Zero-Data-Retention mode that flips the answer; check whether yours is on.

Each entry lists: plan tier, data collected, training behavior, retention, on-device support, track record, the four axis scores, the total, and the last-verified date.

Cursor · Individual default (Privacy Mode OFF)
  • Data collected: Code, prompts, editor actions, code snippets
  • Trains on your data? Yes (default) · Training 8/25. Default for individual accounts ("Share Data" on). Used to improve Cursor's models. Toggle Privacy Mode ON to opt out: code is never trained on and plaintext is discarded after each request.
  • Retention · 3/25: Stored indefinitely (Share Data ON). Privacy Mode ON: plaintext discarded after request; cached files encrypted with client-generated keys.
  • On-device · 13/25: No (configurable to use local Ollama / LM Studio models, which bypass Privacy Mode entirely)
  • Track record · 13/25: Default 'Share Data' on for individuals; no documented incidents; transparent data-use page.
  • Total: 37/100 (Poor) · Last verified Apr 27, 2026

GitHub Copilot · Free / Pro / Pro+
  • Data collected: Inputs, outputs, code snippets, associated context
  • Trains on your data? Yes (opt-out) · Training 10/25. Policy changed April 24, 2026: GitHub now trains on consumer interaction data by default. Existing opt-outs honored. Toggle in Settings → Privacy.
  • Retention · 5/25: User Engagement Data: 2 years. Coding Agent session logs: lifetime of account. Private repo code at rest is NOT used for training; in-flight interaction data IS.
  • On-device · 3/25: No
  • Track record · 8/25: April 2026 reversal trains consumer interaction data by default; pending class action over training data.
  • Total: 26/100 (Poor) · Last verified Apr 27, 2026

Windsurf (formerly Codeium) · Individual (no ZDR, default)
  • Data collected: Logs may contain code snippets and user trajectories
  • Trains on your data? Yes (opt-out) · Training 10/25. ZDR is opt-in for individuals: toggle in profile to enable. With ZDR ON, submitted code is never trained on.
  • Retention · 5/25: With ZDR ON: in-memory for the request lifetime, plus minutes-to-hours for prompt caching. Without ZDR: logs may persist.
  • On-device · 3/25: No
  • Track record · 13/25: 2024 Codeium → Windsurf rebrand; default-off ZDR for individuals; no documented incidents.
  • Total: 31/100 (Poor) · Last verified Apr 27, 2026

Cline (open-source agent) · Open-source extension (BYOK)
  • Data collected: Cline operates no model server. Code goes only to your configured API provider (Anthropic, OpenAI, Bedrock, Gemini, etc.) and is governed by that provider's terms.
  • Trains on your data? No (by Cline) · Training 23/25. Cline's stated principle: "Code never leaves your machine" (as far as Cline's own servers are concerned). Anonymous telemetry (features used, task completion) is opt-out via the "Cline Telemetry" setting. Code, file contents, command arguments, and conversation content are not collected by telemetry.
  • Retention · 25/25: Cline retains nothing about your code. Provider retention applies (e.g., Anthropic API ZDR, OpenAI API 30 days).
  • On-device · 13/25: Partial. The extension runs locally; inference happens at your chosen provider, or fully on-device if you configure Ollama / LM Studio.
  • Track record · 23/25: Open-source; no model server; no documented incidents.
  • Total: 84/100 (Strong) · Last verified Apr 27, 2026

Voice & Dictation

Speech-to-text and dictation tools. Voice input is uniquely sensitive (audio carries identity, biometric data, and ambient context), so the on-device column matters more than for text-only tools.

Each entry lists: plan tier, data collected, training behavior, retention, on-device support, track record, the four axis scores, the total, and the last-verified date.

Voibe · All plans
  • Data collected: No audio or transcription leaves the device. Account holders: email (account auth) plus non-identifying usage analytics; crash reports exclude dictated content.
  • Trains on your data? No · Training 25/25. "The Voibe application processes your voice entirely on your device. No audio is transmitted to our servers at any point."
  • Retention · 25/25: Audio: not transmitted, not retained. Account email: kept while account is active.
  • On-device · 25/25: Yes (only mode). Whisper models running on the Apple Silicon Neural Engine.
  • Track record · 23/25: New entrant; on-device-only architecture removes the surface for retention or training incidents.
  • Total: 98/100 (Excellent) · Last verified Apr 27, 2026

Wispr Flow · Free (Privacy Mode OFF, default)
  • Data collected: Audio, transcripts, edits, optional Context Awareness (screen content from active app)
  • Trains on your data? Opt-in only · Training 20/25. After 2024 community backlash, training is now off by default and requires opt-in. Audio retained indefinitely; 30 days for data passed to third-party LLMs (OpenAI, Meta).
  • Retention · 3/25: Indefinite for retained dictation data; 30 days for third-party LLM passthrough.
  • On-device · 3/25: No. Transcription always happens in the cloud, even in Privacy Mode (zero-retention cloud, not local).
  • Track record · 10/25: 2024 community backlash forced opt-in training and Privacy Mode; Free-tier retention is still indefinite by default.
  • Total: 36/100 (Poor) · Last verified Apr 27, 2026

Superwhisper · On-device modes (Fast / Nano / Standard / Parakeet; Free + Pro)
  • Data collected: None. Audio is processed locally and never transmitted.
  • Trains on your data? No · Training 25/25. "Your data is not retained on Superwhisper servers" and "not used for training AI models or any other machine learning purposes." Audio recordings are saved to local disk by default; opt out in settings.
  • Retention · 23/25: N/A on servers. Local recordings persist until the user deletes them.
  • On-device · 25/25: Yes
  • Track record · 18/25: Stable privacy-first stance; cloud modes added without separate disclosure in the public privacy policy.
  • Total: 91/100 (Excellent) · Last verified Apr 27, 2026

Superwhisper · Cloud modes (Ultra transcription / Super Mode LLMs; Pro)
  • Data collected: Audio sent to Superwhisper's proxy infrastructure
  • Trains on your data? No (per vendor) · Training 18/25. Superwhisper says cloud audio is proxied through their infrastructure, third-party providers don't see user account or content, and there is no training or retention. Cloud-mode handling is not currently distinguished from on-device modes in the public privacy policy; verify the latest with the vendor before sensitive use.
  • Retention · 13/25: Stated as not retained on servers; not separately documented for cloud modes.
  • On-device · 3/25: No
  • Track record · 18/25: Stable privacy-first stance; cloud modes added without separate disclosure in the public privacy policy.
  • Total: 52/100 (Weak) · Last verified Apr 27, 2026

MacWhisper · Pro (Gumroad) / Whisper Transcription (App Store)
  • Data collected: On-device modes: none transmitted. App Store version discloses "Usage Data" and "Product Interaction" as Data Not Linked to You. Cloud Assistant or BYOK (OpenAI / ElevenLabs) features send audio to those providers under their terms.
  • Trains on your data? No (by MacWhisper) · Training 23/25. MacWhisper does not train its own models on user audio. Cloud Assistant and BYOK integrations inherit the chosen provider's terms (e.g., OpenAI Whisper API, Anthropic / ElevenLabs).
  • Retention · 20/25: On-device transcription: not retained. Cloud Assistant / BYOK: per the third-party provider's terms.
  • On-device · 23/25: Yes (primary mode). Local Whisper models plus Apple Foundation Models for AI features. Cloud Assistant is opt-in for higher-quality transcription.
  • Track record · 22/25: Long-running indie tool; on-device by default; no documented incidents.
  • Total: 88/100 (Excellent) · Last verified Apr 27, 2026

Aqua Voice · Free / Pro / iOS Pro
  • Data collected: Audio inputs, technical data (IP, browser, OS, performance metrics), session metadata. With Privacy Mode disabled, "we may securely store transcript data on our servers."
  • Trains on your data? Yes (opt-out) · Training 8/25. The Privacy Mode toggle stops transcript storage on Aqua Voice servers; with it enabled, "transcript data is not collected," though session metadata may still be. The privacy policy does not explicitly state whether stored transcript data is used for AI training. SOC 2 Type II certified by Advantage Partners. No HIPAA BAA publicly advertised.
  • Retention · 0/25: With Privacy Mode disabled: not specified in policy. With Privacy Mode enabled: transcripts not stored; session metadata (timestamps, device type, performance metrics) may be retained.
  • On-device · 3/25: No. Cloud transcription only.
  • Track record · 15/25: SOC 2 Type II via Advantage Partners; privacy policy ambiguous on whether stored transcripts are used for AI training.
  • Total: 26/100 (Poor) · Last verified Apr 27, 2026

Typeless · Free / Pro
  • Data collected: Audio plus limited contextual information, processed on Typeless's cloud servers. Subprocessors include third-party LLM providers, analytics, and cloud infrastructure.
  • Trains on your data? No (per vendor) · Training 15/25. Privacy policy: "Your data is never used to train these services and is configured for zero retention by the providers." Note: the November 2025 reverse-engineering analysis documented in our Typeless privacy issues investigation reported collection beyond what the public policy describes; verify against the current policy and subprocessor list before sensitive use.
  • Retention · 23/25: Per the privacy policy, audio and contextual information are "processed in real time on our cloud servers and immediately discarded once the result is returned to your device."
  • On-device · 3/25: No. Cloud-processed in real time.
  • Track record · 5/25: A November 2025 reverse-engineering analysis documented collection (URLs, window-title metadata, broad permissions) beyond what the public policy describes.
  • Total: 46/100 (Weak) · Last verified Apr 27, 2026

Apple Dictation · macOS / iOS (Apple Silicon, supported languages)
  • Data collected: Audio inputs, plus contextual data (contacts, app names, etc.) when sent to servers
  • Trains on your data? Opt-in only · Training 20/25. "Improve Siri & Dictation" must be enabled; the default at setup is to be asked.
  • Retention · 13/25: If opted in: audio and transcripts kept under a rotating random ID for up to 6 months, then dissociated and kept up to 2 years for improvement; a reviewed subset is retained beyond 2 years. If opted out: not retained for improvement.
  • On-device · 18/25: Yes (partially). Most languages on Apple Silicon process locally for general text fields (Notes, Mail, Messages). Server fallback applies to unsupported languages, search-box dictation, and some third-party Speech Recognition API uses.
  • Track record · 18/25: 2019 Siri grading scandal led to opt-in for human review; otherwise privacy-forward.
  • Total: 69/100 (Adequate) · Last verified Apr 27, 2026

Privacy Policy Quick Read: Does Each AI Tool Train on Your Data?

For each of the 15 tools in the matrix above, here is what the vendor's own privacy policy says about training, retention, and on-device support, quoted verbatim where the policy text supports a clean citation. Each entry links to the primary source we verified against (see the Last verified date on each row above).

AI Assistants

Does ChatGPT train on my data?

Yes, by default - opt-out available.

ChatGPT's consumer plans (Free, Plus, Pro) train on user prompts, outputs, and uploaded files by default. To opt out, navigate to Settings → Data Controls and disable "Improve the model for everyone." Conversations are retained for 30 days after deletion. Temporary Chat is never used for training. ChatGPT Team, Enterprise, and API plans are explicitly excluded from training under OpenAI's enterprise terms; API users can optionally opt in via Playground feedback. Limited April–September 2025 data is preserved due to the NYT litigation order; OpenAI's standard 30-day retention practices resumed September 26, 2025.

Primary source: OpenAI privacy policy

Does Claude train on my data?

User choice required (since Aug 2025).

As of August 28, 2025, Anthropic shifted Claude's consumer plans (Free, Pro, Max) from "not used for training" to a user-choice model. New users must actively choose during signup whether to share data for training; existing users had until October 8, 2025. Users who opt in have their data retained for up to 5 years; users who decline keep the previous 30-day retention window. Flagged conversations are retained 2–7 years for trust & safety review. Claude for Work, the Claude API, Amazon Bedrock, and Google Vertex AI are all contractually excluded from training under Anthropic's Commercial Terms.

Primary source: Anthropic Aug 2025 update

Does Gemini train on my data?

Yes, by default - opt-out via "Gemini Apps Activity."

Free Gemini and Gemini Advanced (consumer) train on user conversations by default. Per Google's documentation, when Gemini Apps Activity is on, "Google uses your activity to provide, develop, and improve its services (including training generative AI models)." To opt out, set Apps Activity to OFF - but even when off, future chats are saved for 72 hours so Gemini can respond and process feedback. Default retention is 18 months, adjustable to 3 months, 36 months, or never. Human-reviewed conversations are kept up to 3 years (disconnected from your Google Account). Vertex AI customer data is contractually excluded from training: "Google won't use your data to train or fine-tune any AI/ML models without your prior permission or instruction."

Primary source: Gemini Apps Activity controls

Does Perplexity train on my data?

Yes, by default - opt-out for logged-in users only.

Perplexity trains on user queries, prompts, and AI responses by default for Free, Pro, and Max plans. The "AI Data Retention" toggle in Account Settings → Preferences disables this. Logged-out users are trained on by default with no opt-out path - sign in to gain control. Threads are retained until manually deleted; account deletion is processed within 30 days. The Sonar API offers Zero Data Retention with prompts and responses never stored. Third-party providers (OpenAI, Anthropic) are contractually prohibited from training on Perplexity's API data. Enterprise file uploads are deleted after 7 days.

Primary source: Perplexity data collection policy

AI Coding Tools

Does Cursor train on my code?

Yes, by default for individuals - Privacy Mode opt-out.

For individual accounts, Cursor's "Share Data" mode is enabled by default, sending code, prompts, editor actions, and code snippets to Cursor for model improvement. Toggling Privacy Mode ON prevents training and discards plaintext after each request; cached files are encrypted with client-generated keys, with the encryption keys existing on Cursor's servers only for the duration of each request. Team and Enterprise accounts default to Privacy Mode ON, with zero-data-retention agreements with OpenAI, Anthropic, Google, xAI, Fireworks, Baseten, and Together. The strictest tier, Privacy Mode (Legacy), guarantees no code is stored at all, by Cursor or any third party. Cursor can also be configured to use local Ollama or LM Studio models, which bypass Privacy Mode entirely.

Primary source: Cursor data use

Does GitHub Copilot train on my code?

Yes, by default for consumer plans - opt-out as of April 24, 2026.

On April 24, 2026, GitHub began using Free, Pro, and Pro+ user interaction data, including code snippets, to train AI models by default. Existing opt-outs are honored. To disable training going forward, go to Settings → Privacy. User Engagement Data is retained for 2 years; Coding Agent session logs persist for the lifetime of the account. Private repository code at rest is NOT used for training, but in-flight interaction data IS. Business and Enterprise data is explicitly excluded from training under GitHub's agreements: subscription Prompts and Suggestions are retained 28 days, and User Engagement Data 2 years.

Primary source: April 2026 policy change

Does Windsurf (Codeium) train on my code?

Yes for individuals by default - Zero Data Retention opt-in available.

Windsurf (formerly Codeium) trains on individual user code by default; without zero-data-retention enabled, logs may contain code snippets and user trajectories. Individuals can toggle ZDR on in their profile to prevent training; with ZDR on, "the code data submitted by zero-data retention mode users will never be trained on," code is never serialized in plaintext on Windsurf's servers, and is held only in-memory for the request lifetime (plus minutes-to-hours for prompt caching). Teams and Enterprise plans default to ZDR ON. The Enterprise Self-hosted tier deploys via Docker Compose or Helm Charts inside the customer's firewall - no traffic leaves customer infrastructure.

Primary source: Windsurf security

Does Cline train on my code?

No - Cline operates no model server. Privacy depends on your chosen API provider.

Cline is an open-source VS Code extension that operates no model server of its own. User code is sent only to whichever API provider you configure (Anthropic, OpenAI, AWS Bedrock, Google Gemini, Cerebras, Groq, etc.) and is governed by that provider's terms. Cline's stated principle: "Code never leaves your machine" (as far as Cline's own servers are concerned). Anonymous telemetry (features used, task completion rates) is collected but can be disabled via the Cline Telemetry setting. Code, file contents, command arguments, and conversation content are explicitly NOT collected by telemetry. For fully on-device use, configure Cline with a local Ollama or LM Studio model.

Primary source: Cline telemetry docs

Voice & Dictation

Does Voibe train on my voice data?

No - audio never leaves the device.

Voibe processes audio entirely on your Mac using OpenAI Whisper models running on Apple Silicon's Neural Engine. Per Voibe's privacy policy: "The Voibe application processes your voice entirely on your device. No audio is transmitted to our servers at any point" and "Your dictated content never leaves your Mac and we have no access to it." Because audio never crosses the network, there is no training to opt out of. Account holders provide an email (for authentication) and non-identifying usage analytics; crash reports exclude dictated content. The Free plan does not require an account at all.

Primary source: Voibe privacy policy

Does Wispr Flow train on my voice data?

Off by default since 2024 backlash - opt-in for training.

After 2024 community backlash, Wispr Flow shifted training to opt-in. Privacy Mode is OFF by default for Free users, meaning audio, transcripts, edits, and optional Context Awareness (screenshots of the active app's screen) are retained indefinitely. Data passed to third-party LLM providers (OpenAI, Meta) is retained for 30 days. Enterprise plans default to Privacy Mode ON with zero data retention by Wispr or any third party - audio is processed and immediately discarded after transcription. A Business Associate Agreement is available for Enterprise; once signed, Privacy Mode locks irreversibly. Transcription always happens in the cloud; even Privacy Mode is "zero-retention cloud," not local processing.

Primary source: Wispr Flow privacy policy

Does Superwhisper train on my voice data?

No - verbatim from policy.

Superwhisper's privacy policy states explicitly: "Your data is not retained on Superwhisper servers" and "not used for training AI models or any other machine learning purposes." On-device modes (Fast, Nano, Standard Whisper, Parakeet - available on the Free plan and within Pro) process audio entirely locally; nothing is transmitted. Cloud modes (Ultra transcription, Super Mode LLMs - Pro tier) proxy audio through Superwhisper's infrastructure with no retention. One caveat: audio recordings are saved to local disk by default. Opt out in settings if local audio retention is a concern. Note: the privacy policy does not currently distinguish between on-device and cloud modes - verify cloud-mode specifics with the vendor before sensitive use.

Primary source: Superwhisper privacy

Does MacWhisper train on my voice data?

No - primarily on-device, with optional cloud + BYOK paths.

MacWhisper does not train its own models on user audio. The on-device transcription path uses local Whisper models that you can download for offline use; Apple Foundation Models also run on-device for AI features. MacWhisper's optional "Assistant" cloud transcription service and BYOK integrations (OpenAI Whisper API, ElevenLabs) inherit those providers' terms when used. The App Store version's privacy disclosure shows only "Usage Data" and "Product Interaction" as Data Not Linked to You. There is no separate enterprise tier; the data-handling architecture is identical for individuals and bulk-licensing customers.

Primary source: App Store listing

Does Aqua Voice train on my voice data?

Privacy policy does not explicitly address training - opt-out via Privacy Mode.

Aqua Voice's privacy policy does not explicitly state whether stored data is used for AI training. With Privacy Mode disabled, "we may securely store transcript data on our servers"; with Privacy Mode enabled, "transcript data is not collected" though session metadata (timestamps, device type, performance metrics) may still be. Aqua Voice is SOC 2 Type II certified by Advantage Partners. Teams and Enterprise plans support an org-wide Privacy Mode that applies the same protections across an entire organization. No HIPAA Business Associate Agreement is publicly advertised. Audio is cloud-processed; there is no on-device option.

Primary source: Aqua Voice privacy policy

Does Typeless train on my voice data?

No, per the published privacy policy - but verify the architecture.

Typeless's privacy policy states: "Your data is never used to train these services and is configured for zero retention by the providers." Audio plus contextual information is "processed in real time on our cloud servers and immediately discarded once the result is returned to your device." Free and Pro tiers receive the same data-handling treatment. However, a November 2025 reverse-engineering analysis (covered in our Typeless privacy issues investigation) reported collection beyond what the published policy describes - including URL capture, window-title metadata via the macOS accessibility API, and broad permission requests. Verify the current subprocessor list at trust.typeless.com/subprocessors before relying on Typeless for sensitive content.

Primary source: Typeless privacy policy

Does Apple Dictation train on my voice data?

Only if you opt in via "Improve Siri & Dictation."

Apple Dictation only uses your audio to improve its models if you have explicitly enabled "Improve Siri & Dictation" - the default at setup is to be asked. If opted in, audio and transcripts are retained under a rotating random ID for up to 6 months, then dissociated and kept for up to 2 years for improvement; a reviewed subset is retained beyond 2 years. If opted out, recordings are not retained for improvement. On Apple Silicon Macs running modern macOS or iOS, most languages process locally for general text fields (Notes, Mail, Messages). Server-side fallback applies to unsupported languages, search-box dictation, and some third-party Speech Recognition API uses. Apple does not sign a Business Associate Agreement for consumer Dictation, so it is not HIPAA-compliant.

Primary source: Ask Siri & Dictation policy

Frequently Asked Questions

What does "on-device" actually mean?
On-device means a tool can complete its core workflow without sending your input to a vendor's servers. For dictation, that means audio is captured, transcribed, and discarded entirely on your computer - nothing leaves the machine. Most AI assistants and coding tools are not on-device by default: they transmit your prompts and code to a cloud model, even if the vendor doesn't retain or train on it. "Partially on-device" means parts of the workflow are local but specific cases (unsupported languages, agentic operations, large models) fall back to the cloud. Apple Dictation and Cline (when paired with a local Ollama or LM Studio model) are examples of partial on-device.
Why does the same tool show different answers for Free vs Business?
Consumer and business tiers operate under separate contracts. Most major AI vendors train on consumer data (or did until very recently) and explicitly exclude business / API / enterprise data from training under their commercial agreements. The two tiers can use the same underlying model but with different data-handling guarantees. Conflating the two is the most common error in third-party comparison articles. The tier filter at the top of each table separates them so you can answer either question independently.
Can a tool "unlearn" my data after training?
Practically, no. Once a model has been trained on a piece of data, the parameters reflect that training and cannot be cleanly reverted on a per-record basis. Vendors offering deletion typically delete the conversation record but cannot remove its influence on the model that has already absorbed it. This is why the relevant question is "will it be used for training in the first place," not "can I delete it later." Pages like this one focus on the training question because the deletion question rarely changes the outcome for already-trained models.
How often is this updated?
Each row carries a Last verified date. We re-check every cell against its primary source on a roughly monthly cadence, plus immediately whenever a vendor announces a policy change. The Recent Changes timeline at the top of the page lists every dated change we have logged. If you find an outdated cell or a missing change, email hi@getvoibe.com - we'll update and credit the report.
How is each tool scored?
Every tier carries four independent scores on a 0–25 scale that sum to a 0–100 composite: Training (does the vendor train on your data by default), Retention (how long data is kept; shorter retention scores higher), On-device (can the workflow run without sending data to the vendor), and Track record (the vendor's documented incidents, breaches, and unfavorable policy changes). We separate the axes because a tool with a strong contract but a weak track record isn't strictly better or worse than the inverse, but we also publish the composite total because most readers ultimately want a single answer. Per-axis buckets: 22–25 architectural or contractual guarantee, 17–21 default-off or short retention, 11–16 user must take action or policy has caveats, 6–10 unfavorable default with opt-out, 1–5 no opt-out or indefinite retention, 0 the policy does not address that dimension. Composite buckets: 85+ Excellent, 70–84 Strong, 55–69 Adequate, 40–54 Weak, below 40 Poor.
What score should I look for?
The right floor depends on the work. For regulated industries (healthcare, legal, financial) or anything that triggers regulator notification on leak, look for 85+ in the business tier and verify the vendor signs a BAA or DPA. For sensitive but unregulated business work (proprietary code, internal docs, M&A drafts), 70+ is the floor; below that, you are relying on opt-out toggles that team members may not have flipped. For day-to-day drafting, research, and light coding on non-secret content, 55+ is fine. For personal low-stakes use (notes to self, brainstorming) any score works as long as you understand what the tool retains. The Use case fit section on this page lists the tools that meet each threshold.
How do I report an error or a missing tool?
Email hi@getvoibe.com with the tool name, the cell you think is wrong, and a primary-source link. We'll verify and update on the next pass. Tool requests are welcome, but to be added we need a vendor-published privacy or data-handling page that we can cite - marketing claims aren't enough.

This tracker is maintained by the team at Voibe. We built it because privacy is the central design constraint of our product, and we kept being asked these questions. Voibe is one of the tools listed - the methodology is the same for every row.

Related reading on this site: Is Wispr Flow safe? · Typeless privacy issues · Apple Dictation privacy · Voice data privacy · Cloud vs local dictation · HIPAA dictation.