Voice Recognition Software and GDPR: Why Cloud Speech Recognition Is a Legal Risk
At a Glance: Cloud-based dictation software transmits voice data to external servers — a problem for GDPR compliance and especially for professionals bound by confidentiality under §203 StGB. Voice recordings may contain biometric data and are therefore subject to the strict requirements of Art. 9 GDPR. Local offline solutions completely avoid these risks because no data transmission takes place.
You’re dictating a legal brief for a client. Confidential details, case numbers, names — everything flows via speech into your computer. But where exactly does it go? For most dictation solutions, the answer is: to a server in the cloud. Often in the US, sometimes in Ireland, rarely in Germany. For professions bound by confidentiality — lawyers, tax advisors, doctors — this isn’t just a data protection issue. It’s potentially criminal. This article explains why voice recognition software and GDPR create tension, what specific risks cloud solutions entail, and what alternatives exist.
What Happens to Your Voice Data in the Cloud?
Cloud-based dictation software transmits your audio data to an external server, where an AI model converts speech to text and sends the result back. In this process, your voice data passes through multiple network nodes, is processed on the provider’s server, and is frequently cached — a procedure with far-reaching data protection consequences.
The problem begins with the transmission itself. According to a SkyScribe study (2026), some transcription apps upload audio data to external servers even before user consent — via SDKs that transmit data as soon as the app starts. This mechanism is invisible to the user: the app only requests microphone access while audio data is forwarded to cloud models in the background.
Major cloud providers like Google, Apple, and Microsoft process voice data on their servers by default. With Apple’s Siri and Google’s speech recognition, voice recognition doesn’t work at all when the internet connection is disabled, a clear sign that processing doesn’t occur locally. This may be acceptable for private use. For professional users with confidentiality obligations, it’s a compliance problem.
There’s also the question of data storage. Many cloud providers reserve the right in their terms of use to use audio data to improve their models. Even when immediate deletion is promised, transparent proof that the data has actually been removed completely is often lacking. Without clear deletion periods and verifiable processes, a residual risk remains that your confidential dictations stay on third-party servers.
A frequently underestimated point: metadata is also worth protecting. Even when the audio stream is transmitted encrypted, timestamps, IP addresses, device information, and usage patterns can allow conclusions about your work. In a law firm, this metadata reveals, for example, when work is done on which case — information that could be valuable to opposing parties or competitors.

What GDPR Requirements Apply to Dictation Software?
Dictation software that processes voice data is fully subject to GDPR because audio data constitutes personal data: it contains the voice of an identifiable person and, often, sensitive information about third parties. Whether a DPA is required depends on whether processing takes place locally or on external servers.
The central GDPR requirements can be divided into four areas:
Legal basis under Art. 6 GDPR. Every processing operation needs a legal basis. For cloud dictation software, this is usually Art. 6 para. 1 lit. f (legitimate interest) or lit. a (consent). Both are problematic in professional contexts: legitimate interest must be weighed against the rights of data subjects, and obtaining valid consent from every person mentioned in a dictation is hardly feasible in practice.
Data processing agreement under Art. 28 GDPR. As soon as a cloud provider processes your voice data on its servers, it acts as a processor on your behalf. A DPA (Data Processing Agreement) is mandatory. It must regulate the subject matter, duration, nature and purpose of processing, the type of personal data, and the processor’s obligations. If the DPA is missing, using the software already violates GDPR, regardless of whether a data protection incident actually occurs.
Third country transfer. Many cloud dictation services process data in the US or other third countries. Since the ECJ’s Schrems II ruling (2020), the transfer of personal data to the US has been subject to strict requirements. The current EU-US Data Privacy Framework provides a foundation but is already facing renewed legal criticism. According to the DLA Piper study (2025), GDPR fines of 1.2 billion euros were imposed across Europe in 2024 alone, and third country transfers were among the most common violations.
Data Protection Impact Assessment (DPIA). When processing is likely to result in a high risk to the rights of natural persons — such as with biometric data or systematic monitoring — a DPIA must be conducted under Art. 35 GDPR. Voice data may meet this criterion, especially when processed on a large scale or systematically.
The crucial point: all these requirements only apply when data is actually transmitted to third parties. With purely local dictation software that doesn’t send data to external servers, the entire compliance effort is eliminated. There’s no processor, no third country transfer, and usually no obligation for a DPIA; GDPR compliance results from the technical architecture itself.
| Requirement | Cloud Dictation Software | Local Dictation Software |
|---|---|---|
| Legal basis (Art. 6) | Required, complex | Not applicable (no data to third parties) |
| DPA (Art. 28) | Mandatory | Not necessary |
| Third country transfer | Common (US servers) | Excluded |
| DPIA (Art. 35) | Often required | Usually not necessary |
| Technical measures | Dependent on provider | Full control |
Why Is Speech Data Particularly Sensitive?
Voice recordings can contain biometric data within the meaning of the GDPR because they capture a person’s physical and behavioral characteristics — voice frequency, speech rhythm, articulation patterns. Article 4 No. 14 GDPR defines biometric data as personal data obtained through special technical procedures that can uniquely identify a person.
When speech data is used for unique identification, the strict processing prohibition under Article 9(1) GDPR applies. This prohibition only allows ten narrowly defined exceptions — such as explicit consent. For cloud dictation services, this means: Even if a DPA exists and the data is processed in the EU, processing may violate Article 9 if the provider uses or could use the audio data for voice biometrics.
According to the position paper of the Data Protection Conference (DSK, 2019), the suitability of biometric data for unique identification must always be considered in risk assessment — even if current processing does not aim at identification. The mere possibility is sufficient to justify enhanced protective measures. This also applies to cases where the cloud provider would technically be able to create voice profiles — even if they are not currently doing so.
Another aspect: Speech data regularly contains substantive information about third parties. When a lawyer creates a dictation about a client case, it contains personal data of the client, the opposing party, and possibly witnesses. These individuals have neither consented to the processing nor can they control it. The situation is similar in the tax advisory context: Dictations about tax cases contain income data, family relationships, and asset information of clients — all highly sensitive data that ends up on a server with a cloud provider that neither the tax advisor nor the client controls.
According to TU Darmstadt and Rosenheim University of Applied Sciences, cloud-based speech recognition systems pose a significant risk because the transmitted recordings contain both biometric and confidential information — and could be misused, for example for so-called “fake recordings” (authentic-sounding, artificially generated voice recordings). This risk is completely eliminated with local processing, as the audio data never leaves the computer.
What Does Section 203 of the German Criminal Code Mean for the Use of Dictation Software?
Section 203 of the German Criminal Code prohibits professional confidentiality holders — including lawyers, tax advisors, auditors, and doctors — from unauthorized disclosure of private secrets and punishes violations with imprisonment up to one year or a fine. Anyone who transmits client or patient data via cloud dictation software to third-party servers risks a criminally relevant breach of confidentiality.
The legal consequences go beyond fines. Where there is intent to enrich, the penalty rises to up to two years’ imprisonment (Section 203(6) German Criminal Code). There are also professional-conduct consequences: bar court proceedings, withdrawal of the license to practice, and damages claims from affected clients. Wikipedia describes violations of Section 203 German Criminal Code as a “mass crime” that is “hard to beat” in its frequency, although serious prosecution rarely takes place. This should not be read as an all-clear: if something does go wrong, such as a data breach at the cloud provider, the question of duty of care becomes decisive.
Since the 2017 reform, Section 203(3) German Criminal Code allows the involvement of external service providers as “other cooperating persons.” However, the prerequisite is that the professional confidentiality holder carefully selects the service provider and obligates them to confidentiality. These cooperating persons are in turn covered by criminal liability under Section 203: they too become criminally liable if they disclose a secret that became known to them in the course of their activity. With cloud providers based in the USA, it is questionable whether this requirement can be met, especially in light of US surveillance laws such as FISA Section 702 and the CLOUD Act, which give US authorities access to data even when it is stored in the EU.
IT law specialist Andreas Nörr put it succinctly in an interview with Future-Law (2026): Many lawyers advise their clients on data protection but themselves use unsafe consumer tools for confidential dictations. Using Siri or Google voice input for client dictations blatantly contradicts professional duties. He sees the solution in professional tools that process data locally or in certified European data centers; consumer products have no place in law firm environments.

What Specific Risks Exist with Cloud Dictation Software?
Cloud-based dictation software poses significant technical, legal, economic, and reputational risks that can reinforce each other. A single data breach at the cloud provider can simultaneously trigger a GDPR fine, a criminal investigation under §203 StGB, civil damages claims, and lasting reputational damage with clients and patients.
Technical risks. Audio data is transmitted to the server via the internet. Despite HTTPS encryption, residual risks exist with compromised certificates or man-in-the-middle attacks. On the server itself, data is decrypted and processed — whoever has server access has access to your dictations. According to the Cloud Monitor by KPMG and Bitkom Research, cloud threats are steadily increasing, especially for mid-sized companies, particularly through data theft and industrial espionage. Particularly insidious: some apps transfer audio via SDKs before users have even actively dictated anything.
Legal risks. Besides GDPR fines (up to 20 million euros or 4% of global annual revenue according to Art. 83 GDPR) and criminal liability under §203 StGB, there is also the threat of civil damages claims. Art. 82 GDPR grants affected parties a right to compensation for GDPR violations, including for intangible damages. In 2025 alone, the BfDI imposed a 45 million euro fine on Vodafone for inadequate data processing agreements (GDPR Portal, 2026). In the same year, CNIL fined Google 325 million euros and Shein 150 million euros.
Economic risks. Cloud dictation software incurs ongoing costs: subscriptions, DPA management, regular compliance audits, DPIA creation. Dragon Anywhere costs between 20 and 30 euros per user monthly as a subscription model with a minimum 12-month term. Additional costs include legal review of data protection compliance, which must be repeated with every provider change.
Reputational risks. A data breach at a cloud provider potentially affects all users simultaneously. For law firms and practices, such an incident can be existentially threatening. The average number of reported data breaches rose to 363 reports per day in Europe according to DLA Piper (2025). In 2025, German authorities were notified of 10,259 data breaches — an increase from the previous year’s 8,623 reports (GDPR Portal, 2026).
| Risk Category | Cloud Solution | Local Solution |
|---|---|---|
| Data leakage during transmission | Possible | Excluded |
| Server-side third-party access | Possible (including US authorities) | Excluded |
| GDPR fine | Up to 20 million € / 4% revenue | Risk minimal |
| §203 StGB criminal liability | Yes, for professional confidentiality holders | No |
| Ongoing compliance costs | DPA, DPIA, audits | None |
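The fine ceiling cited under legal risks can be written as a one-line calculation. This is a minimal sketch, assuming the "whichever is higher" rule of Art. 83(5) GDPR; the function name is hypothetical, and the result is only the statutory upper limit, not a prediction of any actual fine.

```python
def gdpr_fine_ceiling(global_annual_revenue_eur: float) -> float:
    """Upper limit under Art. 83(5) GDPR: EUR 20 million or 4% of
    global annual revenue, whichever is higher."""
    fixed_cap = 20_000_000.0
    revenue_cap = 0.04 * global_annual_revenue_eur
    return max(fixed_cap, revenue_cap)


# A small firm stays under the fixed cap; a large group does not.
small_firm_ceiling = gdpr_fine_ceiling(2_000_000)        # revenue: EUR 2 million
large_group_ceiling = gdpr_fine_ceiling(1_000_000_000)   # revenue: EUR 1 billion
```

For a small law firm the 20 million euro figure is the relevant ceiling; the 4% rule only takes over once revenue exceeds 500 million euros.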
How Does Local Speech Recognition Work as an Alternative?
Local speech recognition processes audio data entirely on the user’s own computer, without an internet connection and without transmission to external servers. The AI model runs directly on local hardware, meaning no third party is involved in the processing chain, no data processing agreement is needed, and GDPR compliance is architecturally guaranteed.
The technological breakthrough that makes this possible is OpenAI Whisper. This open-source model was trained on 680,000 hours of multilingual audio data and achieves a Word Error Rate (WER) of about 5.8% for German, comparable to commercial cloud services. Crucially, Whisper runs completely on local hardware. Modern CPUs process dictations in near real-time, and GPU support (Apple Metal on Mac, NVIDIA CUDA on Windows) makes it significantly faster still.
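To make the Word Error Rate figure concrete, here is a minimal sketch of how WER is commonly computed: the word-level edit distance between a reference transcript and the recognizer’s output, divided by the number of reference words. The function name and the simple lowercase/whitespace tokenization are assumptions for illustration; published benchmarks usually apply more elaborate text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing in output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate(
    "der Mandant hat die Klage eingereicht",
    "der Mandant hat eine Klage eingereicht",
)
```

In this example one of six reference words is recognized incorrectly, so the WER is 1/6, or roughly 16.7%. A WER of about 5.8% corresponds to roughly one error in every 17 words.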
Through model quantization — converting 32-bit floating-point weights into smaller formats — the model shrinks to a manageable size without substantially affecting accuracy. According to MLCommons (2025), the Whisper reference implementation achieves 97.93% Word Accuracy and reduced the error rate by over 72% compared to the previous MLPerf ASR model. The MIT license allows unrestricted commercial use — a decisive advantage over proprietary cloud services that lock you into a single provider.
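The idea behind quantization can be illustrated with a small sketch. It assumes a simple symmetric 8-bit scheme with one shared scale factor, chosen here for readability; it is not the specific format used by any particular Whisper runtime, and the function names are hypothetical.

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] plus one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Avoid division by zero when all weights are 0.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
```

Each weight now needs one byte instead of four, and the rounding error per weight is at most half the scale factor. That is why the model becomes much smaller while accuracy changes only slightly.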
For GDPR compliance, the result is clear: if no speech data leaves the computer, there is no processor, no third-country transfer, and no cloud provider you need to trust. The entire compliance complexity disappears. For §203 StGB, the situation is equally clear: data that never leaves the computer cannot be “disclosed.”
Diktly uses exactly this approach. The software processes speech entirely locally via the Whisper model. No internet, no cloud, no server connection — not even for update checks. For professional confidentiality holders under §203 StGB, this is the safest way: data stays on the computer, confidentiality is technically guaranteed.
A practical advantage often overlooked: local speech recognition works even where there’s no internet — in court, at client meetings, on trains, or abroad. While cloud solutions fail with poor connections or struggle with high latency, local software works reliably and without delay. For tax advisors during peak tax return season and lawyers with tight deadlines, this isn’t a comfort issue but a productivity factor.
What Requirements Should GDPR-Compliant Dictation Software Meet?
GDPR-compliant dictation software must ensure that voice data is not transferred to third parties without legal basis and that users maintain full control over their audio data at all times. The term “GDPR-compliant” is not a protected designation — therefore, specifically examine how the software handles your data.
The following criteria help with evaluation:
1. Data processing. Where does speech recognition take place — on your device or on an external server? Test this by disconnecting your internet connection and starting dictation. If the software works offline, processing is indeed local. This test also exposes providers who advertise “local AI” but actually rely on cloud processing.
2. Network communication. Check whether the software transmits data in the background. Some apps advertise “local AI” but still send telemetry data, usage statistics, or model updates over the network. According to the SkyScribe analysis (2026), some transcription apps redirect recordings to cloud models through background processes, even though they only request microphone access. Pay particular attention to apps that establish a network connection at startup, even though they shouldn’t need internet for dictation functionality.
3. DPA and documentation. If the software uses cloud components: Does the provider offer a complete Data Processing Agreement (DPA)? Where are the servers located? In which country is the provider based? Which subcontractors are involved? A missing or incomplete DPA is already a GDPR violation, regardless of whether any data is actually leaked.
4. Data storage and deletion. Are audio data or transcripts stored? If so, for how long? Are there clear deletion deadlines? Without concrete commitments such as “deletion after X days,” you have no guarantee that your data won’t be stored indefinitely. Also check whether the provider uses audio data to train its AI models; this is a separate processing purpose that requires its own legal basis.
5. Open-source transparency. Is the AI engine used open source? With open-source models like Whisper (MIT license), independent security audits can be conducted. Proprietary cloud models are black boxes — you must trust the provider without being able to verify their data processing.
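For readers who run their own Python-based transcription pipeline (for example a self-hosted Whisper setup), the network check from point 2 can be partly automated. The sketch below is one possible approach, not a complete audit: it temporarily replaces `socket.socket` so that Python code trying to open a socket raises an error. The names `block_network` and `NetworkAccessBlocked` are hypothetical. The guard only covers Python-level socket creation in the same process; it does not see DNS lookups, native libraries, or other applications, so closed-source desktop software still needs a firewall or packet-capture check.

```python
import socket
from contextlib import contextmanager


class NetworkAccessBlocked(RuntimeError):
    """Raised when guarded code tries to open a network socket."""


@contextmanager
def block_network():
    """Temporarily make every attempt to create a Python socket fail loudly."""
    original_socket = socket.socket

    def guarded_socket(*args, **kwargs):
        raise NetworkAccessBlocked("attempted to open a network socket")

    socket.socket = guarded_socket
    try:
        yield
    finally:
        # Always restore the real socket class, even if the guarded code failed.
        socket.socket = original_socket
```

Wrapping the transcription call in `with block_network():` makes the run fail with `NetworkAccessBlocked` as soon as the pipeline tries to open a connection from Python code, which turns the "does it really work offline?" question into a repeatable test.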
What Is the Legal Situation for Cloud Speech Recognition in Law Firms?
Cloud-based speech recognition in law firms is not prohibited per se, but it requires compliance with both data protection and professional regulatory requirements. Besides GDPR, the Federal Lawyers’ Act (BRAO) applies: §43a para. 2 BRAO mandates confidentiality about everything that becomes known in the course of professional practice.
Using cloud dictation software in law firms requires at minimum:
- A complete DPA with the cloud provider according to Art. 28 GDPR
- Assessment of whether the cloud provider can be qualified as an “other cooperating person” within the meaning of §203 para. 3 German Criminal Code (StGB)
- Written commitment of the provider to confidentiality
- For US providers: Review of the impact of FISA Section 702, CLOUD Act, and EU-US Data Privacy Framework
- A documented Data Protection Impact Assessment according to Art. 35 GDPR
- Information for all data subjects (Art. 13/14 GDPR), including clients whose data appears in dictations
In practice, this means considerable administrative overhead. As Simon Reuvekamp, CTO at Meyer-Köring law firm and specialist for dictation systems, noted on legal-tech.de: Data protection doesn’t limit the advantages of speech recognition — provided there’s a well-thought-out concept behind it. However, anyone who simply uses Siri or Google voice input for client dictation is acting negligently. He recommends the safest alternative: using local speech recognition solutions that work entirely without network transmission — or at least solutions where data runs exclusively through the firm’s own encrypted server.
The simplest solution that bypasses all these requirements: Software where speech recognition runs completely locally on the law firm’s computer. No cloud provider, no DPA, no third-country transfer — and no vulnerability in the next data protection audit.
A practical example illustrates the difference. A solo practice using Dragon Anywhere must review the DPA with Nuance/Microsoft, assess third-country issues (according to the provider, servers are located in a German data center, but Microsoft as a US company is subject to the CLOUD Act), create a DPIA, inform all clients according to Art. 13 GDPR, qualify the cloud provider as a cooperating person according to §203 para. 3 StGB, and commit the provider to confidentiality. The same practice with a local solution like Diktly doesn’t need to do any of this, because GDPR compliance results from the architecture.
What Does GDPR-Compliant Dictation Software Cost in Comparison?
The total costs of GDPR-compliant dictation software consist of the purchase price and the often overlooked compliance costs — the latter can exceed the pure software price for cloud solutions by many times over. A fair comparison must therefore consider not only licensing fees but also DPA review, data protection impact assessment, and ongoing compliance documentation.
| Solution | Price | Model | Cloud / Local | DPA Required? |
|---|---|---|---|---|
| Siri / Google Speech Input | Free | — | Cloud | Yes (hardly possible) |
| Dragon Anywhere | ~€25/month (subscription) | Subscription, 12 months | Cloud (German data center) | Yes |
| Philips SpeechLive | ~€15–25/month | Subscription | Cloud (EU) | Yes |
| Whisper (Self-Hosted) | Free (Open Source) | — | Local | No |
| Diktly Basic | €14.99 one-time + VAT | One-time purchase | 100% local | No |
| Diktly Pro | €49.99 one-time + VAT | One-time purchase | 100% local | No |
Cloud solutions incur additional compliance costs: The initial review of a DPA by a data protection officer costs €500–2,000 depending on complexity. A data protection impact assessment ranges from €1,000–5,000. These costs occur per provider and with each provider change. For larger law firms with multiple practitioners, licensing costs multiply per user, while compliance costs only occur once — for small entities, however, compliance overhead clearly predominates.
The calculation is particularly stark for solopreneurs and small firms: A cloud subscription adds up to €600 or more over two years — plus compliance overhead. When you add the one-time DPA review (from €500) and a DPIA (from €1,000), the total costs for GDPR-compliant operation of a cloud dictation solution quickly reach over €2,000 in the first two years. A local solution like Diktly costs €14.99 one-time and requires no ongoing data protection management. Even the supposedly “free” consumer tools like Siri or Google Speech Input have a price: you pay with your data, and for professionals bound by confidentiality, the cost of criminal proceedings may be added in serious cases.
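The two-year comparison can be reproduced with a few lines of arithmetic. This sketch uses only the lower-bound figures given in this article (about €25 per month for the subscription, €500 for the DPA review, €1,000 for the DPIA, €14.99 for Diktly Basic before VAT); the function name is hypothetical, and real quotes will vary.

```python
def two_year_cloud_cost(monthly_fee_eur, dpa_review_eur, dpia_eur, months=24):
    """Subscription fees over the period plus one-time compliance costs."""
    return monthly_fee_eur * months + dpa_review_eur + dpia_eur


# Lower-bound figures from the article, single user, excluding VAT.
cloud_total = two_year_cloud_cost(
    monthly_fee_eur=25.0,
    dpa_review_eur=500.0,
    dpia_eur=1000.0,
)
local_total = 14.99  # Diktly Basic, one-time purchase

difference = cloud_total - local_total
```

With these inputs the cloud route comes to €2,100 over two years (€600 in subscription fees plus €1,500 in one-time compliance work), which is where the "over €2,000" figure comes from.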
Frequently Asked Questions About Dictation Software and GDPR
Is dictation software GDPR-compliant?
This depends on the architecture. Cloud-based dictation software transmits voice data to external servers and requires a data processing agreement (DPA) under Art. 28 GDPR. Locally installed software like Diktly processes everything on the computer — no data transmission, no DPA needed, GDPR-compliant by design.
Why are cloud dictation solutions problematic from a data protection perspective?
Cloud services transmit audio data to external servers, often outside the EU. Voice recordings contain biometric features and potentially confidential content. Without proper legal basis, DPA, and data protection impact assessment, usage violates GDPR.
What does §203 German Criminal Code have to do with dictation software?
§203 German Criminal Code protects private secrets of professionals bound by confidentiality like lawyers, tax advisors, and doctors. Anyone who transmits client or patient data via cloud dictation software to third-party servers risks violating confidentiality obligations — punishable by up to one year imprisonment.
Do I need a DPA for dictation software?
Only if the software transmits data to external servers. For cloud-based solutions, a data processing agreement under Art. 28 GDPR is mandatory. For purely local software, this obligation does not apply since no processor is involved.
Which dictation software works completely offline?
Diktly processes speech entirely locally on the computer and requires no internet connection. The classic Dragon desktop version also works locally. Most modern alternatives like Siri, Google Speech Input, or Dragon Anywhere use cloud servers.
Is voice data biometric data under GDPR?
Voice recordings can be biometric data within the meaning of Art. 4 No. 14 GDPR if they are processed for unique identification of a person. In this case, the strict processing prohibition under Art. 9 GDPR applies with its limited exceptions.
What does GDPR-compliant dictation software cost?
The price range is wide. Diktly Basic costs €14.99 one-time plus VAT. Cloud solutions like Dragon Anywhere often cost €20–30 monthly as a subscription. For cloud solutions, hidden costs for DPA management, data protection impact assessment, and compliance documentation are added.
Conclusion: Local Processing is the Safest Path to GDPR Compliance
The question is not whether cloud dictation software can be operated in compliance with GDPR. With enough effort (DPA, DPIA, third-country review, §203 safeguards), this is theoretically possible. The question is whether this effort is worthwhile when there’s a simpler solution.
Local speech recognition eliminates the problem at its root. If no data leaves the computer, there is no data processor, no third-country transfer, and no confidentiality violation. Compliance is not laborious — it is architecturally given. Thanks to open-source models like Whisper, the recognition quality of local solutions today matches that of commercial cloud services — the often-cited quality advantage of the cloud is no longer an argument.
If you use dictation software as a lawyer, tax advisor, or doctor, check today: Where does your voice data end up? If the answer is “in the cloud,” it’s time for a change. Diktly processes everything locally, costs from €14.99 one-time, and requires not a single compliance process. The best data protection is when there’s nothing to protect, because the data never leaves your computer.