Resource guide

Audio Deepfakes: What They Are and How to Protect Yourself

AI can now clone anyone's voice from a short recording. Here is what audio deepfakes are, how scammers use them, and the one low-tech trick that stops a voice-clone attack cold.

Last updated June 18, 2026 | 1671-word guide | Editor: Ban the Bots

What is an audio deepfake?
How voice cloning works
Real-world harm: scams and fraud
Political manipulation: the Biden robocall
Can you spot an audio deepfake?
Detection tools that exist
How to protect yourself and your family
Frequently asked questions

What is an audio deepfake?

An audio deepfake is a fake voice recording created by artificial intelligence that sounds like a specific real person. The AI studies a sample of someone’s real speech—sometimes as little as three seconds—and generates new audio that mimics their voice, accent, pacing, and tone. The result can be nearly indistinguishable from the real person speaking.

Audio deepfakes are also called voice deepfakes, voice clones, or AI voice clones. They are a subset of the broader deepfake category, which covers AI-manipulated video and images as well. For background on the wider topic, see our guide on what deepfakes are.

The technology is not theoretical. Commercial services such as ElevenLabs and Resemble AI make voice cloning available to anyone with an internet connection. ElevenLabs’ 2023 transparency report confirmed the company processes millions of voice generations per month. Its terms prohibit impersonation, but ElevenLabs acknowledged it is technically impossible to prevent 100% of misuse.

How voice cloning works

Modern voice cloning works by training a neural network on audio samples of the target person, then using that model to synthesize new speech in their voice. The process has three steps that have all become dramatically faster and cheaper since 2022.

First, the AI ingests source audio—a clip from a YouTube video, a podcast appearance, a voicemail, or a social media story. Even a few seconds is enough for newer models. Second, the model extracts a voice fingerprint: the unique pattern of pitch, resonance, accent, and cadence that makes a voice recognizable. Third, you type any text and the model reads it back in the cloned voice.

The key non-obvious fact: collecting source audio does not require your permission or even your knowledge. A scammer who has heard you speak publicly (in a video, on a podcast, at a recorded event) already has everything they need. Parents who post videos of their children on social media are inadvertently building a voice library that bad actors can exploit.

Microsoft’s VALL-E model, published in a 2023 research paper, demonstrated that three seconds of audio was sufficient to clone a voice with high accuracy. Microsoft published a watermarking specification alongside the research to mitigate misuse, but watermarking is not yet standard industry practice.

Real-world harm: scams and fraud

Voice scams, including AI voice clones, generated over $1.1 billion in reported consumer losses in 2023—the highest category of fraud loss tracked by the FTC’s 2024 Consumer Sentinel report. That figure covers only reported cases; researchers estimate actual losses are several times higher because most fraud goes unreported.

The grandparent scam goes high-tech

The grandparent scam has existed for decades: a caller claims to be a grandchild in an emergency, needing bail money or medical funds wired immediately. Traditional versions used voice actors and relied on panic to prevent victims from thinking clearly. AI has made the scam far more convincing.

Now scammers clone the actual grandchild’s voice from social media posts before calling. The FTC documented this shift in a May 2023 consumer alert, warning that AI voice cloning had made grandparent scams “virtually undetectable by ear.” A grandmother who has heard her grandson’s real voice hundreds of times can still be fooled, because the voice she hears is her grandson’s—just generated by a machine.

CEO fraud and business wire transfers

Businesses are equally vulnerable. In 2019, The Wall Street Journal reported that a UK energy company wired €220,000 after a phone call in which the caller’s voice sounded exactly like the CEO of the company’s German parent. The CEO’s voice had been cloned with AI. The money was transferred in one hour. By 2024, Cybersecurity Ventures estimated that similar CEO voice-fraud attacks had cost companies more than $25 million globally. These attacks typically combine a cloned voice with a spoofed phone number, giving the fraudulent call every outward sign of legitimacy.

Political manipulation: the Biden robocall

The January 2024 New Hampshire presidential primary robocall is the most documented case of AI voice cloning used for political manipulation. Ahead of the primary, Democratic voters in New Hampshire received an automated call that used an AI clone of President Biden’s voice. The cloned voice told Democrats not to vote in the primary—to “save your vote” for November instead.

The FCC traced the call to a political consultant and issued a $6 million fine, the first major regulatory penalty specifically targeting AI voice fraud. In a separate ruling, the FCC clarified that AI-generated voices in robocalls count as artificial voices under the Telephone Consumer Protection Act, so callers need the recipient's prior express consent.

The case mattered beyond the fine. It showed that audio deepfakes had crossed from financial fraud into election interference. For the full legal picture, see our deepfake laws guide.

Can you spot an audio deepfake?

Trained listeners can catch early-generation voice clones, but state-of-the-art 2024 models have eliminated most of the classic tells—and consumer detection tools now miss roughly four in ten fakes.

What older fakes sounded like

First-generation voice clones had characteristic artifacts that careful listeners could catch:

Unnatural pacing: pauses fell in grammatically logical but conversationally wrong places.
Consistent amplitude: real speakers breathe, shift the phone, or trail off; cloned voices had unnaturally flat volume across the full clip.
Perfect pronunciation: real voices slip, swallow syllables, or vary slightly on repeated words. Clones were too clean.
No breath sounds: inhales and exhales were absent or mechanically regular.
Odd prosody: emotional inflection felt slightly off—a speaker conveying urgency but sounding calm.

Why those tells no longer apply

Tools like ElevenLabs’ newer models now add synthetic breath sounds, vary amplitude, and model natural slips in pronunciation. A 2024 assessment found that consumer-grade audio deepfake detection tools had dropped to roughly 60% accuracy against state-of-the-art voice clones. That is only modestly better than a coin flip, which would be right about half the time. Your ear alone is not a reliable filter.
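To put that figure in concrete terms, here is a minimal Python sketch. It assumes the roughly 60% figure means the share of fake clips a detector flags correctly; the assessment's exact methodology is not given here, and the function name and batch size are illustrative only.

```python
def expected_catches(num_fakes: int, detection_rate: float) -> tuple[int, int]:
    """Return (caught, missed) for a batch of fake clips at a given detection rate."""
    caught = round(num_fakes * detection_rate)
    return caught, num_fakes - caught


# Assumption: 0.60 is the share of fakes the detector flags; 0.50 models a coin flip.
detector = expected_catches(100, 0.60)
coin_flip = expected_catches(100, 0.50)

print(f"Detector:  {detector[0]} caught, {detector[1]} missed")
print(f"Coin flip: {coin_flip[0]} caught, {coin_flip[1]} missed")
```

Under that assumption, a detector working through 100 cloned clips lets 40 through, only ten fewer than guessing at random.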

For comparison, visual deepfakes still leave physical artifacts in video that trained eyes (and dedicated tools) can catch with higher reliability. See our guide on how to spot a deepfake for the video-specific signals.

Detection tools that exist

Several tools claim to detect AI-generated audio, but none are reliable enough to use as your sole defense. Here is an honest assessment of what exists in 2024:

AI or Not (aiornot.com): free tier available, covers audio and images. Works well on lower-quality clones; less reliable against premium voice models.
Reality Defender: enterprise-grade platform used by news organizations and financial institutions. More accurate but requires a paid subscription and is not designed for individual consumers.
ElevenLabs Speech Classifier: ElevenLabs built their own detector, but it is trained only to identify audio generated by ElevenLabs models. It will miss clones created with other tools.
Microsoft VALL-E watermarking: Microsoft published a spec for embedding inaudible watermarks in AI-generated audio in 2023. This approach works only if the voice cloning tool has voluntarily adopted the watermarking standard—which most have not.

The honest summary: detection tools are a useful second opinion, not a reliable gatekeeper. A bad actor who uses a tool that the detector was not trained on will slip through. Behavioral verification—covered in the next section—is more reliable.

How to protect yourself and your family

The single most effective defense against AI voice-clone scams is a technique that requires no technology at all: a family safe word.

The family safe word (your most powerful tool)

Choose a word or short phrase that every member of your household knows but that you have never spoken publicly—not in a video, podcast, social media post, or recorded call. Examples: a made-up word, the name of a childhood pet that never appeared online, or a nonsense phrase your family coined.

The rule is simple: if someone calls claiming to be a family member in an emergency, they must say the safe word before you act on anything they ask. A cloned voice can be made to say anything, but the scammer writing the script has no way to know a secret that was never shared publicly, so the caller cannot supply the word. This single step defeats the grandparent scam as long as the word stays secret.

Share the safe word with close family members in person or by encrypted message—not by phone call or email, which can be intercepted or impersonated. Review it once a year so everyone remembers it.
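The safe-word rule is a simple challenge-and-response check, and it can be sketched in a few lines of Python. This is an illustration of the decision rule only, not software anyone needs to run during a call; the function names and the placeholder phrase are hypothetical.

```python
import unicodedata
from typing import Optional


def normalize(phrase: str) -> str:
    """Fold case and collapse whitespace so small differences in phrasing do not matter."""
    folded = unicodedata.normalize("NFKC", phrase).casefold()
    return " ".join(folded.split())


def caller_passes(spoken_word: Optional[str], agreed_word: str) -> bool:
    """True only if the caller actually supplied the agreed safe word."""
    if spoken_word is None:  # caller dodged the question or changed the subject
        return False
    return normalize(spoken_word) == normalize(agreed_word)


def next_step(spoken_word: Optional[str], agreed_word: str) -> str:
    """Apply the rule: no safe word, no action, however convincing the voice."""
    if caller_passes(spoken_word, agreed_word):
        return "safe word confirmed"
    return "do not act; hang up and verify independently"


# "purple walrus" is a placeholder for illustration; never reuse a published example.
print(next_step("  Purple   WALRUS ", "purple walrus"))
print(next_step("There's no time for that!", "purple walrus"))
print(next_step(None, "purple walrus"))
```

Note what the check ignores: how the voice sounds. Only knowledge of the secret counts, which is why a perfect clone gains the scammer nothing here.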

Call back on a number you already know

If you receive any call creating urgency—a family member in trouble, a bank fraud alert, a company CEO asking for a fast wire transfer—hang up and call back on a number you already have in your contacts or that you look up yourself. Do not trust a phone number the caller gives you. Scammers spoof caller ID and provide callback numbers that route to accomplices.
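The callback rule can be written as a short decision procedure. The sketch below is illustrative: the contact list, names, and phone numbers are made up (the 555-01xx range is reserved for fiction), and the key design choice is that the number offered by the caller is never used.

```python
from typing import Optional

# Hypothetical contact list: numbers you saved yourself before any suspicious call.
KNOWN_CONTACTS = {
    "grandson": "+1-555-0100",
    "bank": "+1-555-0199",
}


def number_to_call_back(
    claimed_identity: str,
    caller_supplied_number: Optional[str],
    contacts: dict[str, str],
) -> Optional[str]:
    """Return the number you already had for the claimed identity, or None.

    caller_supplied_number is accepted but deliberately ignored: a number
    offered during the call may route to an accomplice.
    """
    return contacts.get(claimed_identity.strip().lower())


# The caller claims to be your grandson and offers a "new" number.
print(number_to_call_back("Grandson", "+1-555-0142", KNOWN_CONTACTS))

# No saved number for this identity: look one up yourself from an official source.
print(number_to_call_back("CEO", "+1-555-0142", KNOWN_CONTACTS))
```

When the function returns None, the right move is to find the number independently, for example from a card, statement, or official website you navigate to yourself, rather than falling back on what the caller provided.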

Slow down on social media

Every audio sample you post publicly is potential training data for a clone of your voice or your children’s voices. This does not mean you must go silent, but it is worth understanding the exposure. Voice samples in videos—even five seconds of someone saying hello—are usable. Parents worried about this risk can find more guidance on our AI safety for parents page.

Treat wire and gift card requests as a red flag

No bank, government agency, or legitimate business will ever ask you to wire money or buy gift cards to resolve an emergency on a first call. That request is always a scam, regardless of how convincing the voice sounds. Wire transfers and gift card payments are nearly impossible to reverse.

Stay current on new cases

The tactics evolve quickly. Voice cloning tools that cost thousands of dollars in 2020 are now free. Checking in on developments once a month is reasonable for any adult who handles finances or cares for older relatives. Our daily AI briefing covers new AI scam tactics as they emerge.

Frequently asked questions

▸ What is an audio deepfake?

An audio deepfake is an AI-generated voice recording that sounds like a specific real person. The AI studies a sample of that person's speech and synthesizes new audio in their voice, accent, and tone. Newer tools require as little as three seconds of source audio and are available to anyone online. The result can be nearly impossible to distinguish from the real person speaking.

▸ How are audio deepfakes used in scams?

The most common scam clones a family member's voice—often a grandchild's—from social media, then calls an older relative claiming to be in an emergency and needing money immediately. Businesses face CEO fraud, where an executive's cloned voice instructs staff to wire funds. The FTC reported voice scams caused over $1.1 billion in consumer losses in 2023, the highest category of fraud loss that year.

▸ Can you detect an audio deepfake by listening?

Increasingly, no. Early voice clones had tells like flat volume, missing breath sounds, and unnaturally perfect pronunciation. State-of-the-art 2024 models have largely eliminated those artifacts. Consumer detection tools now achieve only around 60% accuracy against the best voice clones, which means your ear alone—or a detection app—is not a reliable defense. Behavioral verification, like a family safe word, is more dependable.

▸ What is a family safe word and how does it stop voice clone scams?

A family safe word is a word or phrase known only to your household that callers must say before you act on any emergency request. A cloned voice can say any word typed into it, but because the safe word was never shared publicly, a scammer working from your family member's public audio has no way to know it. A caller who cannot say the word is not your family member, regardless of how convincing the voice sounds. Share the word in person or by encrypted message, never by phone or standard email.

▸ What tools exist to detect AI-generated audio?

AI or Not offers a free tier that covers audio and images. Reality Defender is an enterprise platform used by news organizations. ElevenLabs publishes its own classifier, but it only detects audio made with ElevenLabs tools. Microsoft proposed a watermarking standard in 2023, but adoption by voice cloning services is not mandatory. None of these tools are reliable enough to serve as your sole defense against a determined bad actor.

▸ Is it illegal to make or use an audio deepfake?

Using a voice clone to commit fraud or impersonate someone for financial gain is illegal under existing wire fraud and identity theft statutes. The FCC fined a political consultant $6 million over the AI-cloned Biden robocall of January 2024, the first major regulatory penalty targeting AI voice fraud specifically. Several states are passing new laws targeting voice cloning in elections and non-consensual use. See our deepfake laws guide for current state-by-state rules.