Audio Deepfakes: What They Are and How to Protect Yourself
AI can now clone anyone's voice from a short recording. Here is what audio deepfakes are, how scammers use them, and the one low-tech trick that stops a voice-clone attack cold.
- What is an audio deepfake?
- How voice cloning works
- Real-world harm: scams and fraud
- Political manipulation: the Biden robocall
- Can you spot an audio deepfake?
- Detection tools that exist
- How to protect yourself and your family
- Frequently asked questions
What is an audio deepfake?
An audio deepfake is a fake voice recording created by artificial intelligence that sounds like a specific real person. The AI studies a sample of someone’s real speech—sometimes as little as three seconds—and generates new audio that mimics their voice, accent, pacing, and tone. The result can be nearly indistinguishable from the real person speaking.
Audio deepfakes are also called voice deepfakes, voice clones, or AI voice clones. They are a subset of the broader deepfake category, which covers AI-manipulated video and images as well. For background on the wider topic, see our guide on what deepfakes are.
The technology is not theoretical. Commercial services like ElevenLabs, Resemble AI, and others make voice cloning available to anyone with an internet connection. ElevenLabs’ 2023 transparency report confirmed the company processes millions of voice generations per month. Their terms prohibit impersonation, but ElevenLabs acknowledged it is technically impossible to prevent 100% of misuse.
How voice cloning works
Modern voice cloning typically feeds audio samples of the target person to a neural network that has already been trained on large amounts of other people's speech, then uses that model to synthesize new speech in the target's voice. The process has three steps, all of which have become dramatically faster and cheaper since 2022.
First, the AI ingests source audio: a clip from a YouTube video, a podcast appearance, a voicemail, or a social media story. Even a few seconds is enough for newer models. Second, the model extracts a voice fingerprint: the unique pattern of pitch, resonance, accent, and cadence that makes a voice recognizable. Third, the operator types any text and the model reads it back in the cloned voice.
The key non-obvious fact here: the source audio does not need your permission or knowledge. A scammer who has heard you speak publicly—in a video, on a podcast, at an event that was recorded—already has everything they need. Parents who post videos of their children on social media are inadvertently creating a voice library that bad actors can exploit.
Microsoft’s VALL-E model, published in a 2023 research paper, demonstrated that three seconds of audio was sufficient to clone a voice with high accuracy. Microsoft published a watermarking specification alongside the research to mitigate misuse, but watermarking is not yet standard industry practice.
Real-world harm: scams and fraud
Voice scams, including AI voice clones, generated over $1.1 billion in reported consumer losses in 2023—the highest category of fraud loss tracked by the FTC’s 2024 Consumer Sentinel report. That figure covers only reported cases; researchers estimate actual losses are several times higher because most fraud goes unreported.
The grandparent scam goes high-tech
The grandparent scam has existed for decades: a caller claims to be a grandchild in an emergency, needing bail money or medical funds wired immediately. Traditional versions used voice actors and relied on panic to prevent victims from thinking clearly. AI has made the scam far more convincing.
Now scammers clone the actual grandchild’s voice from social media posts before calling. The FTC documented this shift in a May 2023 consumer alert, warning that AI voice cloning had made grandparent scams “virtually undetectable by ear.” A grandmother who has heard her grandson’s real voice hundreds of times can still be fooled, because the voice she hears is her grandson’s—just generated by a machine.
CEO fraud and business wire transfers
Businesses are equally vulnerable. In 2019, The Wall Street Journal reported that a UK energy company wired €220,000 after a phone call in which the caller’s voice sounded exactly like the CEO of the company’s German parent. The CEO’s voice had been cloned with AI. The money was transferred in one hour. By 2024, Cybersecurity Ventures estimated that similar CEO voice-fraud attacks had cost companies more than $25 million globally. These attacks typically combine a cloned voice with a spoofed phone number, giving the fraudulent call every outward sign of legitimacy.
Political manipulation: the Biden robocall
The January 2024 New Hampshire presidential primary robocall is the most documented case of AI voice cloning used for political manipulation. Ahead of the primary, Democratic voters in New Hampshire received an automated call that used an AI clone of President Biden’s voice. The cloned voice told Democrats not to vote in the primary—to “save your vote” for November instead.
The FCC traced the call to a political consultant and issued a $6 million fine, the first major regulatory penalty specifically targeting AI voice fraud. The FCC also issued a ruling clarifying that AI-generated voices in robocalls count as artificial voices under the Telephone Consumer Protection Act, so such calls require the recipient’s prior express consent.
The case mattered beyond the fine. It showed that audio deepfakes had crossed from financial fraud into election interference. For the full legal picture, see our deepfake laws guide.
Can you spot an audio deepfake?
Trained listeners can catch early-generation voice clones, but state-of-the-art 2024 models have eliminated most of the classic tells—and consumer detection tools now miss roughly four in ten fakes.
What older fakes sounded like
First-generation voice clones had characteristic artifacts that careful listeners could catch:
- Unnatural pacing: pauses fell in grammatically logical but conversationally wrong places.
- Consistent amplitude: real speakers breathe, shift the phone, or trail off; cloned voices had unnaturally flat volume across the full clip.
- Perfect pronunciation: real voices slip, swallow syllables, or vary slightly on repeated words. Clones were too clean.
- No breath sounds: inhales and exhales were absent or mechanically regular.
- Odd prosody: emotional inflection felt slightly off—a speaker conveying urgency but sounding calm.
Why those tells no longer apply
Tools like ElevenLabs’ newer models now add synthetic breath sounds, vary amplitude, and model natural slips in pronunciation. A 2024 assessment found that consumer-grade audio deepfake detection tools had dropped to roughly 60% accuracy against state-of-the-art voice clones. That means a coin flip would catch nearly as many fakes as a dedicated detector. Your ear alone is not a reliable filter.
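To make the coin-flip comparison concrete, here is a minimal arithmetic sketch in Python. It assumes, as a simplification, that "60% accuracy" means the detector catches 60 of every 100 fakes it is shown. The function name and the batch size of 100 are illustrative and do not come from the assessment itself.

```python
def expected_outcomes(num_fakes: int, detection_rate: float) -> tuple:
    """Return the expected (caught, missed) counts out of num_fakes."""
    caught = num_fakes * detection_rate
    missed = num_fakes - caught
    return caught, missed


# Compare blind guessing (50%) with the roughly 60% figure quoted above.
for label, rate in [("coin flip", 0.50), ("consumer detector", 0.60)]:
    caught, missed = expected_outcomes(100, rate)
    print(f"{label}: catches about {caught:.0f} and misses about {missed:.0f} of 100 fakes")
```

Under that assumption, the detector catches only about 10 more fakes per 100 than guessing would, and still lets about 40 through.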
For comparison, visual deepfakes still leave physical artifacts in video that trained eyes (and dedicated tools) can catch with higher reliability. See our guide on how to spot a deepfake for the video-specific signals.
Detection tools that exist
Several tools claim to detect AI-generated audio, but none are reliable enough to use as your sole defense. Here is an honest assessment of what exists in 2024:
- AI or Not (aiornot.com): free tier available, covers audio and images. Works well on lower-quality clones; less reliable against premium voice models.
- Reality Defender: enterprise-grade platform used by news organizations and financial institutions. More accurate but requires a paid subscription and is not designed for individual consumers.
- ElevenLabs Speech Classifier: ElevenLabs built their own detector, but it is trained only to identify audio generated by ElevenLabs models. It will miss clones created with other tools.
- Microsoft VALL-E watermarking: Microsoft published a spec for embedding inaudible watermarks in AI-generated audio in 2023. This approach works only if the voice cloning tool has voluntarily adopted the watermarking standard—which most have not.
The honest summary: detection tools are a useful second opinion, not a reliable gatekeeper. A bad actor who uses a tool that the detector was not trained on will slip through. Behavioral verification—covered in the next section—is more reliable.
How to protect yourself and your family
The single most effective defense against AI voice-clone scams is a technique that requires no technology at all: a family safe word.
The family safe word (your most powerful tool)
Choose a word or short phrase that every member of your household knows but that you have never spoken publicly—not in a video, podcast, social media post, or recorded call. Examples: a made-up word, the name of a childhood pet that never appeared online, or a nonsense phrase your family coined.
The rule is simple: if someone calls claiming to be a family member in an emergency, they must say the safe word before you act on anything they ask. A cloned voice can say any word the scammer types, but the scammer cannot type a secret that was never spoken or posted publicly. Without the word, the call fails the check, which stops the typical grandparent scam no matter how convincing the voice sounds.
Share the safe word with close family members in person or by encrypted message—not by phone call or email, which can be intercepted or impersonated. Review it once a year so everyone remembers it.
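The safe-word rule is a simple decision procedure, and it can be sketched in a few lines of Python. This is only an illustration of the logic, not a tool to install. The function names and the example phrase "purple walrus" are made up for the sketch, and a real safe word should never be written anywhere public.

```python
from typing import Optional


def normalize(phrase: str) -> str:
    """Ignore capitalization and extra spaces when comparing phrases."""
    return " ".join(phrase.casefold().split())


def handle_emergency_call(spoken_word: Optional[str], family_safe_word: str) -> str:
    """Apply the rule: no safe word, no action, however real the voice sounds."""
    if spoken_word is not None and normalize(spoken_word) == normalize(family_safe_word):
        return "verified: keep talking, but still confirm the details"
    return "not verified: hang up and call back on a number you already know"


# Hypothetical example phrase, used here only for illustration.
print(handle_emergency_call("Purple  Walrus", "purple walrus"))
print(handle_emergency_call(None, "purple walrus"))
```

Note that the decision never depends on how the voice sounds. That is the point of the rule: the check relies on something a clone cannot copy from public audio.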
Call back on a number you already know
If you receive any call creating urgency—a family member in trouble, a bank fraud alert, a company CEO asking for a fast wire transfer—hang up and call back on a number you already have in your contacts or that you look up yourself. Do not trust a phone number the caller gives you. Scammers spoof caller ID and provide callback numbers that route to accomplices.
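The callback rule can be written down the same way. In this hedged Python sketch, the contact list, names, and phone numbers are all hypothetical. The one design choice that matters is that the number supplied by the caller is accepted as input but never used.

```python
from typing import Dict, Optional


def number_to_call_back(
    claimed_identity: str,
    caller_supplied_number: str,
    trusted_contacts: Dict[str, str],
) -> Optional[str]:
    """Return the number to dial, taken only from contacts you already had.

    caller_supplied_number is deliberately ignored: it may route to an accomplice.
    Returns None when there is no trusted number, meaning look it up yourself.
    """
    return trusted_contacts.get(claimed_identity)


# Hypothetical contacts saved before the suspicious call arrived.
contacts = {"grandson": "555-0101", "bank": "555-0199"}
print(number_to_call_back("grandson", "555-0666", contacts))
```

If the function returns nothing, the safe move is to find the number independently, for example on a bank card or an official website, rather than fall back to the number the caller offered.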
Slow down on social media
Every audio sample you post publicly is potential training data for a clone of your voice or your children’s voices. This does not mean you must go silent, but it is worth understanding the exposure. Voice samples in videos—even five seconds of someone saying hello—are usable. Parents worried about this risk can find more guidance on our AI safety for parents page.
Limit access to financial accounts
No bank, government agency, or legitimate business will ever ask you to wire money or buy gift cards to resolve an emergency on a first call. That request is always a scam, regardless of how convincing the voice sounds. Wire transfers and gift card payments are nearly impossible to reverse.
Stay current on new cases
The tactics evolve quickly. Voice cloning tools that cost thousands of dollars in 2020 are now free. Checking in on developments once a month is reasonable for any adult who handles finances or cares for older relatives. Our daily AI briefing covers new AI scam tactics as they emerge.
Frequently asked questions
- What is an audio deepfake?
- How are audio deepfakes used in scams?
- Can you detect an audio deepfake by listening?
- What is a family safe word and how does it stop voice clone scams?
- What tools exist to detect AI-generated audio?
- Is it illegal to make or use an audio deepfake?