Your voice is yours alone – as unique to you as your fingerprints, eyeballs and DNA.
Unfortunately, that doesn’t mean it can’t be spoofed. And that reality could undermine one of the promised security benefits of multi-factor authentication, which requires “something you are,” along with something you have or something you know. In theory, even if attackers can steal passwords, they can’t turn into you.
But given the march of technology, that is no longer a sure thing. Fingerprints are no longer an entirely hack-proof method of authentication – they can be spoofed.
That will soon be true of your voice as well.
The risk goes well beyond recent warnings from the Federal Communications Commission (FCC) and Better Business Bureau (BBB) about spam callers trying to get a victim to say the word “yes,” which they record and then use to authorize fraudulent credit card or utility charges, or to “prove” that the victim owes them money for services never ordered.
The newer technology is aimed at “cloning” an individual’s voice accurately enough to make him or her say anything an attacker wants. The potential risks are obvious: If your phone requires your voice to unlock it, an attacker with some audio of your voice could unlock it.
It is not perfect yet. But it is already remarkably close. A demonstration last fall at Adobe Max 2016 of the company’s VoCo, nicknamed “Photoshop for voice,” turned a recording of a man saying, “I kissed my dogs and my wife,” into “I kissed Jordan three times.” The audience went crazy.
The pitch for the product: “With a 20-minute voice sample, VoCo can make anyone say anything.”
More recently, researchers from the University of Montreal’s Institute for Learning Algorithms laboratory announced that they are seeking investors for their voice imitation software, Lyrebird, which they say will be able to mimic any voice from as little as a minute of audio recording.
According to Scientific American, the Lyrebird technology relies on “artificial neural networks – which use algorithms designed to help them function like a human brain – that rely on deep-learning techniques to transform bits of sound into speech.”
The researchers say the system can then adapt to any voice based on only a one-minute sample of someone’s speech.
The exciting – or ominous – implication is that, as Scientific American put it, after learning the “pronunciation of characters, phonemes and words in any voice … it can extrapolate to generate completely new sentences and even add different intonations and emotions.”
Once the technology is perfected, there will be numerous possibilities for mischief – well beyond simply creating comedic videos spoofing the voices of your favorite celebrities. Besides undermining voice-based verification, leading to identity theft or other fraud – Santander Bank was running ads just this past week on voice verification – it could eliminate the use of voice or video recordings as evidence in court.