Audio deepfakes are emerging as a powerful new tool in information warfare during a year of big elections around the world, as artificial intelligence-powered voice-cloning tools proliferate online.
On Monday, the office of New Hampshire’s attorney-general said it was investigating possible voter suppression, after receiving complaints that an “artificially generated” voice in the likeness of US President Joe Biden was robocalling voters encouraging them not to vote in the state’s presidential primary.
Researchers have also warned that the use of realistic but faked voice clips imitating politicians and leaders is likely to spread, following instances in 2023 of allegedly synthetic audio being created to influence politics and elections in the UK, India, Nigeria, Sudan, Ethiopia and Slovakia.
Audio deepfakes are becoming an increasingly popular form of disinformation, according to experts, because of the advent of cheap and effective AI tools from start-ups such as ElevenLabs, Resemble AI, Respeecher and Replica Studios. Meanwhile, Microsoft’s research arm last year announced VALL-E, a new AI model that can clone a voice from just three seconds of recorded audio.
“When it comes to visual manipulation, everyone’s used to Photoshop or at least knows it exists,” said Henry Ajder, an expert on AI and deepfakes and an adviser to Adobe, Meta and EY. “There’s much less awareness about how audio material can be manipulated, so that, to me, really primes us to be vulnerable.”
In September, NewsGuard, which rates the quality and trustworthiness of news sites, uncovered a network of TikTok accounts posing as legitimate news outlets, featuring AI-generated voice-overs peddling conspiracy theories and political misinformation. This included a simulated voice of former US president Barack Obama defending himself against baseless claims linking him to the death of his personal chef.
The fake voice-overs appeared to have been generated by a tool made available by the Andreessen Horowitz-backed ElevenLabs, while the clips racked up hundreds of millions of views, NewsGuard said.
“Over 99 per cent of users on our platform are creating interesting, innovative, useful content, but we recognise that there are instances of misuse, and we’ve been continually developing and releasing safeguards to curb them,” ElevenLabs said at the time of the report.
ElevenLabs, founded two years ago by former Google and Palantir staffers Piotr Dabkowski and Mati Staniszewski, offers free rudimentary AI audio generation tools at the click of a mouse. Subscriptions range from $1 to $330 a month, and more for those seeking more sophisticated offerings.
Disinformation perpetrators have been emboldened by AI tools pioneered by ElevenLabs, whose technology has shifted synthetic audio from sounding disjointed and robotic to natural speech with the right inflection, intonation and emotion, according to Ajder.
“Fundamentally, [ElevenLabs] changed the game both in terms of the realism that can be achieved, especially with a small amount of data,” he said.
The market for text-to-speech tools has exploded over the past year. Some, such as Voice AI, offer free apps and market their technology for use in dubbing pranks. Others, such as Replica Studios and Respeecher, charge nominal fees to creators, filmmakers or game developers.
It is often unclear which companies are being used to create politically motivated deepfakes as most detection tools cannot identify the original source. But the increasing prevalence of such AI-powered products is leading to concern over potential abuses in an unregulated space.
Last year, US intelligence agencies warned in a report that “there has been a massive increase in personalised AI scams given the release of sophisticated and highly trained AI voice-cloning models”.
Beyond financially motivated scams, political experts are now sounding the alarm over viral deepfake audio clips as well as the use of deepfakes for robocalling or campaigns. “You can very inexpensively build a strong, wide campaign of misinformation by phone-targeting people,” said AJ Nash, vice-president and distinguished fellow of intelligence at cyber security group ZeroFox.
Some of these companies have proactively sought ways to counter disinformation. Microsoft issued an ethics statement calling for users to report any abuse of its AI audio tool and stating that speakers should approve the use of their voice with it. ElevenLabs has built its own detection tools to identify audio recordings made by its systems. Others, such as Resemble, are exploring stamping AI-generated content with inaudible watermarks.
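None of these companies has published its watermarking method, but a common textbook approach is to mix a faint, key-derived noise pattern into the waveform and later test for it by correlation. The sketch below is a minimal illustration of that general idea only; the sample rate, amplitude and key are assumptions, and production schemes add psychoacoustic shaping and robustness to re-encoding.

```python
import numpy as np

SAMPLE_RATE = 16_000      # assumed sample rate (Hz)
MARK_AMPLITUDE = 0.002    # assumed level; real schemes shape this psychoacoustically

def keyed_noise(length: int, key: int) -> np.ndarray:
    """Pseudorandom unit-variance noise derived from a secret key."""
    return np.random.default_rng(key).standard_normal(length)

def embed_watermark(audio: np.ndarray, key: int) -> np.ndarray:
    """Add a faint keyed noise pattern to the waveform."""
    return audio + MARK_AMPLITUDE * keyed_noise(len(audio), key)

def watermark_score(audio: np.ndarray, key: int) -> float:
    """Z-score of the correlation with the keyed pattern.

    Unmarked audio scores near 0; marked audio scores roughly
    MARK_AMPLITUDE * sqrt(len(audio)) / std(audio), so longer clips
    and stronger marks are easier to detect.
    """
    mark = keyed_noise(len(audio), key)
    return float(np.dot(audio, mark) / (audio.std() * np.sqrt(len(audio))))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = 0.1 * rng.standard_normal(5 * SAMPLE_RATE)  # stand-in for 5 s of speech
    marked = embed_watermark(clean, key=1234)
    print(watermark_score(clean, key=1234))   # near 0: no watermark present
    print(watermark_score(marked, key=1234))  # well above a threshold such as 4
```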
During 2023 elections in Nigeria, an AI-manipulated clip spread on social media “purportedly implicating an opposition presidential candidate in plans to rig balloting”, according to human rights group Freedom House.
In Slovakia, a fake audio of the opposition candidate Michal Šimečka seemingly plotting to rig the election went viral just days before the country’s parliamentary elections in September.
Sowing further confusion, groups and individuals in India and Ethiopia have denounced audio recordings as fake, only for independent researchers and fact-checkers to claim they were authentic.
Experts warned of a related problem: AI-created audio is often harder to detect than video. “You just have a lot less contextual clues that you could try to work off,” said Katie Harbath, global affairs officer at Duco Experts and a former Meta public policy director.
There are often tell-tale visual indicators that a video is inauthentic, such as glitches in quality, strange shadows, blurring or unnatural movements.
“The advantages with audio [for bad actors] are that you can be less precise,” said Nash. “For flaws, you can cover them up with background noise, muffled music.” A deepfake of UK opposition leader Sir Keir Starmer allegedly berating a staffer, for example, sounded as if it was recorded in a busy restaurant.
A market for technology-assisted detection is emerging to counter the problem. Cyber security group McAfee this month announced Project Mockingbird, a tool that looks for anomalies in sound patterns, frequencies and amplitude before giving users a probability score for whether audio is real or fake. Steve Grobman, McAfee’s chief technology officer, said the detection tool is about 90 per cent effective.
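McAfee has not published how Project Mockingbird works internally, but detectors of this kind typically reduce a clip to spectral features and feed them to a classifier trained on known real and synthetic audio, reporting a probability rather than a verdict. The following is a minimal sketch of that pattern using librosa and scikit-learn; the file names, labels and model choice are illustrative assumptions, not McAfee’s implementation.

```python
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def spectral_features(path: str) -> np.ndarray:
    """Summarise a clip as mean/std of MFCCs plus spectral centroid and RMS energy."""
    y, sr = librosa.load(path, sr=16_000, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    rms = librosa.feature.rms(y=y)
    return np.concatenate([
        mfcc.mean(axis=1), mfcc.std(axis=1),
        [centroid.mean(), centroid.std(), rms.mean(), rms.std()],
    ])

# Hypothetical labelled corpus: clips known to be real (0) or AI-generated (1).
train_paths = ["real_01.wav", "real_02.wav", "fake_01.wav", "fake_02.wav"]
train_labels = [0, 0, 1, 1]

X = np.stack([spectral_features(p) for p in train_paths])
clf = LogisticRegression(max_iter=1000).fit(X, train_labels)

# For a new clip, report a probability rather than a hard verdict,
# similar in spirit to what consumer-facing tools expose to users.
prob_fake = clf.predict_proba(spectral_features("suspect.wav").reshape(1, -1))[0, 1]
print(f"Estimated probability the clip is synthetic: {prob_fake:.0%}")
```

Real systems train on far larger corpora and more robust features, which is also why, as noted below, degrading a clip or layering in music can erode their accuracy.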
Nicolas Müller, machine-learning research scientist at Fraunhofer AISEC, noted that deliberately adding music or degrading the quality of the audio also interferes with the accuracy of the detection tools.
Online platforms are scrambling to contain the problem. Meta has faced criticism because it explicitly bans manipulated video designed to mislead, but the same rules do not appear to apply to audio. Meta said audio deepfakes were eligible to be fact-checked and would be labelled and downranked in users’ feeds when found. TikTok has also been investing in labelling and detection capabilities.
“The New Hampshire deepfake is a reminder of the many ways that deepfakes can sow confusion and perpetuate fraud,” said Robert Weissman, president of non-profit consumer advocacy group Public Citizen. “The political deepfake moment is here. Policymakers must rush to put in place protections or we’re facing electoral chaos.”