A new academic study out of Queen Mary University of London has found that AI-generated voices, particularly those created using voice cloning tools, are now virtually indistinguishable from actual human speech. Published in the peer-reviewed journal PLOS One, the research suggests that synthetic voice technology has reached a critical point of realism, raising important questions for security, ethics, and digital communications.
The researchers tested how people perceived a range of voice samples: natural human voices, cloned voices based on real individuals, and generic AI-generated voices not linked to a specific person. According to TechXplore, participants were asked to assess how realistic each voice sounded, as well as how dominant and trustworthy it seemed.
Results showed that cloned voices, produced using only a few minutes of original speech data, were frequently judged to be just as realistic as the human voices. Interestingly, both cloned and generic synthetic voices were rated as sounding more dominant than their human counterparts, and some were rated as more trustworthy as well.
Despite growing concern around "hyperrealism" in AI-generated media, where synthetic content is perceived as more authentic than real content, the study did not find that effect in the audio samples. Still, listeners' inability to consistently tell real and cloned voices apart signals a major development in the field.
One key takeaway from the research is how easily these voice clones can be created. Using off-the-shelf software, researchers were able to generate convincing replicas of individual voices with minimal time, expertise, or cost. This rapid and low-barrier access underscores how far the technology has advanced—and how quickly it is becoming mainstream.
While this progress opens the door for useful applications in accessibility, education, and communication, the study also highlights serious risks. From fraud to impersonation and misinformation, the implications for voice-based security systems and public trust are significant.
The findings reinforce the need for better tools to detect synthetic speech, as well as updated frameworks for managing its use in both public and private sectors.