Too Dangerous? OpenAI Won’t Release Its Voice Cloning Tool for Fear of Misuse


OpenAI’s voice cloning tool, Voice Engine, can generate a convincing clone of a person’s voice from just 15 seconds of recorded audio. The company has deemed it too risky for general release, part of an effort to minimize the threat of misinformation in a year with many elections worldwide.

Voice Engine was developed in 2022 but never released publicly, partly due to the “cautious and informed” approach OpenAI says it is taking. The company said in a blog post: “We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities. Based on these conversations and the results of these small-scale tests, we will make a more informed decision about whether and how to deploy this technology at scale.”

The Guardian reports that the company shared examples of real-world uses of the technology from several partners who were given access to Voice Engine to build their own products. These uses include generating scripted voiceovers for an education technology app, translating recorded content while preserving the character of the original speaker’s voice, and even restoring the voice of a woman who lost it due to a brain tumor.

OpenAI explained that it is currently choosing to preview but not widely release the technology in order to “bolster societal resilience against the challenges brought by ever more convincing generative models”. The company also called for “policies to protect the use of individuals’ voices in AI” and for “educating the public in understanding the capabilities and limitations of AI technologies, including the possibility of deceptive AI content”.

However, while Voice Engine’s output is watermarked, competing tools are already available to the public. For example, companies like ElevenLabs can generate a complete voice clone from just a “few minutes of audio.”

To mitigate the potential harm this technology might cause, ElevenLabs introduced a “no-go voices” safeguard designed to detect and block the creation of voice clones “that mimic political candidates actively involved in presidential or prime ministerial elections, starting with those in the US and the UK”.