Postponing the rollout of Voice Engine technology minimizes the potential for misinformation in a crucial global election year
OpenAI has decided not to release a new tool capable of generating a convincing clone of anyone’s voice using just 15 seconds of recorded audio, citing concerns about the potential for misinformation during a critical global election year.
Voice Engine, first developed in 2022, initially powered the text-to-speech feature in ChatGPT, but its capabilities have not been publicly disclosed in detail, reflecting OpenAI’s cautious approach to its release.
“We aim to initiate discussions on the responsible use of synthetic voices and how society can adapt to these new capabilities,” OpenAI stated in an unsigned blog post. “Based on these discussions and the outcomes of small-scale trials, we will make a more informed decision on whether and how to deploy this technology on a larger scale.”
In its post, the company provided examples of how various partners have used the technology in real-world applications. Education technology company Age of Learning, for instance, utilizes it to create scripted voiceovers. Similarly, the “AI visual storytelling” app HeyGen lets users translate recorded content fluently while retaining the accent and voice of the original speaker: an audio sample from a French speaker, used to generate English, results in English speech with a French accent.
Researchers at the Norman Prince Neurosciences Institute in Rhode Island notably used a low-quality 15-second clip of a young woman presenting a school project to “restore the voice” she had lost to a vascular brain tumor.
“We have decided to preview this technology but not release it widely at this time,” OpenAI stated, aiming “to enhance societal resilience against the challenges posed by increasingly convincing generative models.” In the near term, the company recommended “measures such as phasing out voice-based authentication as a security measure for accessing bank accounts and other sensitive information.”
OpenAI also advocated for the development of “policies to safeguard the use of individuals’ voices in AI” and for “educating the public about understanding the capabilities and limitations of AI technologies, including the potential for deceptive AI content.”
According to OpenAI, Voice Engine creations are watermarked, enabling the organization to track the source of any generated audio. Presently, “our agreements with these partners mandate explicit and informed consent from the original speaker, and we do not permit developers to create methods for individual users to generate their own voices.”
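OpenAI has not published how its watermarking works. As a toy illustration of the general idea only, here is a minimal spread-spectrum-style sketch: a low-amplitude pseudorandom signature, derived from a per-partner key, is mixed into the audio and later recovered by correlation. All names, keys, and parameters here are invented for illustration and imply nothing about OpenAI’s actual scheme.

```python
import math
import random

def signature(key, n):
    """Deterministic pseudorandom +/-1 sequence derived from a watermark key."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def embed(samples, key, strength=0.05):
    """Mix a low-amplitude keyed signature into the audio samples."""
    sig = signature(key, len(samples))
    return [s + strength * w for s, w in zip(samples, sig)]

def detect(samples, key):
    """Correlate against the key's signature; a score near the embed
    strength suggests the watermark is present, near zero that it is not."""
    sig = signature(key, len(samples))
    return sum(s * w for s, w in zip(samples, sig)) / len(samples)

# Toy "audio": a quiet sine-like waveform standing in for recorded speech.
audio = [0.1 * math.sin(0.05 * i) for i in range(20000)]
marked = embed(audio, key="partner-42")  # hypothetical partner key

print(detect(marked, "partner-42"))  # high score: the embedding key matches
print(detect(marked, "partner-99"))  # near zero: wrong key, no match
```

Real audio watermarks must additionally survive compression, re-recording, and editing, which is where the engineering difficulty lies; this sketch only shows how a keyed signature lets the generator be traced back to a specific source.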
Although OpenAI’s tool is notable for its technical simplicity and the small amount of original audio needed to create a convincing clone, competing tools are already accessible to the public.
For example, companies like ElevenLabs can produce a full voice clone with just “a few minutes of audio.” To mitigate potential harm, ElevenLabs has implemented a “no-go voices” feature, which aims to identify and block the creation of voice clones “that mimic political candidates currently participating in presidential or prime ministerial elections, beginning with those in the US and the UK.”