The ChatGPT maker’s model generates video of the physical world up to a minute long, based on user instructions for subject and style
On Thursday, OpenAI unveiled a tool that can create videos from text prompts. Named Sora, after the Japanese word for “sky,” this new model can generate realistic footage up to one minute long, following user instructions for subject matter and style. The model is also capable, according to a company blog post, of producing a video from a still image or extending existing footage with new material.
The blog post states, “We’re training AI to comprehend and replicate the dynamic physical world, aiming to develop models that assist people in solving problems involving real-world interactions.”
One of the initial examples from the company is a video based on the prompt: “A movie trailer showcasing the adventures of a 30-year-old astronaut wearing a red wool-knitted motorcycle helmet, set against a blue sky and salt desert, filmed in cinematic style on 35mm film, emphasizing vivid colors.”
The company disclosed that it had granted access to Sora to a select group of researchers and video creators. These experts are tasked with “red teaming” the product, meaning they probe whether it can be made to produce content that violates OpenAI’s terms of service. These terms prohibit content depicting “extreme violence, sexual content, hateful imagery, celebrity likeness, or the IP of others,” as outlined in the company’s blog post. Although access is limited to researchers, visual artists, and filmmakers, CEO Sam Altman responded to user prompts on Twitter after the announcement with video clips he said were created by Sora. These videos feature a watermark indicating they were generated by AI.
The company first introduced the still image generator DALL-E in 2021, followed in November 2022 by the generative AI chatbot ChatGPT, which rapidly gained 100 million users. While other AI companies have also introduced video generation tools, these models typically produce only a few seconds of footage that often deviates significantly from the prompt. Google and Meta have indicated that they are developing generative video tools but have not yet made them available to the public. Recently, OpenAI announced an experiment that aims to enhance ChatGPT’s memory, enabling it to recall more of its users’ conversations.
OpenAI did not reveal how much footage was used to train Sora or where the training videos came from. The company told the New York Times only that the corpus included publicly available videos as well as videos licensed from copyright holders. The company has faced several lawsuits alleging copyright infringement in the training of its generative AI tools, which process vast amounts of data scraped from the internet to replicate the images or text found in those datasets.