OpenAI introduced Sora, its first text-to-video generator, on Thursday with beautiful, shockingly realistic videos showcasing the AI model’s capabilities. Sora is now available to a small number of researchers and creatives who will test the model before a broader public release, which could spell disaster for the film industry and worsen our collective deepfake problem.
“Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background,” said OpenAI in a blog post. “The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.”
OpenAI didn’t say when Sora will be released to the public.
Sora is OpenAI’s first venture into AI video generation, adding to the company’s AI-powered text and image generators, ChatGPT and Dall-E. It’s unique because it’s less of a creative tool and more of a “data-driven physics engine,” as Nvidia senior researcher Dr. Jim Fan pointed out. Sora doesn’t just generate an image; it determines the physics of an object in its environment and renders a video based on those calculations.
To generate videos with Sora, users simply type in a few sentences as a prompt, much as they would with AI image generators. You can choose between a photorealistic or an animated style, and the model produces shocking results in just a few minutes.
Sora is a diffusion model, meaning it generates video by starting with a blurry, static-filled clip and gradually smoothing it into a polished result. Midjourney’s and Stable Diffusion’s image and video generators are also diffusion models.
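For readers curious what "starting with static and smoothing it out" looks like in practice, here is a minimal toy sketch in Python. It is not Sora's method or any real library's API: the names `toy_denoiser` and `generate` are hypothetical, the "frame" is just a short list of numbers, and the denoiser cheats by already knowing the clean answer, whereas a real diffusion model learns to guess it from training data.

```python
import random


def toy_denoiser(frame, target):
    """Hypothetical stand-in for a trained network.

    A real diffusion model learns to estimate the clean frame from data;
    this toy version cheats and simply returns the known target.
    """
    return list(target)


def generate(target, steps=50, seed=0):
    """Start from random static and refine it toward the denoiser's guess."""
    rng = random.Random(seed)
    # Begin with pure static: Gaussian noise unrelated to the target.
    frame = [rng.gauss(0.0, 1.0) for _ in target]
    errors = []
    for step in range(steps):
        guess = toy_denoiser(frame, target)
        # Move a fraction of the way toward the guess. The fraction grows
        # each step and reaches 1.0 on the final step.
        blend = 1.0 / (steps - step)
        frame = [x + blend * (g - x) for x, g in zip(frame, guess)]
        # Track how far the current frame is from the clean target.
        errors.append(max(abs(x - t) for x, t in zip(frame, target)))
    return frame, errors


if __name__ == "__main__":
    clean = [0.0, 0.25, 0.5, 0.75, 1.0]  # stand-in for one tiny "frame"
    result, errors = generate(clean, steps=10)
    for i, err in enumerate(errors, start=1):
        print(f"step {i:2d}: max error {err:.4f}")
```

Running the script prints the remaining error after each step, which shrinks as the static is refined away. The point is only the shape of the loop, many small corrections applied to noise, not the quality of the guess at each step.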
However, I must note that OpenAI’s Sora is much better. The videos Sora produces are longer, more dynamic, and flow together better than those of its competitors. Sora feels like it creates real videos, whereas competing models feel like a stop-motion sequence of AI images. OpenAI has once again upended a field of AI, this time with a video generator that puts the competition to shame.
The videos produced by Sora are undeniably incredible. They would have taken a real film crew or team of animators hours to produce. Sora will likely be disruptive to the film industry in the same way that ChatGPT and AI image generators have shocked the editorial and design world. It’s a technology that is both remarkable and frightening in terms of job security for video creators.
OpenAI says there are a few kinks to work out, including the model’s failure to understand cause and effect. Sora may generate a video of a person taking a bite out of a cookie, but afterward the cookie might not have a bite mark. OpenAI also says the model lacks spatial awareness. It may confuse left and right, and may not understand how a person or object interacts with a scene.
Safety is also a primary concern, especially given how AI technology has been abused to create deepfakes in recent months. OpenAI says it will build tools to help detect misleading content, as well as apply existing technologies that reject harmful text prompts. However, given the ways people have circumvented the protections in current AI models, it’s questionable how successful these efforts will be.
Sora is as impressive as it is terrifying, and it’s clear how this powerful AI video generator could disrupt the film industry and create harmful outputs. Imagine if the Taylor Swift deepfakes were videos. Or what if the Joe Biden deepfake phone call to New Hampshire voters were a photorealistic message from the Oval Office? Sora is not publicly available yet, but the implications of technology this powerful precede its launch.