TL;DR

  • OpenAI has just announced a new AI model called Sora.
  • The text-to-video generative AI tool can create up to 60 seconds of video content.
  • The company says it is currently working with red teamers to adversarially test the model.

Earlier today, Google announced it is launching version 1.5 of Gemini to developers and enterprise users. Not to be outdone, OpenAI, one of Google's biggest competitors, made a big AI announcement of its own today, though this one involves a new text-to-video AI model.

In a blog post, and subsequently on social media, OpenAI unveiled a new text-to-video generative AI model called Sora. The announcement is accompanied by clips created by the software, which range from a Chinese Lunar New Year celebration to an animated monster fawning over a red candle.

Introducing Sora, our text-to-video model.

Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. https://t.co/7j2JN27M3W

OpenAI states that Sora is currently being made available to red teamers to “assess critical areas for harms or risks.” These red teamers include experts in areas like misinformation, hateful content, and bias. In addition to this testing, Sora will also reportedly be held to the safety measures that exist for DALL·E 3. The company adds that it is working on tools to help detect if a video was generated by Sora.

Although others like Pika and Stability AI have beaten OpenAI to the punch when it comes to AI video generation, a few things make Sora stand out. For one, Sora can generate up to 60 seconds of video, while competitors typically manage only around four seconds per clip. Then there's the sharpness and resolution of the output, and how accurately it renders the surrounding world.

There are over 35 examples you can check out on OpenAI’s website. While the results are impressive, the model is far from perfect. As the company admits:

The current model has weaknesses. It may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark.

The model may also confuse spatial details of a prompt, for example, mixing up left and right, and may struggle with precise descriptions of events that take place over time, like following a specific camera trajectory.

You can see an example of this in the very first video in the blog post, which features a woman walking through Tokyo. If you watch closely, you'll notice the woman's legs occasionally switch or stutter, her feet glide across the ground, and her outfit and hair change near the end.

Although Sora is unavailable to the general public, OpenAI CEO Sam Altman has been accepting prompts from X (formerly Twitter) users.