Stability AI, the company known for its growing array of open source AI models for content creation and coding, today announced an upgrade for its image-to-video latent diffusion model, Stable Video Diffusion (SVD).
Dubbed SVD 1.1, the updated model is a fine-tuned version of SVD 1.0, optimized to generate short AI videos with better motion and more consistency.
In a post announcing the upgrade, Tom Mason, the CTO of Stability AI, confirmed that the new model is available for public use and can be downloaded via Hugging Face.
He also noted that the model will be provided as part of Stability’s subscription memberships, which offer different tiers for individual and enterprise users, starting free and running from $20 per month upward. Users looking to deploy SVD 1.1 for commercial purposes will need a membership.
However, it remains open and free for research uses.
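For researchers who want to try it, a minimal sketch of loading the checkpoint with Hugging Face’s diffusers library might look like the following; the repository ID and the fp16 variant shown here are assumptions to verify against the model card:

```python
import torch
from diffusers import StableVideoDiffusionPipeline

# Pull the SVD 1.1 weights from Hugging Face. The repository ID is an
# assumption -- confirm it (and the license terms) on the model card.
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt-1-1",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.to("cuda")  # or pipe.enable_model_cpu_offload() on smaller GPUs
```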
What to expect from Stability AI’s SVD 1.1?
Back in November 2023, Stability launched two image-to-video models: SVD and SVD-XT. The former, the base model, took in a still image as a conditioning frame and generated a four-second video of up to 14 frames from it. The latter, a fine-tuned version, worked the same way but produced up to 25 frames.
Now, by fine-tuning SVD-XT further, Stability has debuted SVD 1.1. This model, the company says, also generates four-second videos with 25 frames, but at a resolution of 1024×576, given a context frame of the same size.
More importantly, the upgrade is expected to deliver more consistent outputs than the original models.
For example, SVD and SVD-XT would often fall short of photorealism, generate videos with no motion or only very slow camera pans, and fail to render faces and people as users expected. SVD 1.1 is meant to address these shortcomings, promising better motion in its outputs.
“Fine-tuning (for SVD 1.1) was performed with fixed conditioning at 6FPS and motion bucket ID 127 to improve the consistency of outputs without the need to adjust hyperparameters. These conditions are still adjustable and have not been removed. Performance outside of the fixed conditioning settings may vary compared to SVD 1.0,” the company notes on the Hugging Face page of the new model.
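In diffusers terms, those fixed conditioning values correspond to the pipeline’s `fps` and `motion_bucket_id` arguments, so reproducing the fine-tuning defaults might look roughly like this (a sketch under the same repository-ID assumption as above, with a hypothetical input file):

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt-1-1",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# A 1024x576 context frame, matching the resolution SVD 1.1 targets.
image = load_image("input.png").resize((1024, 576))

# Match the fixed conditioning used for the 1.1 fine-tune: 6 fps and
# motion bucket ID 127. Both remain adjustable, but Stability warns that
# results outside these settings may vary.
frames = pipe(
    image,
    num_frames=25,
    fps=6,
    motion_bucket_id=127,
    decode_chunk_size=8,
).frames[0]
export_to_video(frames, "svd_1_1.mp4", fps=6)
```

The motion bucket ID loosely controls how much movement appears in the clip, with higher values pushing more motion.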
Real-world performance remains to be seen
While Stability claims performance improvements with SVD 1.1, it remains to be seen how the model performs in practice. The Hugging Face page notes that the model is intended for research purposes and cautions that some of the original issues may still crop up.
Notably, in addition to Hugging Face, Stable Video Diffusion models can also be used via an API available on the Stability AI developer platform, giving developers a straightforward way to integrate video generation into their products.
“…we have released the Stable Video Diffusion API, which generates 4 seconds of video at 24fps in MP4 format including 25 generated frames and remaining interpolated frames. We support motion strength control and multiple layouts and resolutions, including 1024×576, 768×768, and 576×1024,” Mason noted in his post.
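As a rough illustration of that flow, a hedged sketch of submitting an image and polling for the finished MP4 follows; the endpoint path, field names, and status codes shown here are assumptions to verify against the developer platform documentation:

```python
import time
import requests

API_KEY = "sk-..."  # your Stability AI key
HOST = "https://api.stability.ai"

# Kick off a generation job from a conditioning image. The path and form
# fields are assumptions -- check the platform docs for the current API.
with open("input.png", "rb") as f:
    start = requests.post(
        f"{HOST}/v2beta/image-to-video",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"image": f},
        data={"motion_bucket_id": 127, "cfg_scale": 1.8, "seed": 0},
    )
start.raise_for_status()
job_id = start.json()["id"]

# Poll for the result; assume 202 means the job is still running.
while True:
    result = requests.get(
        f"{HOST}/v2beta/image-to-video/result/{job_id}",
        headers={"Authorization": f"Bearer {API_KEY}", "Accept": "video/*"},
    )
    if result.status_code == 202:
        time.sleep(5)
        continue
    result.raise_for_status()
    break

with open("video.mp4", "wb") as out:
    out.write(result.content)
```

A production integration would want retries, backoff, and a timeout around the polling loop.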
Last year, Stability AI raised the bar on generative AI with frequent model releases, and 2024 appears to be headed the same way. The company, founded in 2019, has raised significant funding, including a $101 million round announced in 2022. However, it is not the only player in this space: competitive offerings from Runway and Pika have also gained traction, especially through customer-facing web platforms that not only generate videos but also provide options to customize and upscale them with ease.
Recently, competitor Runway debuted Multi Motion Brush on its platform, allowing users to add motion to specific parts of their AI videos, while Pika lets users modify specific regions of a video, like swapping a cow’s face with a duck’s. However, neither platform yet offers its models via API, keeping developers from integrating them into their own applications.