
Stable Diffusion 3.0 debuts new diffusion transformer architecture to reinvent text-to-image gen AI

22.02.2024

Stability AI today announced an early preview of Stable Diffusion 3.0, its next-generation flagship text-to-image generative AI model.

Over the past year, Stability AI has steadily iterated on its image models, releasing several that show increasing sophistication and quality. The SDXL release in July 2023 dramatically improved the Stable Diffusion base model, and now the company is looking to go significantly further.

The new Stable Diffusion 3.0 model aims to deliver better image quality and better performance when generating images from multi-subject prompts. It will also offer significantly better typography than prior Stable Diffusion models, spelling text inside generated images more accurately and consistently. Typography has been a weak spot for Stable Diffusion in the past, and one that rivals including DALL-E 3, Ideogram and Midjourney have also been addressing in recent releases. Stability AI is building Stable Diffusion 3.0 in multiple model sizes, ranging from 800M to 8B parameters.

Stable Diffusion 3.0 isn’t just a new version of a model that Stability AI has already released; it is based on a new architecture.


“Stable Diffusion 3 is a diffusion transformer, a new type of architecture similar to the one used in the recent OpenAI Sora model,” Emad Mostaque, CEO of Stability AI, told VentureBeat. “It is the real successor to the original Stable Diffusion.”

Diffusion transformers and flow matching will enable a new era of image generation

Stability AI has been experimenting with multiple types of approaches for generating images.

Earlier this month, the company released a preview of Stable Cascade, which uses the Würstchen architecture to improve performance and accuracy. Stable Diffusion 3.0 takes a different approach, using diffusion transformers.

“Stable Diffusion did not have a transformer before,” Mostaque said.

Transformers underpin much of the gen AI revolution and are widely used as the basis of text generation models, while image generation has largely been the domain of diffusion models. The research paper that introduces Diffusion Transformers (DiTs) describes a new architecture for diffusion models that replaces the commonly used U-Net backbone with a transformer operating on latent image patches. According to that paper, DiTs use compute more efficiently, with sample quality improving as model compute grows, and can outperform U-Net-based diffusion image generation models.
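To make “a transformer operating on latent image patches” more concrete, here is a minimal NumPy sketch of the DiT input path, under stated assumptions: a latent of shape (channels, height, width) is cut into p×p patches, each patch is flattened and linearly embedded as a token, the tokens pass through one self-attention layer, and a linear projection maps them back into a latent. The weights are random, the positional embedding is a random stand-in (the DiT paper uses fixed sine-cosine embeddings), and the MLP, layer norms and timestep/prompt conditioning of a real DiT block are omitted. All function names are hypothetical, and nothing here reflects Stability AI’s actual implementation.

```python
import numpy as np


def patchify(latent: np.ndarray, p: int) -> np.ndarray:
    """Cut a (C, H, W) latent into (H/p)*(W/p) tokens, each a flattened C*p*p patch."""
    c, h, w = latent.shape
    assert h % p == 0 and w % p == 0, "latent height/width must be divisible by p"
    x = latent.reshape(c, h // p, p, w // p, p)
    x = x.transpose(1, 3, 0, 2, 4)  # (grid_h, grid_w, C, p, p), row-major patch order
    return x.reshape((h // p) * (w // p), c * p * p)


def unpatchify(tokens: np.ndarray, c: int, h: int, w: int, p: int) -> np.ndarray:
    """Inverse of patchify: reassemble patch tokens into a (C, H, W) latent."""
    x = tokens.reshape(h // p, w // p, c, p, p)
    x = x.transpose(2, 0, 3, 1, 4)  # back to (C, grid_h, p, grid_w, p)
    return x.reshape(c, h, w)


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention over a token sequence x of shape (N, D)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N, N): every patch attends to every patch
    return softmax(scores) @ v


def toy_dit_step(latent: np.ndarray, p: int, d_model: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical toy: patchify -> embed -> one attention layer -> project back -> unpatchify."""
    c, h, w = latent.shape
    tokens = patchify(latent, p)  # (N, C*p*p)
    n, patch_dim = tokens.shape
    w_in = rng.standard_normal((patch_dim, d_model)) / np.sqrt(patch_dim)
    pos = 0.02 * rng.standard_normal((n, d_model))  # random stand-in for positional embeddings
    x = tokens @ w_in + pos
    wq, wk, wv = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model) for _ in range(3))
    x = x + self_attention(x, wq, wk, wv)  # residual connection around attention
    w_out = rng.standard_normal((d_model, patch_dim)) / np.sqrt(d_model)
    return unpatchify(x @ w_out, c, h, w, p)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent = rng.standard_normal((4, 32, 32))  # e.g. a 4-channel latent from an autoencoder
    print(patchify(latent, 2).shape)            # (256, 16)
    print(toy_dit_step(latent, p=2, d_model=64, rng=rng).shape)  # (4, 32, 32)
```

For a 4×32×32 latent with p = 2, patchify yields 256 tokens of dimension 16. Halving the patch size quadruples the number of tokens, and with it the attention cost, which is one of the main compute levers the DiT paper varies.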

The other big innovation behind Stable Diffusion 3.0 is flow matching. The research paper on flow matching describes it as a new method for training Continuous Normalizing Flows (CNFs) to model complex data distributions. According to the researchers, using Conditional Flow Matching (CFM) with optimal transport paths leads to faster training, more efficient sampling, and better performance than diffusion paths.
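The training objective behind Conditional Flow Matching can be sketched in a few lines. Below is a minimal NumPy sketch, assuming the flow matching paper’s convention of noise at t = 0 and data at t = 1 and its optimal-transport conditional path x_t = (1 - (1 - σ_min) t) x0 + t x1, whose target velocity is u_t = x1 - (1 - σ_min) x0. `velocity_model` is a hypothetical stand-in for a neural network, σ_min = 1e-4 is an assumed value, and Stability AI has not said that Stable Diffusion 3.0 uses exactly this formulation.

```python
import numpy as np


def ot_conditional_path(x0, x1, t, sigma_min=1e-4):
    """Point x_t on the conditional optimal-transport path and its target velocity u_t.

    x0 is Gaussian noise (t = 0), x1 is a data sample (t = 1); t broadcasts against x0.
    """
    x_t = (1.0 - (1.0 - sigma_min) * t) * x0 + t * x1
    u_t = x1 - (1.0 - sigma_min) * x0  # d x_t / d t: constant in t, so the path is a straight line
    return x_t, u_t


def cfm_loss(velocity_model, x1_batch, rng, sigma_min=1e-4):
    """Monte Carlo estimate of the CFM regression loss for one batch."""
    x0 = rng.standard_normal(x1_batch.shape)
    # one time per sample, shaped to broadcast over the remaining axes
    t = rng.uniform(size=(x1_batch.shape[0],) + (1,) * (x1_batch.ndim - 1))
    x_t, u_t = ot_conditional_path(x0, x1_batch, t, sigma_min)
    return float(np.mean((velocity_model(x_t, t) - u_t) ** 2))


def euler_sample(velocity_model, x0, steps=50):
    """Generate by integrating dx/dt = v(x, t) from t = 0 (noise) to t = 1 with Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / steps
    for i in range(steps):
        t = np.full((x.shape[0],) + (1,) * (x.ndim - 1), i * dt)
        x = x + dt * velocity_model(x, t)
    return x
```

Training minimizes `cfm_loss` over many batches; generation then draws fresh noise and calls `euler_sample` with the trained model. Each conditional path is a straight line traversed at constant velocity, which is the intuition the paper gives for why optimal transport paths can be sampled in fewer steps than diffusion paths, although the learned trajectories need not be perfectly straight.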

Credit: Stability AI (generated with Stable Diffusion 3.0)

Stable Diffusion has learned how to spell

The improved typography in Stable Diffusion 3.0 results from several changes that Stability AI has built into the new model.

“This is thanks to both the transformer architecture and additional text encoders,” Mostaque said. “Full sentences are now possible as is coherent style.”

While Stable Diffusion 3.0 is initially being demonstrated as a text-to-image gen AI technology, it will serve as the basis for much more. In recent months, Stability AI has also been building out 3D image generation and video generation capabilities.

“We make open models that can be used anywhere and adapted to any need,” Mostaque said. “This is a series of models across sizes and will underpin the development of our next generation visual models, including video, 3D, and more.”
