Stability AI, the company behind the popular Stable Diffusion text-to-image generative AI technology, is now previewing a new image generation model called Stable Cascade.

The new model is intended to help prove new approaches to image generation that are more flexible and efficient than the current generation of Stable Diffusion models. Stability AI has been steadily iterating on its core Stable Diffusion model since 2022. The SDXL 1.0 release in July 2023 marked a new flagship release, which was further accelerated with the SDXL Turbo update in November 2023.

Stable Cascade uses a somewhat different architecture than SDXL, one that Stability AI researchers hope will generate images more efficiently. The new approach builds on the Würstchen architecture, which uses a series of innovative techniques to improve performance and accuracy.

“A key contribution of our work is to develop a latent diffusion technique in which we learn a detailed but extremely compact semantic image representation used to guide the diffusion process,” the Würstchen research abstract states. “This highly compressed representation of an image provides much more detailed guidance compared to latent representations of language and this significantly reduces the computational requirements to achieve state-of-the-art results.”

Unlike Stable Diffusion, which uses a single large model, Stable Cascade uses a pipeline of three distinct smaller models, referred to as Stages A, B and C. This modular architecture provides major advantages in training efficiency and customization.

The first stage, Stage C, transforms text prompts into compact 24×24 pixel latents. Stages A and B then decode these latents into full high-resolution images. By separating the text-to-image generation from the image decoding, the initial text-conditional model can be trained and fine-tuned much more efficiently. According to Stability AI, fine-tuning Stage C alone provides a 16x cost reduction compared to fine-tuning an equivalently sized single Stable Diffusion model. 
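To put the 24×24 figure in perspective, the arithmetic can be sketched in a few lines of Python. This is a rough illustration only: the 1024×1024 output size is an assumption not stated above, and the function and variable names are hypothetical.

```python
def spatial_compression(image_side: int, latent_side: int) -> float:
    """Return how many image pixels map to one latent position along each side."""
    return image_side / latent_side


IMAGE_SIDE = 1024   # assumption: a 1024x1024 output image
LATENT_SIDE = 24    # from the article: Stage C works on 24x24 latents

# Per-side compression factor between the image and the Stage C latent grid.
factor = spatial_compression(IMAGE_SIDE, LATENT_SIDE)

# Number of spatial positions the text-conditional stage has to model.
latent_positions = LATENT_SIDE * LATENT_SIDE
image_positions = IMAGE_SIDE * IMAGE_SIDE

print(f"per-side compression: about {factor:.1f}x")
print(f"latent positions: {latent_positions}, image pixels: {image_positions}")
```

Under that assumption, Stage C operates on a few hundred spatial positions rather than roughly a million pixels, which is consistent with the claim that the text-conditional stage is much cheaper to train and fine-tune.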

There is also the potential for Direct Preference Optimization (DPO) to further improve image quality. In a 2023 interview with VentureBeat, Stability AI founder and CEO Emad Mostaque explained that DPO is an alternative to reinforcement learning for tuning models to human preferences.

“The #stablecascade output will be even better with DPO (note three stage..) & of course can turbofy it, quantise it etc,” Mostaque wrote in an X (formerly Twitter) message. “This is a research preview benchmark/vanilla model but produces great images & solid text out of the box that you can improve with ComfyUI flows.”

In Stability AI’s evaluations, Stable Cascade outperformed other leading AI art models including SDXL in terms of both image quality and prompt alignment.

Remarkably, despite having 1.4 billion more parameters than SDXL, Stable Cascade has faster inference times. According to Stability AI, the compressed latent space allows the model to generate complex images more efficiently through its multi-stage approach.

Also of note are Stable Cascade’s typography capabilities: it can properly generate text inside images, something SDXL does not excel at. Other text-to-image gen AI technologies, such as Ideogram and OpenAI’s DALL-E 3, have also made strides in recent months to improve text generation, with mixed results. In limited tests conducted by VentureBeat, Stable Cascade more consistently generated the text requested in a prompt, though it’s still far from perfect.

Credit: VentureBeat using Stable Cascade

Stable Cascade also supports other capabilities including image variations. 

Stable Cascade can generate new variations of a given image while maintaining aspects like style and composition. The model can also perform image-to-image translations by adding noise to an input image and generating a new image from it. Support for ControlNets allows for advanced techniques like in-painting and super-resolution. Stable Cascade is currently in research preview and available for non-commercial use, with code available on GitHub.
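The "adding noise to an input image" step of image-to-image generation can be illustrated with a minimal sketch. It uses a generic variance-preserving blend of signal and Gaussian noise, which is an assumption: the article does not describe Stable Cascade's actual noise schedule, and the `add_noise` helper and its `strength` parameter are hypothetical. The denoising pass that would turn the noised input into a new image is left out.

```python
import math
import random


def add_noise(pixels: list[float], strength: float, rng: random.Random) -> list[float]:
    """Blend pixel values with Gaussian noise.

    strength=0.0 returns the input unchanged; strength=1.0 returns pure noise.
    The square-root weights keep the overall variance roughly constant,
    a common convention in diffusion models (assumed here, not confirmed
    for Stable Cascade).
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    keep = math.sqrt(1.0 - strength)   # weight on the original signal
    mix = math.sqrt(strength)          # weight on the random noise
    return [keep * p + mix * rng.gauss(0.0, 1.0) for p in pixels]


# A tiny stand-in "image": four pixel values scaled to [-1, 1].
image = [0.5, -0.5, 0.25, 1.0]
noised = add_noise(image, strength=0.6, rng=random.Random(0))
print(noised)
```

A lower `strength` keeps the result closer to the input image, while a higher value gives the generative model more freedom to depart from it.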
