AI image generation is improving so rapidly that the quality and features on offer can change completely within weeks or months. DALL-E 3 is a leap forward, but how does it stack up against MidJourney?



What’s Special About DALL-E 3?

We’ve covered the evolution and capabilities of MidJourney in detail before, and so far it has been the go-to image generator for the best artistic output suitable for actual use. However, getting MidJourney to generate something close to what you actually wanted can be an extremely hit-and-miss affair. If you want precise control, you have to resort to Stable Diffusion and one of its many mods, such as ControlNet. Stable Diffusion is significantly more difficult to use, though, and both MidJourney and DALL-E 3 are far easier to work with.

DALL-E 3 promises to stick much more closely to the wording of your prompt. In other words, if you ask for specific character poses, scene details, or arrangements of objects, DALL-E 3 should, in theory, give you what you asked for. We’ll be comparing DALL-E 3 and MidJourney using several prompts. The same prompt will be given to each AI generator.

Prompt 1: Artistic Flair

First, I just want to get a general feel for what each generator will do artistically, so we’ll start with a rather generic prompt:

Generate an image of an epic fantasy scene with elves and dragons in a 90s fantasy art style

Here’s the MidJourney image I thought was best.

MidJourney Dragon and Elves Fantasy Image
MidJourney / Sydney Butler / How-To Geek

And here’s the DALL-E 3 image I thought was best.

A fantasy scene generated by DALL-E 3 to resemble 90s fantasy art with dragons and elves
DALL-E / Sydney Butler / How-To Geek
 

What’s interesting to note here is that ChatGPT (the front end for DALL-E 3 in this case) does not pass my exact prompt on to the image generator. Part of the main selling point for DALL-E 3 is that it uses ChatGPT (i.e. GPT-4) to take your idea and do the “prompt engineering” part of the work for you. So it will create much more detailed prompts to try and get better results. Here’s the prompt that ChatGPT created based on my request:

Oil painting reminiscent of 90s fantasy artwork, showcasing a group of elves, both male and female, standing on a cliff's edge. In the background, colossal dragons soar, their wings casting shadows over a lush forest below. The scene is filled with vibrant colors and dramatic lighting.

This presents a unique challenge when trying to compare the two image generators, because GPT is increasing the quality of the prompt. So, to make it fair, I fed the GPT-generated prompt into MidJourney and this is the result.

MidJourney Dragon Fantasy Art with elves
MidJourney / Sydney Butler / How-To Geek

Now we have something much more comparable. However, which one wins? In this case, my opinion is that the DALL-E 3 image is closer to what I asked for, while the MidJourney image has a more distinct style and more artistic flair. MidJourney’s current V5 model excels at overall artistic flair in my opinion, but of course this is highly subjective.

For the rest of the comparisons, I will only be using the GPT-generated prompts for both image generators to cancel out my skill (or lack thereof) when it comes to crafting prompts. In other words, I’ll ask ChatGPT for the image first, and then copy the prompt behind the best image it generates and paste it into MidJourney.

Prompt 2: Text Elements

You may have noticed that MidJourney tends to come up with gobbledygook whenever there’s text in a generated image. That’s because it’s generating shapes that look like letters, but aren’t really letters. So T-shirts with text or store signs won’t have any sensible text. DALL-E 3 promises to create whatever text you like and place it correctly in the frame, so let’s test that. Here’s the prompt ChatGPT came up with:

Drawing reminiscent of newspaper comic strips, featuring a computer geek deeply engrossed in his coding work. His T-shirt stands out with the bold statement 'How-To Geek Is Awesome'. The scene is set in a cozy corner with tech posters and sticky notes on the wall.

Here’s DALL-E 3’s result.

Cartoon of man sitting in front of computer with
DALL-E / Sydney Butler / How-To Geek
 

And here’s MidJourney’s result.

Midjourney image of Computer Geek
MidJourney / Sydney Butler / How-To Geek

While MidJourney’s output is very pleasing to the eye, it’s not at all what we asked for, so DALL-E 3 pips it here. However, there’s still plenty of nonsensical text in DALL-E 3’s image. In my testing, DALL-E 3 works great when you specify all the text in the image, or when there’s no text other than what you asked for, but any unspecified text in the image comes out as nonsense, just as with MidJourney.

Prompt 3: Setting a Scene

The last test I want to run is setting a scene, where I specify the position of all the major elements.

Illustration of a cyberpunk cityscape reminiscent of Blade Runner aesthetics. A cyborg woman with glowing eyes and cybernetic limbs stands on the left, holding a shiny apple. Opposite her, on the right, a robot vendor with a worn-out exterior smokes a cigar, surrounded by an array of exotic fruits. The street is bustling with activity, with drones flying overhead and neon signs illuminating the scene.

Here’s DALL-E 3’s result.

Cyberpunk scene of robot woman buying a fruit from a robot man
DALL-E / Sydney Butler / How-To Geek

And here are all four attempts by MidJourney.

A grid of images from MidJourney depicting cyberpunk artwork
MidJourney / Sydney Butler / How-To Geek

Again, MidJourney excels at artistic flair but completely fails to actually do what I asked in the prompt.

While you can redo the same image in DALL-E 3 in different styles, no amount of cajoling will get MidJourney to consistently reproduce the specific elements and placement you ask for. Here’s the same image, but I asked for a more surreal and dreamlike style from DALL-E 3.

DALL-E 3 Isn’t Perfect

Before you decide to ditch MidJourney for DALL-E 3, there are a few major limitations I ran into when testing DALL-E 3 that you should know about:

  • ChatGPT will refuse to generate images of copyrighted characters, while MidJourney will happily produce fan art of existing characters.
  • ChatGPT also won’t let you ask for the art style of any living artist, while you can still do this with MidJourney.
  • Neither platform will generate art that steps over certain lines when it comes to adult content that’s violent or sexual in nature. However, MidJourney has a simple appeals process for false positives, whereas ChatGPT may take some convincing, since on the face of it it’s the much more sophisticated system.

My time with the tool was limited, and both DALL-E 3 and MidJourney are constantly getting new tweaks and features, but these were the most apparent limitations that most people might care about.

The Verdict

It’s quite difficult to declare an absolute winner here, but as things stand, MidJourney is the right tool to use if you want expressiveness and artistic flair in what you generate. In contrast, DALL-E 3 is by far the better tool if you want to create consistent artwork to your exact requirements for illustrations or other professional use cases.
