About 15 months ago, OpenAI—famed for its eerily effective GPT-3 large language model—introduced a child system to that language model: the cleverly named “DALL·E,” a 12-billion parameter neural network that generates images from provided prompts. Now, OpenAI has introduced a new version of DALL·E. DALL·E 2 promises higher resolution, better caption-matching, improved photorealism, and reduced harmful outputs.
To generate an image, the user first provides a text description of what they want: a combination of concepts, attributes, and styles. DALL·E 2’s website offers a simple illustration of the process: a user could, for instance, ask for an astronaut riding a horse as a pencil drawing; or, if they were feeling more adventurous, they could ask for a bowl of soup that is a portal to another dimension drawn on a cave wall.
DALL·E 2, like its predecessor, was trained on a large dataset of captioned images, from which it learned the relationship between images and the text used to describe them. Using this understanding, DALL·E 2 then generates an image that matches the provided caption as closely as it can, and as seen above, the results are breathtakingly accurate. DALL·E 2 can also edit existing images given a caption (mimicking the shadows, lighting, and textures of the original image) or even create variations “inspired” by an original image.
Compared to DALL·E 1, DALL·E 2 offers 4× greater resolution. OpenAI also reports that, in side-by-side comparisons of the two models' outputs, evaluators preferred the successor 71.7% of the time for caption matching and 88.8% of the time for photorealism.
However, as with any generative model, GPT-3 and DALL·E 2 included, the model will sometimes produce outputs that mirror biases found in its training dataset or that are otherwise harmful. “Without sufficient guardrails, models like DALL·E 2 could be used to generate a wide range of deceptive and otherwise harmful content, and could affect how people perceive the authenticity of content more generally,” OpenAI wrote on GitHub. “DALL·E 2 additionally inherits various biases from its training data, and its outputs sometimes reinforce societal stereotypes.”
To combat this, DALL·E 2—which has a content policy that forbids violent, adult, or political content, among other categories—was trained with a new dataset that excluded “the most explicit content.” OpenAI has also been working with select early users for over a month to identify other areas for improvement. Over the course of that time, those users have created more than three million images, and OpenAI says that less than 0.05% of downloaded or publicly shared images were flagged for potential content policy violations, with 30% of those confirmed as policy violations.
DALL·E 2 is still in that early testing phase, but some of the tool’s creations can be found on its Instagram, along with their associated captions.