The Dall-E Mini image generator’s ridiculousness might be its main appeal
The AI-powered system isn't fooling anyone. The systems that can are the technology to be concerned about.
Salvador Dalí, the Spanish artist born in 1904, is known for his surrealist paintings: melting clocks, elephants with insect-thin legs, distorted human faces and limbs. Dall-E (pronounced like Dalí) Mini, a new AI program that debuted in early June, is quickly being recognized as another source of surrealist art, producing images through user-generated requests like a bottle of ranch testifying in court, wikihow instructions on how to eat a hammer, and a nurse from the movie “Silent Hill” eating a pizza.
The images are entertaining, and the program is gaining traction online for its funny art. Users can type in a phrase, any phrase, and then watch an AI-generated image of what they have written burst into creation. Dall-E Mini is an open-source project based on the original Dall-E technology from OpenAI, an AI research laboratory, which generates realistic images and art from text. Much of the art that Dall-E Mini has produced has been received with laughter. But as technologies like these become more refined and widely used, the risk of misuse increases, and that’s no laughing matter.
Dall-E Mini followed Dall-E 2, which OpenAI unveiled in April 2022. Dall-E 2 operates by building associations between billions of online images and their accompanying descriptions. Dall-E Mini operates similarly, but was created on a much smaller scale. The project, led by Boris Dayma, used about 15 million images from three data sets to train its model, which is roughly 27 times smaller than OpenAI’s original Dall-E program. Dall-E Mini is hosted on Hugging Face, a company that provides machine-learning models and tools and says it is on a mission to “democratize good machine learning.”
Users can pretty much make any image they want, though the results lean more towards comical than accurate. Shuman Ghosemajumder, former head of AI at technology security company F5, says that part of the comedy and appeal comes from the unexpected imperfections. “The reason that it’s fun is partially because the images aren’t perfect,” Ghosemajumder says. “There are impressionistic images, there are kind of nightmarish images, there are crazy looking images. All of that is interesting to look at and it’s fun to share.”
Ghosemajumder says he can see this kind of technology going through a few different phases in the public eye. At first, people are curious about what a program like this can do. There’s wonder and learning as people explore the complexity of what they can create, as has already been shown online. The second phase, he says, is more of a transition from pure entertainment to people understanding the value of what they can produce.
“You can generate rudimentary illustrations for a particular purpose and you can understand the current state of the technology better, so that you can start to make plans for how you want to use more advanced versions of this technology in the future,” Ghosemajumder says.
Looking forward, he imagines a third phase in which these technologies lay the foundation for even more advanced innovations, like high-quality synthetic videos.
But as these innovations get better, the risk of dangerous and misleading images increases.
Right now, Ghosemajumder says, Dall-E Mini’s images are “low-quality” enough that users typically know that they are AI-generated and not an actual photograph of, say, a bottle of ranch testifying in court. But when it becomes more difficult to tell whether something was made on an AI platform or captured in the non-digital world, it will be easier for people to create images that feed into misinformation campaigns online.
“It won’t just be an impressionistic version of Tupac or Darth Vader that people can create,” Ghosemajumder says. “Instead it’s going to be like, ‘wow, this looks like a real person doing something.’ I could create an image of a politician doing something that they never did, and eventually create an entire storyline and use that to disseminate disinformation.”
Dall-E 2 is not yet available to the public, as OpenAI’s developers wrestle with the potential risks of misuse. (Dall-E Mini is the similar but publicly produced, open-source version.) Similarly, Google has not released access to Imagen, its text-to-image program. In both cases, the companies have hired researchers and artists to test the programs, improve operations and training data, ensure safety, and make the art better. Having a limited release before a general release is a “widely accepted software principle,” Ghosemajumder says, which he thinks will be increasingly important as technology becomes more complex.
Ultimately, one of the greatest strengths that technologies like Dall-E Mini offer is that they will enable more people to visualize things more easily than they can now, Ghosemajumder says. He sees it as the democratization of high-quality content.
“It unlocks people’s creativity and allows people to communicate more effectively,” Ghosemajumder says. “This has the potential to make people a lot more efficient and effective in generating illustrations and photos of different concepts that they want to be able to visualize.”
As for Ghosemajumder, his Dall-E Mini creation was a series of images depicting how Chewbacca would go about his day—if he had an office job.