Adobe’s new AI can turn a 2D photo into a 3D scene
A sneak preview of "Beyond the Seen."
Today at Adobe MAX, the company’s annual creativity conference, Adobe will preview a new technology called “Beyond the Seen” that uses artificial intelligence to extend the boundaries of two-dimensional images and even turn them into immersive three-dimensional scenes. While just a demonstration, it shows how AI image generators designed for specific purposes could have far-reaching commercial and artistic applications.
The image generator works by taking a photograph of a landscape or a building interior and expanding it into a full 360-degree spherical panorama around the camera. Of course, it can’t know what’s actually behind the camera, so it uses machine learning to create a plausible and seamless environment, whether the input image is of a mountain landscape or the interior of a concert hall. Adobe’s algorithms can also estimate the 3D geometry of the new environment, which enables the viewpoint to be changed, and even lets the camera appear to move through the environment.
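Adobe hasn’t published implementation details, but the geometry of the task can be sketched: a single perspective photo covers only a small patch of the viewing sphere, so almost all of an equirectangular panorama canvas is a hole the generator must fill. The illustrative sketch below (function names and the field-of-view value are assumptions, not anything from Adobe) places a photo into such a canvas and reports how much remains to be generated:

```python
import numpy as np

def embed_in_panorama(photo, h_fov_deg=60.0, pano_height=512):
    """Place a perspective photo into an equirectangular canvas.

    Pixels outside the photo's field of view stay masked; that masked
    region is what an outpainting model would be asked to invent.
    (Illustrative geometry only, not Adobe's actual pipeline.)
    """
    H, W = pano_height, pano_height * 2
    ph, pw = photo.shape[:2]

    # Direction vector for every panorama pixel.
    lon = (np.arange(W) + 0.5) / W * 2 * np.pi - np.pi
    lat = np.pi / 2 - (np.arange(H) + 0.5) / H * np.pi
    lon, lat = np.meshgrid(lon, lat)
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)

    # Pinhole projection onto a camera looking down +z.
    f = (pw / 2) / np.tan(np.radians(h_fov_deg) / 2)  # focal length in pixels
    with np.errstate(divide="ignore", invalid="ignore"):
        u = f * x / z + pw / 2
        v = -f * y / z + ph / 2

    visible = (z > 0) & (u >= 0) & (u < pw) & (v >= 0) & (v < ph)
    canvas = np.zeros((H, W, 3), dtype=photo.dtype)
    ui = np.clip(u, 0, pw - 1).astype(int)
    vi = np.clip(v, 0, ph - 1).astype(int)
    canvas[visible] = photo[vi[visible], ui[visible]]
    return canvas, ~visible  # mask of pixels the generator must fill

# Example: a 60-degree photo covers only a small fraction of the sphere.
photo = np.full((240, 320, 3), 200, dtype=np.uint8)
canvas, mask = embed_in_panorama(photo)
print(f"{mask.mean():.0%} of the panorama must be generated")
```

Running this with a typical smartphone-like field of view shows that well over 80 percent of the sphere is missing, which is the scale of the hallucination Adobe’s model performs.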
While image extension or out-painting isn’t new, Adobe’s AI generator is the first to be built exclusively around it. For comparison, DALL-E 2 allows users to extend their images in small blocks, while Stable Diffusion requires a workaround.
Adobe’s AI image generator is a little different from more general image generators like DALL-E 2 and Stable Diffusion in a couple of key ways. First, it’s trained on a much more limited dataset with a specific purpose in mind. DALL-E 2 and Stable Diffusion were trained on billions of text-image pairs that cover every concept from avocados and Avril Lavigne to zebras and Zendaya. Adobe’s generator was trained exclusively on a dataset of roughly 250,000 high-resolution 360-degree panoramas. This means it’s great at generating realistic environments from seed images, but it has no text-to-image features (in other words, you can’t enter a text prompt and get a weird result) or any other general generation features. It’s a tool with a specific job. The panoramas it outputs, however, are significantly larger than the images those general-purpose generators produce.
Adobe’s generator currently uses an artificial intelligence technique called a Generative Adversarial Network, or GAN, and not a diffusion model. GANs work by pitting two neural networks against each other. The Generator is responsible for creating new outputs, and the Discriminator has to guess whether any image it is presented with is an output from the Generator or an actual image from the training set. As training progresses, the Generator gets better at producing realistic images that fool the Discriminator, and the end result is a working image generation algorithm.
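That adversarial loop can be shown end to end on a toy problem. The sketch below trains a two-parameter Generator against a logistic-regression Discriminator so that generated scalars come to resemble “real” data drawn from a normal distribution centered on 4.0. It illustrates only the GAN training dynamic described above, not Adobe’s actual networks:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# Toy "real" data: scalars drawn from a normal distribution N(4, 0.5).
real_sample = lambda n: rng.normal(4.0, 0.5, n)

# Generator G(z) = mu + sigma*z; Discriminator D(x) = sigmoid(w*x + b).
mu, sigma = 0.0, 1.0
w, b = 0.1, 0.0
lr_d, lr_g, batch = 0.1, 0.02, 128

for step in range(3000):
    # Discriminator step: learn to tell real samples from generated ones.
    xr = real_sample(batch)
    z = rng.normal(size=batch)
    xf = mu + sigma * z
    dr, df = sigmoid(w * xr + b), sigmoid(w * xf + b)
    w -= lr_d * (np.mean((dr - 1) * xr) + np.mean(df * xf))
    b -= lr_d * (np.mean(dr - 1) + np.mean(df))

    # Generator step: nudge outputs toward whatever fools the Discriminator.
    z = rng.normal(size=batch)
    xf = mu + sigma * z
    df = sigmoid(w * xf + b)
    g = (df - 1) * w  # gradient of the non-saturating generator loss
    mu -= lr_g * np.mean(g)
    sigma -= lr_g * np.mean(g * z)

print(f"generator now samples around {mu:.2f} (real data centers on 4.0)")
```

Neither network ever sees the data distribution directly: the Generator learns where the real data lives purely from the Discriminator’s feedback, which is the dynamic that, scaled up enormously, produces realistic panoramas.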
Meanwhile, diffusion models, which DALL-E 2 and Stable Diffusion use, start with random noise and iteratively refine it into a plausible image. Recent research has shown that they can produce more realistic results than GANs. Given that, Gavin Miller, VP and Head of Adobe Research, tells PopSci that the algorithm could be adapted to use a diffusion model before any commercial release.
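The noise-to-image idea can also be made concrete. In the toy sketch below the data distribution is a simple Gaussian, so the denoising direction a diffusion model would normally have to learn is available in closed form; sampling then starts from pure noise and reverses a standard DDPM-style noise schedule step by step. The schedule and variable names follow common diffusion conventions, not anything Adobe has published:

```python
import numpy as np

rng = np.random.default_rng(1)

# Target distribution the "model" should learn to sample: N(4, 0.5^2).
m, s = 4.0, 0.5

# Standard linear DDPM noise schedule.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
abar = np.cumprod(alphas)

def score(x, t):
    """Exact score of the noised marginal q(x_t).

    A real diffusion model trains a neural network to approximate this;
    for Gaussian data the noised marginal stays Gaussian, so it is
    available in closed form and the demo needs no training.
    """
    mean = np.sqrt(abar[t]) * m
    var = abar[t] * s**2 + (1.0 - abar[t])
    return -(x - mean) / var

# Ancestral sampling: start from pure noise, denoise step by step.
x = rng.normal(size=5000)
for t in range(T - 1, -1, -1):
    x = (x + betas[t] * score(x, t)) / np.sqrt(alphas[t])
    if t > 0:
        x += np.sqrt(betas[t]) * rng.normal(size=x.size)

print(f"samples: mean {x.mean():.2f}, std {x.std():.2f}  (target: 4.00, 0.50)")
```

A thousand small denoising steps turn structureless noise into samples that match the target distribution, which is exactly the refinement process that image-scale diffusion models run on millions of pixels at once.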
Although this is still in early development, Adobe has highlighted a couple of potential uses for the technology. While there are eye-catching claims about the Metaverse and generating 3D worlds from 2D snapshots, it’s the ordinary image-extension features that are likely to prove valuable first. One example from Adobe’s demo video shows how the algorithm allowed “specular” (or shiny) rendered objects to be inserted into an image: the AI generator extrapolated what could be behind the camera and above the object in order to create realistic reflections on the object’s surface. This is the kind of thing that would let architects and interior designers more easily create accurate-seeming renderings of their projects.
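Rendering such a reflection boils down to mirroring the view direction about the surface normal and looking up the mirrored direction in the 360-degree panorama, which acts as an environment map. A minimal sketch of that lookup (the toy environment map and all names are illustrative, not part of Adobe’s tooling):

```python
import numpy as np

def reflect(view, normal):
    """Mirror the view direction about the surface normal."""
    normal = normal / np.linalg.norm(normal)
    return view - 2.0 * np.dot(view, normal) * normal

def sample_equirect(env, direction):
    """Look up an equirectangular environment map along a 3D direction.

    This is how a renderer uses a 360-degree panorama (generated or
    captured) to put plausible reflections on a shiny object.
    """
    x, y, z = direction / np.linalg.norm(direction)
    lon = np.arctan2(x, z)   # [-pi, pi], 0 = straight ahead
    lat = np.arcsin(y)       # [-pi/2, pi/2]
    h, w = env.shape[:2]
    col = int((lon + np.pi) / (2 * np.pi) * (w - 1))
    row = int((np.pi / 2 - lat) / np.pi * (h - 1))
    return env[row, col]

# Toy environment map: blue "sky" in the top half, gray "ground" below.
env = np.zeros((256, 512, 3))
env[:128] = [0.3, 0.5, 0.9]
env[128:] = [0.4, 0.4, 0.4]

# Camera looks down +z at an upward-facing surface: the reflected ray
# points back up, so the shiny spot picks up the sky color.
view = np.array([0.0, -0.3, 1.0])
up_normal = np.array([0.0, 1.0, 0.0])
print(sample_equirect(env, reflect(view, up_normal)))
```

The key point is that the reflected ray usually points somewhere the original photo never saw, above or behind the camera, which is why a plausible full panorama is needed before the reflection can look right.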
Similarly, it would allow photographers and videographers to expand the backgrounds of their images in a more natural way. Miller explained that the content-aware tools, which have been in Adobe apps like Photoshop since 2010, can generate naturalistic texture, while the new generative models can create both texture and structure.
While there is no word yet on when this technology will be available to the public, revealing it today is all “part of a larger agenda towards more generative technologies” that Adobe is pursuing, Miller says. It’s always been possible to create 360-degree panoramas with specialized hardware, but soon it will be possible to create realistic-seeming ones using just software. And that really could change things—and yes, maybe make it possible for small creators to build metaverse-adjacent experiences.