Generative AI programs have gotten better and better at constructing impressively detailed images from text inputs, but researchers at Japan’s Osaka University have taken things a major step forward. They enlisted AI to reconstruct accurate, high-resolution images from the brain activity of people looking at pictures in front of them.

As recently highlighted by Science and elsewhere, a new paper from a team at Osaka University’s Graduate School of Frontier Biosciences details how they used Stable Diffusion, a popular AI image generation program, to translate brain activity into corresponding visual representations. Although there have been many previous, similar thought-to-computer image experiments, this test is the first to employ Stable Diffusion. For additional system training, researchers linked the textual descriptions of thousands of photos to the brain patterns detected via functional magnetic resonance imaging (fMRI) scans while volunteers viewed those pictures.

Stable Diffusion recreated images seen by humans (above) after translating their brain activity (below). Credit: Graduate School of Frontier Biosciences

Blood flow within the brain fluctuates depending on which areas are activated. Blood flow to the temporal lobes, for example, is tied to decoding the “contents” of an image, such as objects, people, and surroundings, while the occipital lobe handles dimensional qualities like perspective, scale, and positioning. An existing online dataset of fMRI scans, recorded from four people looking at over 10,000 images, was fed into Stable Diffusion, followed by the images’ text descriptions and keywords. This allowed the program to “learn” how to translate the applicable brain activity into visual representations.

During testing, for example, a volunteer looked at an image of a clock tower. The brain activity registered by the fMRI was matched to keywords from Stable Diffusion’s earlier training, and those keywords were then fed into its existing text-to-image generator. From there, the recreated clock tower was further refined using the occipital lobe’s layout and perspective information to form a final, impressive image.
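
For readers who want a feel for the idea of “learning” to translate brain activity into keywords, here is a toy Python sketch. It is not the Osaka team’s method. It assumes, purely for illustration, a simple linear decoder fit by gradient descent, and everything in it is made up: the four simulated “voxels,” the two-number keyword embeddings, the `MIXING` matrix, and the helper names. No real fMRI data or Stable Diffusion code is involved.

```python
import random

# Made-up 2-D "keyword embeddings"; real text embeddings are far larger.
KEYWORDS = {
    "clock tower": [1.0, 0.0],
    "airplane": [0.0, 1.0],
    "teddy bear": [-1.0, -1.0],
}

# Hypothetical mixing matrix: how each of 4 simulated "voxels" responds
# to the two embedding dimensions. Not derived from any real scan.
MIXING = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]


def simulate_scan(embedding, rng, noise=0.05):
    """Return a noisy 4-voxel pattern for one viewed image."""
    return [
        sum(m * e for m, e in zip(row, embedding)) + rng.gauss(0.0, noise)
        for row in MIXING
    ]


def predict(weights, bias, voxels):
    """Apply the linear decoder: one output per embedding dimension."""
    return [
        sum(w * v for w, v in zip(row, voxels)) + b
        for row, b in zip(weights, bias)
    ]


def fit_decoder(scans, targets, lr=0.1, epochs=500):
    """Fit a linear voxel-to-embedding map by gradient descent on squared error."""
    n, n_vox, n_out = len(scans), len(scans[0]), len(targets[0])
    weights = [[0.0] * n_vox for _ in range(n_out)]
    bias = [0.0] * n_out
    for _ in range(epochs):
        grad_w = [[0.0] * n_vox for _ in range(n_out)]
        grad_b = [0.0] * n_out
        for x, y in zip(scans, targets):
            pred = predict(weights, bias, x)
            for j in range(n_out):
                err = (pred[j] - y[j]) / n  # averaged prediction error
                grad_b[j] += err
                for i in range(n_vox):
                    grad_w[j][i] += err * x[i]
        for j in range(n_out):
            bias[j] -= lr * grad_b[j]
            for i in range(n_vox):
                weights[j][i] -= lr * grad_w[j][i]
    return weights, bias


def nearest_keyword(embedding):
    """Pick the keyword whose embedding is closest to the decoded vector."""
    return min(
        KEYWORDS,
        key=lambda k: sum((a - b) ** 2 for a, b in zip(KEYWORDS[k], embedding)),
    )


# "Training": pair simulated scans with the embeddings of what was viewed.
rng = random.Random(0)
names = list(KEYWORDS)
train_labels = [names[i % len(names)] for i in range(60)]
train_scans = [simulate_scan(KEYWORDS[k], rng) for k in train_labels]
train_targets = [KEYWORDS[k] for k in train_labels]
weights, bias = fit_decoder(train_scans, train_targets)

# "Testing": decode a fresh, unseen scan of someone viewing a clock tower.
new_scan = simulate_scan(KEYWORDS["clock tower"], rng)
decoded = nearest_keyword(predict(weights, bias, new_scan))
print(decoded)
```

In a real system, the decoded keywords or embeddings would then be handed to a text-to-image generator, and a second decoder would supply layout information. The sketch stops at the keyword step to stay small.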

As of right now, the team’s augmented Stable Diffusion image generation is limited to the four-person image database; further testing will require brain scans from additional participants for training purposes. That said, the team’s groundbreaking advancements show immense promise in areas such as cognitive neuroscience and, as Science notes, could even one day help researchers delve into how other species perceive the environments around them.