AI text-to-image generators have come a long, arguably troubling way in a very short period of time, but there’s one piece of human anatomy they still can’t quite grasp: hands. Speaking with BuzzFeed earlier this year, Amelia Winger-Bearskin, an artist and associate professor of AI and the arts at the University of Florida, explained that, until recently, AI programs largely haven’t been sure what, exactly, a “hand” is. “Hands, in images, are quite nuanced,” she said at the time. “They’re usually holding on to something. Or sometimes, they’re holding on to another person.” While there have been some advances in the past few months, there’s still sizable room for improvement.
Although that might sound odd at first, a glance at our appendages’ complexities quickly reveals why. Unless a model can nail numerous points of articulation, varieties of poses, skin wrinkles, veins, and countless other precise details, renderings of hands can rapidly devolve into an uncanny valley of weirdness and inaccuracy. What’s more, AI programs simply don’t have as many large, high-quality images of hands to learn from as they do faces and full bodies. But as AI still contends with this—often with extremely puzzling, ludicrous, and outright upsetting results—programmers at the University of Science and Technology of China in Hefei are working on a surprisingly straightforward solution: train an AI specifically to study and improve hand generation.
In a recently published research paper, the team details how they eschewed the more common diffusion approach to image generation in favor of what are known as neural radiance fields, or NeRFs. As New Scientist notes, this 3D modeling technique relies on neural networks, and has previously been used by both Google Research and Waymo to create seamless, large-scale cityscape models.
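To make the idea concrete, here is a toy sketch of what a radiance field does; this is not the paper's code, and every name and number in it is invented for illustration. A small network maps a 3D point and viewing direction to a color and a density, and a pixel's color comes from compositing samples along a camera ray:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny "network": one random hidden layer standing in
# for a trained model.
W1 = rng.normal(size=(6, 32)) * 0.5
W2 = rng.normal(size=(32, 4)) * 0.5

def field(points, view_dir):
    """Map (x, y, z) plus viewing direction to (r, g, b, density)."""
    x = np.concatenate([points, np.broadcast_to(view_dir, points.shape)], axis=-1)
    h = np.tanh(x @ W1)
    out = h @ W2
    rgb = 1 / (1 + np.exp(-out[..., :3]))   # colors squashed into [0, 1]
    sigma = np.log1p(np.exp(out[..., 3]))   # non-negative density
    return rgb, sigma

def render_ray(origin, direction, n_samples=64, near=0.0, far=2.0):
    """Volume-render one ray: alpha-composite colors weighted by density."""
    t = np.linspace(near, far, n_samples)
    pts = origin + t[:, None] * direction
    rgb, sigma = field(pts, direction)
    delta = (far - near) / n_samples
    alpha = 1 - np.exp(-sigma * delta)       # chance the ray "stops" at each sample
    trans = np.cumprod(1 - alpha + 1e-10)    # light surviving to each sample
    trans = np.concatenate([[1.0], trans[:-1]])
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(axis=0)

color = render_ray(np.zeros(3), np.array([0.0, 0.0, 1.0]))
```

Training a real NeRF means adjusting the network weights until rays rendered this way match the pixels of many photographs taken from known camera positions.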
“By introducing the hand mapping and ray composition strategy into [NeRF], we make it possible to naturally handle interaction contacts and complement the geometry and texture in rarely-observed areas for both hands,” reads a portion of the paper’s abstract, which adds that the team’s “HandNeRF” program works with both a single hand and two interacting hands. In this updated process, multi-view images of a hand or hands are first fed to an “off-the-shelf skeleton estimator” that parameterizes hand poses from the inside. The researchers then apply deformation fields via the HandNeRF program, which generates images of our upper appendages that are more lifelike in shape and surface.
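As a loose illustration of that pipeline, here is a minimal sketch; the function bodies, names, and array shapes below are invented stand-ins, not the authors' method. A pose estimate comes from the multi-view images, and a deformation field warps sampled 3D points from the posed hand back into one shared canonical space where the radiance field is evaluated:

```python
import numpy as np

def estimate_pose(images):
    """Stand-in for an off-the-shelf skeleton estimator: here it just
    collapses the views into a fake 3-number pose vector."""
    return np.tanh(np.mean(images, axis=(0, 1, 2)))

def deformation_field(points, pose):
    """Toy deformation: shift points by a pose-dependent offset so the
    posed hand maps into a canonical coordinate frame."""
    offset = 0.1 * pose[:3]
    return points - offset

multi_view = np.random.rand(4, 8, 8, 3)  # four tiny fake camera views
pose = estimate_pose(multi_view)
observed = np.random.rand(100, 3)        # sample points along camera rays
canonical = deformation_field(observed, pose)
```

The payoff of working in a canonical space is that every pose of the hand contributes training signal to the same underlying model, which is how rarely-observed regions can borrow geometry and texture from better-observed ones.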
Although NeRF imaging is difficult to train and can’t generate whole text-to-image results by itself, New Scientist also explains that combining it with diffusion tech could provide a novel path forward for AI image generation. Until then, however, most programmers will have to figure out ways to work around AI’s poor grasp—so to speak—of the human hand.