How a baby with a headcam taught AI to learn words

Artificial intelligence researchers were able to successfully create a machine learning model capable of learning words using footage captured by a toddler wearing a headcam. The findings, published this week in Science, could shed new light on the ways children learn language and potentially inform researchers’ efforts to build future machine learning models that learn more like humans.

Previous research estimates children tend to begin acquiring their first words around 6 to 9 months of age. By their second birthday, the average kid possesses around 300 words in their vocabulary toolkit. But the actual mechanics underpinning exactly how children come to associate meaning with words remains unclear and a point of scientific debate. Researchers from New York University’s Center for Data Science tried to explore this gray area further by creating an AI model that attempted to learn the same way a child does.

To train the model, the researchers relied on over 60 hours of video and audio recordings pulled from a light head camera strapped to a child named Sam. The toddler wore the camera on and off starting when he was six months old and ending after his second birthday. Over those 19 months, the camera collected over 600,000 video frames connected to more than 37,500 transcribed utterances from nearby people. The background chatter and video frames pulled from the headcam provides a glimpse into the experience of a developing child as it eats, plays, and generally experiences the world around them.

Armed with Sam’s eyes and ears, the researchers then created a neural network model to try and make sense of what Sam was seeing and hearing. The model, which had one module analyzing single frames taken from the camera and another focused on transcribed speech direct towards Sam, was self-supervised, which means it didn’t use external data labeling to identify objects. Like a child, the model learned by associating words with particular objects and visuals when they happened to co-occur at the same time.

Testing procedure in models and children. *Credit: Wai Keen Vong*

“By using AI models to study the real language-learning problem faced by children, we can address classic debates about what ingredients children need to learn words—whether they need language-specific biases, innate knowledge, or just associative learning to get going,” paper co-author and NYU Center for Data Science Professor Brenden Lake said in a statement. “It seems we can get more with just learning than commonly thought.”

Researchers tested the model the same way scientists evaluate children. Researchers presented the model with four images pulled from the training set and asked it to pick which one matches with a given word like “ball” “crib” or “tree.” The model was successful 61.6% of the time. The baby cam-trained model even approached similar levels of accuracy to a pair of separate AI models that were trained with many more language inputs. More impressive still, the model was able to correctly identify some images that weren’t included in Sam’s headcam dataset, which suggests it was able to learn from the data it was trained on and use that to make more generalized observations.

“These findings suggest that this aspect of word learning is feasible from the kind of naturalistic data that children receive while using relatively generic learning mechanisms such as those found in neural networks,” Lake said.

In other words, the AI model’s ability to consistently identify objects using only data from the head camera shows how representative learning, or simply associating visuals with concurrent words, does seem to be enough for children to learn and acquire a vocabulary.

Findings hint at an alternative method to train AI

Looking to the future, the NYU researchers’ findings could prove valuable for future AI developers interested in creating AI models that learn in ways similar to humans. The AI industry and computer scientists have long used human thinking and neural pathways as inspiration for building AI systems.

Recently, large language models like OpenAI’s GPT models or Google’s Bard have proven capable of writing serviceable essays, generating code, and periodically botching facts thanks to an intensive training period where the models inject trillions of parameters worth of data pulled from mammoth datasets. The NYU findings, however, suggest an alternative method of word acquisition may be possible. Rather than rely on mounds of potentially copyright protected or biased inputs, an AI model mimicking the way humans learn when we crawl and stumble our way around the world could offer an alternative path towards recognizing language.

“I was surprised how much today’s AI systems are able to learn when exposed to quite a minimal amount of data of the sort a child actually receives when they are learning a language,” Lake said.

How a baby with a headcam taught AI to learn words

Findings hint at an alternative method to train AI

AI trained on AI churns out gibberish garbage AI trained on AI churns out gibberish garbage

Everyone is judging AI by these tests, but experts say they’re close to meaningless Everyone is judging AI by these tests, but experts say they’re close to meaningless

Will AI render programming obsolete? Will AI render programming obsolete?

The logic behind AI chatbots like ChatGPT is surprisingly basic The logic behind AI chatbots like ChatGPT is surprisingly basic

How AI-powered brain implants are helping an ALS patient communicate How AI-powered brain implants are helping an ALS patient communicate

How do chatbots work? How do chatbots work?

Watch what happens when AI teaches a robot ‘hand’ to twirl a pen Watch what happens when AI teaches a robot ‘hand’ to twirl a pen

OpenAI chaos explained: What it could mean for the future of artificial intelligence OpenAI chaos explained: What it could mean for the future of artificial intelligence

AI narrators will read classic literature to you for free AI narrators will read classic literature to you for free

AI will require even more energy than we thought AI will require even more energy than we thought

AI trained on AI churns out gibberish garbage AI trained on AI churns out gibberish garbage

ChatGPT has been generating bizarre nonsense (more than usual) ChatGPT has been generating bizarre nonsense (more than usual)

Why AI could be a big problem for the 2024 presidential election Why AI could be a big problem for the 2024 presidential election

The best free language learning apps for building your vocabulary and conversational skills The best free language learning apps for building your vocabulary and conversational skills

Meta just released a tool that helps computers ‘see’ objects in images Meta just released a tool that helps computers ‘see’ objects in images

Will we ever be able to trust health advice from an AI? Will we ever be able to trust health advice from an AI?

Google announces Gemini, its ‘multimodal’ answer to ChatGPT Google announces Gemini, its ‘multimodal’ answer to ChatGPT

The New York Times is the latest to go to battle against AI scrapers The New York Times is the latest to go to battle against AI scrapers