Inside Facebook’s artificial intelligence lab
Researchers are using our social posts to build thinking machines.
It’s time to stop thinking about Facebook as just a social media company. Between its efforts to deliver internet service with drones, its purchase of Oculus for virtual reality, and its continued pursuit of artificial intelligence, Facebook has quickly become one of the most advanced technology research centers in the world.
It’s not alone: companies like Google and even IBM have similar schemes, and collectively, the developments across the field have accelerated to the point that artificial intelligences will surely shape the way humans interact with computers. In fact, they already do, but quietly, behind the curtains. Facebook, which serves 1.5 billion users monthly, has great interest in this technology. The company tackles the problem of emulating general intelligence (that is, getting computers to think less like linear, logical machines and more like us free-form humans) with a multi-pronged approach. While the Facebook Artificial Intelligence Research (FAIR) team works on solving generalized AI problems, smaller groups like Language Technology and Facebook M deploy practical features to users.
The birth of artificial intelligence research at Facebook
It all started in 2013. Facebook founder and CEO Mark Zuckerberg, chief technology officer Mike Schroepfer, and other company leadership were taking stock of the company’s accomplishments since launching almost a decade before, and looking to see what would allow them to thrive throughout the next 10 or 20 years.
Facebook had already been using machine learning on its hugely popular social network to decide what users would see on their News Feeds, but it was simple compared to the cutting-edge neural networks of the time.
Some Facebook engineers had also been experimenting with convolutional neural networks (CNNs), a powerful flavor of machine learning that is now popularly used for identifying images. Zuckerberg was impressed by the potential of artificial intelligence, even in its early stages, so he hired an engineer out of Google Brain, Marc’Aurelio Ranzato. Then, he went to the source: the inventor of CNNs, Yann LeCun.
Yann LeCun, who now serves as the director of FAIR, comes from a storied tenure of artificial intelligence research. He began his work at Bell Labs (named for telephone pioneer Alexander Graham Bell, and known for its experiments across myriad fields in telecommunications and technology) as a researcher in 1988, then became a department head at AT&T Labs, where he stayed until 2003, when he began to teach at New York University. The modern convolutional neural network is a culmination of work throughout LeCun’s career. Ever wonder how an ATM can read your check? That was LeCun, whose early work included a neural network simulator called “SN” and was deployed in 1996.
“I started talking with Schroepfer and Mark, and I guess they liked what I told them,” LeCun said in an interview with Popular Science. “And then they tried to convince me to run it… When someone like Mark comes to you and says ‘Oh, okay, you pretty much have carte blanche. You can put together a world-class research lab and I expect you to build the best research lab in AI in the world.’ I’ll say, ‘Hmm, interesting challenge.’”
Yann had some ideas about what that world-class research lab would entail. If you want to attract top talent, you have to have an ambitious research lab, with ambitious long-term goals. Then you give people some freedom in their work, and you have to be very open about your research. “It lined up with sort of the philosophy at Facebook, which is a philosophy of openness,” LeCun said.
Assembling The Team
The team subsequently tasked with creating the future of Facebook is small: only about 30 research scientists and 15 engineers in total. Labor is divided over three branches: Facebook AI Research’s main office is in New York City’s Astor Place, where LeCun operates with a team of about 20 engineers and researchers. A similar number staffs the Menlo Park branch, and as of June, FAIR has opened a smaller Paris office of about 5 to collaborate with INRIA, the French Institute for Research in Computer Science and Automation. There are others that work within Facebook on AI deployment, like the Language Technology team; FAIR is the research arm.
These researchers and engineers come from all over the tech industry, and many have previously collaborated with LeCun. High-level artificial intelligence research isn’t an enormous field, and many of LeCun’s pupils have gone on to seed AI startups, which were later absorbed into larger companies like Twitter.
LeCun once told Wired that deep learning “is really a conspiracy between Geoff Hinton and myself and Yoshua Bengio, from the University of Montreal.” While Hinton works on AI at Google, and Bengio splits time between University of Montreal and data mining company ApStat, LeCun has been able to snag other top-shelf names.
“When I was first made a department head at Bell Labs, my boss told me, ‘There’s only two things you need to remember: First of all, never put yourself in competition with people in your group. Second, only hire people who are smarter than you,’” LeCun said.
Leon Bottou, who leads the research sub-group concerned with language, has been a longtime colleague of LeCun. They developed neural network simulators together, beginning in 1987 with AmigaOS. Bottou joined FAIR in March 2015, having previously worked at Microsoft Research, where he explored machine learning and machine reasoning.
LeCun also brought Vladimir Vapnik onto the team as a consultant in November 2014; Vapnik and LeCun worked together at Bell Labs, publishing formative research on machine learning, including a technique to measure machine learning capacity. Vapnik is the father of statistical learning theory, which addresses the aspect of prediction based on established data. Prediction, which seems like a simple task for a human, actually draws on an immense library of preconceived notions and observations of the world. (But more on that later.) Vapnik, a leader in this field, continues his work with an interest in knowledge propagation, applying cues from teacher-student interaction to machine learning.
The size and academic weight of the team allow Facebook to be ambitious with its long-term goal, which doesn’t fall short of a system that LeCun would call “unambiguously intelligent.”
“Right now, even the best AI systems are dumb, in the way that they don’t have common sense,” LeCun said. He talks about a situation where I pick up a bottle, and leave the room. (We’re in a FB NYC conference room called Gozer the Gozerian — sharing the name of the Ghostbusters villain — an ominous name for a room to discuss the birth of true machine intelligence.) The human brain has no trouble imagining the entire simple scenario of someone picking up a bottle and leaving a room, but to a machine, huge swaths of information are missing based on that premise alone.
Yann says that as I imagined the situation in my mind, “You probably stood up, even though I didn’t say that in the sentence, you probably walked. You opened the door, you walked through the door, you closed the door maybe. The bottle is not in the room. I mean there are a lot of things you can deduce from that because you know the constraints of the real world. So I don’t have to tell you all those facts.”
The artificial intelligence community doesn’t know enough right now about how machines learn to achieve this level of inference. As a step toward that goal, Facebook is focusing on building machines that can learn well enough to understand the world around them.
The biggest barrier, says LeCun, is what’s called “unsupervised learning.” Right now machines mainly learn in one of two ways. The first is supervised learning, where the system is shown thousands of pictures of dogs until it understands the attributes of a dog. This method is illustrated by Google’s DeepDream, where researchers reversed the process to reveal its efficacy.
The other is reinforcement learning, when the computer is shown information to identify, and is only given a “yes” or “no” answer on each decision it makes. This takes longer, but the machine is forced to make internal configurations, and can yield robust results when the two learning forms are married. (Remember DeepMind playing Atari?) Unsupervised learning requires no feedback or input. It’s how humans learn, LeCun says. We observe, draw inferences, and add them to our bank of knowledge. That’s proven to be a tough nut to crack.
“We don’t even have a basic principle on which to build this. We’re working on it, obviously,” LeCun says, and laughs. “We have lots of ideas, they just don’t work that well.”
Early Progress Toward A Truly Intelligent AI
But that’s not to say that there hasn’t been progress made. Right now, LeCun is excited about work on a “memory” network that can be integrated into present convolutional neural networks, giving them the ability to retain information. He likens the new mode of memory retention to short term and long term memory in the brain, governed by the hippocampus and cerebral cortex respectively. (LeCun actually detests CNNs being compared to brains, instead preferring a model of a black box with 500 million knobs.)
The memory module allows researchers to tell the network a story, and then have it answer questions about the story later.
For the story, they used J.R.R. Tolkien’s Lord of the Rings. Well, not the entire book, but short summaries of major plot points. (“Bilbo took the ring.”) When asked where the ring was at certain points in the story, the AI was able to respond with short, correct answers. This means it “understands” relationships between objects and time, according to CTO Mike Schroepfer, who stressed this technology’s ability to help Facebook show you what you want to see with higher accuracy.
“By building systems that understand the context of the world, understand what it is you want, we can help you there,” Schroepfer said at a developer presentation in March. “We can build systems that make sure all of us spend time on the things we care about.”
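To make the story question-and-answer task concrete, here is a toy sketch in Python. It is not FAIR’s memory network: nothing here is learned, and the rules are written by hand. Apart from “Bilbo took the ring,” the story sentences, the three verbs, and the names parse and where_is are all hypothetical, chosen only to show what it means to track an object through a sequence of statements.

```python
# Toy illustration only: hand-written rules, NOT a learned memory network.
from typing import Dict, List, Optional, Tuple


def parse(sentence: str) -> Tuple[str, str, str]:
    """Reduce 'Bilbo went to the Shire.' to (subject, verb, last word)."""
    words = sentence.rstrip(".").split()
    return words[0], words[1], words[-1]


def where_is(story: List[str], item: str) -> Optional[str]:
    """Replay the story in order and return the item's last known place."""
    locations: Dict[str, str] = {}    # person -> last known place
    holder: Optional[str] = None      # who currently carries the item
    item_place: Optional[str] = None  # last place the item was known to be
    for sentence in story:
        subject, verb, obj = parse(sentence)
        if verb == "went":
            locations[subject] = obj
        elif verb == "took" and obj == item:
            holder = subject
        elif verb == "dropped" and obj == item and holder == subject:
            item_place = locations.get(subject, item_place)
            holder = None
    if holder is not None:
        # The item travels with its holder; fall back to where it last lay.
        return locations.get(holder, item_place)
    return item_place


# Hypothetical plot summary in the style of the article's example.
story = [
    "Bilbo took the ring.",
    "Bilbo went to the Shire.",
    "Bilbo dropped the ring.",
    "Frodo took the ring.",
    "Frodo went to Mordor.",
]

print(where_is(story, "ring"))      # question asked at the end of the story
print(where_is(story[:3], "ring"))  # question asked after the third sentence
```

The point of the memory research described above is that a network learns this kind of bookkeeping from examples, rather than having a programmer spell out every verb.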
The FAIR team is developing this context around a project called “Embed the World.” To help machines better understand reality, the FAIR team is teaching them to represent the relationships between everything in vectors: images, posts, comments, photos, and video. The neural network is creating an intricate web of content that groups similar pieces of media together and pushes dissimilar ones apart.
With this system, LeCun says that we can start to “replace reasoning with algebra.” And it’s incredibly powerful. The artificial neural networks developed in the Embed the World project can link two photos that were taken in the same location based on visual similarities in the photos, but also figure out if text describes the scene. It’s recreating a virtual memory of reality, and clustering it in the context of other places and events. It can even “virtually represent a person,” based on their previous likes, interests, and digital experiences. This is somewhat experimental, but has great implications for Facebook’s News Feed and is used in a limited way to track hashtags.
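A small sketch can show what “replacing reasoning with algebra” might look like in practice. The vectors below are invented by hand for illustration; in a real system they would be produced by a trained neural network and have far more dimensions. The item names and the nearest helper are hypothetical, and cosine similarity is one common way to compare embeddings, not necessarily the measure Facebook uses.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors: near 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made 3-dimensional "embeddings" (assumed values, for illustration only).
embeddings: Dict[str, List[float]] = {
    "photo_eiffel_day": [0.9, 0.1, 0.0],
    "photo_eiffel_night": [0.8, 0.2, 0.1],
    "post_text_paris_trip": [0.7, 0.3, 0.0],
    "video_hockey_game": [0.0, 0.1, 0.9],
}


def nearest(name: str, table: Dict[str, List[float]]) -> str:
    """Return the other item whose vector points in the most similar direction."""
    query = table[name]
    scored = [
        (cosine_similarity(query, vec), other)
        for other, vec in table.items()
        if other != name
    ]
    return max(scored)[1]


print(nearest("photo_eiffel_day", embeddings))
```

Once photos, text, and video all live in the same vector space, questions like “which post describes this photo?” reduce to finding the closest vectors.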
There’s a lot of talk about long-term goals, but small victories along the way have made Facebook incrementally smarter. In June 2014, they published an article titled “DeepFace: Closing the Gap to Human-Level Performance in Face Verification,” which claimed more than 97 percent accuracy in recognizing faces. LeCun says that he’s confident Facebook’s facial recognition is the best in the world, and that it’s a key difference between Facebook and academic research institutions. Now, DeepFace is the driving force behind Facebook’s automatic photo tagging.
“If we have an idea that actually works, within a month it can be in front of 1.5 billion people,” LeCun said. “Let’s keep our eyes focused on the horizon, where our long-term goal is, but on the way there are a lot of things that we’re going to build that are going to have applications in the short term.”
Rob Fergus, a veteran of NYU and MIT’s Computer Science and Artificial Intelligence Lab, leads the AI research team concerned with vision. His team’s work can already be seen in the automatic tagging of photos, but Fergus says the next step is video. Lots of video is “lost” in the noise because of a lack of metadata, or it’s not accompanied by any descriptive text. AI would “watch” the video, and be able to classify video arbitrarily.
This has major implications for stopping content Facebook doesn’t want from getting onto their servers, like pornography, copyrighted content, or anything else that violates their terms of service. It also could identify news events, and curate different categories of video. Facebook has traditionally farmed these tasks out to contracted companies, so this could potentially play a role in mitigating costs.
In current tests, the AI shows promise. When shown a video of sports being played, like hockey, basketball or table tennis, it can correctly identify the sport. It can tell baseball from softball, rafting from kayaking, and basketball from street ball.
The AI Behind Facebook
A separate group within Facebook, called Language Technology, focuses on developing translation, speech recognition, and natural language understanding. FAIR, LeCun’s realm, is the research arm of Facebook’s AI push, and Language Technology (under the umbrella of Applied Machine Learning) is one of the places that actually deploys the software.
They collaborate with FAIR, but stand alone in their development and deployment, and their work has developed 493 actively used translation directions (English to French and French to English count as two directions).
With Facebook’s creed to make the world more open and connected, language services are a natural route. More than half of users don’t speak English, but English makes up most of the content of Facebook, says Language Technology head Alan Packer.
There are 330 million people using these translation services, which are most often accessed by clicking the “See Translation” button. If you’ve been the first person to click the translation button, congratulations, you’ve operated artificial intelligence. The first click initiates the translation request to the server, which is then cached for other users. Packer says that Shakira’s posts are translated almost instantly. The team is also rolling out native translation of content, which will display a “See the original” button.
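The “first click translates, later clicks reuse” behavior described above is a standard caching pattern. Here is a minimal sketch, assuming a cache keyed by post and target language; the class name, the method names, and the stand-in fake_translate function are all hypothetical and say nothing about how Facebook’s servers are actually built.

```python
from typing import Callable, Dict, Tuple


class TranslationCache:
    """Translate a post once per target language, then serve the stored result."""

    def __init__(self, translate: Callable[[str, str], str]) -> None:
        self._translate = translate
        self._cache: Dict[Tuple[str, str], str] = {}
        self.backend_calls = 0  # how many times the expensive model ran

    def see_translation(self, post_id: str, text: str, target_lang: str) -> str:
        key = (post_id, target_lang)
        if key not in self._cache:
            # First click for this post and language: run the translator.
            self.backend_calls += 1
            self._cache[key] = self._translate(text, target_lang)
        # Every later click is answered from the cache.
        return self._cache[key]


def fake_translate(text: str, target_lang: str) -> str:
    """Stand-in for a real translation model (assumption for this sketch)."""
    return f"[{target_lang}] {text}"


cache = TranslationCache(fake_translate)
first_click = cache.see_translation("post-1", "Hola a todos", "en")
second_click = cache.see_translation("post-1", "Hola a todos", "en")
print(first_click, cache.backend_calls)
```

This is why a popular post feels instant: with millions of readers, nearly every click lands on a translation someone else already requested.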
Artificial intelligence is necessary in this role because “dumb” translation is ineffective in relating how humans interact with each other. It generates improper syntax, misunderstands idioms, and has no reference for slang. This is a flaw with direct, word-to-word translation like the Google Translate of yore.
Packer says that figures of speech are particularly difficult, but something an AI that understands underlying semantic meaning would catch.
“The phrase ‘hot dog,’ if you just translate those words literally into French, it’s not going to work. ‘Chaud chien’ means nothing to a French person,” Packer said. “And then if you have a picture of me skiing and I say, ‘I’m hot dogging it today,’ that turns out to be really hard to learn, that hot dogging means showing off.”
This understanding isn’t at scale yet, but early results suggest that it’s not an insurmountable task. Packer says that the trick isn’t only understanding metaphors or idioms, but also realizing when not to read a phrase as one.
The AI is adaptive by nature, and can be trained on slang quickly. The Language Technology team recently learned that French soccer fans were using a new form of slang to say “wow,” and after training the neural network on that public data, it can now reliably translate that text. They’re working now to grow Facebook’s lexicon by training on new data every day, but all languages are now updated monthly.
We’re used to digital personal assistants by now, like Siri, Cortana, and Google Now. But Facebook took a different approach with its new AI personal assistant, M, which offers the ability to execute complex tasks outside of the confines of your phone. Siri can send a text, but M can book a flight and make travel plans. During the development process, a Facebook employee even got M to schedule a series of in-home appraisals with moving companies. (You can’t buy tobacco, alcohol, escorts, or guns with M, though.)
The backbone of Facebook M actually comes from a startup acquired earlier this year, Wit.ai. They joined the Messenger team under VP David Marcus, and earlier this month debuted M.
Alex LeBrun, who leads the Wit.ai team within Facebook, says that artificial intelligence not only makes M better for accomplishing generalized tasks, but also for cases with very special exceptions, like traveling with an infant or during blackout dates. It also means that as AI grows, so do M’s capabilities. He’s hopeful that in even three years, M will be able to call the cable company or DMV and wait on hold for users.
“The true added value of a service like M is to be able to fulfill your request even if it’s a little bit specific or weird,” LeBrun says. “It will do it even if it’s complex and not the mainstream case.”
And M learns as it goes along. Right now, it’s not robust enough to stand alone. A team of “AI trainers” works with the program, and if there’s a request that M doesn’t understand, the trainers take over. M then learns from what the human trainer does, and can use that technique with later requests. There’s also an element of randomness built into the program, LeBrun says, to bring it closer to human learning. This means that it will sometimes try to find novel, more efficient ways to do a common task.
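The loop LeBrun describes (hand off to a human when stuck, remember what the human did, and occasionally try something different) can be sketched in a few lines. This is an assumption-heavy illustration, not M’s design: the Assistant class, the human_trainer function, and the explore_rate parameter are hypothetical, and the random choice shown is a simple epsilon-greedy style of exploration swapped in to stand for the “element of randomness.”

```python
import random
from typing import Callable, Dict, List, Optional


class Assistant:
    """Answers known requests itself and hands unknown ones to a human trainer."""

    def __init__(
        self,
        trainer: Callable[[str], str],
        explore_rate: float = 0.1,
        rng: Optional[random.Random] = None,
    ) -> None:
        self.trainer = trainer
        self.explore_rate = explore_rate  # chance of trying an alternative method
        self.rng = rng or random.Random()
        self.known: Dict[str, List[str]] = {}  # request -> methods, preferred first
        self.handoffs = 0

    def learn(self, request: str, method: str) -> None:
        """Remember one way of fulfilling a request."""
        self.known.setdefault(request, []).append(method)

    def handle(self, request: str) -> str:
        methods = self.known.get(request)
        if not methods:
            # Not understood: the human takes over, and the assistant records it.
            self.handoffs += 1
            method = self.trainer(request)
            self.learn(request, method)
            return method
        if len(methods) > 1 and self.rng.random() < self.explore_rate:
            # Occasionally try a less-used method instead of the usual one.
            return self.rng.choice(methods[1:])
        return methods[0]


def human_trainer(request: str) -> str:
    """Stand-in for a person completing the task by hand (hypothetical)."""
    return f"handled by trainer: {request}"


bot = Assistant(human_trainer, explore_rate=0.0)
print(bot.handle("book a flight"))  # first time: trainer steps in
print(bot.handle("book a flight"))  # second time: answered from what was learned
print(bot.handoffs)
```

Tracking handoffs over time is one simple way to measure how often a human is still needed.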
“AI trainer” is a new position, and one that even Facebook is still trying to figure out. They do say, however, that it’s not a job for researchers and engineers, but instead more geared for people with customer service experience. As time goes on, Facebook will be able to evaluate how many requests require human intervention, but the eventual hope is that humans won’t be needed at all in the future.
These trainers are essential to the development process, though, because their job is twofold: serve as the last line of defense for quality control, and teach the AI.
And with human intelligence as the gatekeeper, M can be used as a sandbox for FAIR’s development. “As soon as they have something to test, it will surface in M, because with our training and supervision, it’s really risk-free,” LeBrun says.
The M platform is built entirely on Wit.ai’s platform (mainly developed before Facebook), but FAIR also will be using the deep learning data gathered from users interacting with the personal assistant AI.
Facebook In The Community
“The research we do, we’re doing it in the open. Pretty much everything we do is published, a lot of the code we write is open-sourced,” LeCun says. Those publications are available on Facebook’s research site, and also arXiv, a library of research papers in computer science, mathematics, and physics.
This goes for a lot of the artificial intelligence community. LeCun has been a leading figure in developing Torch, an open-source library for AI development that is scripted in Lua. Along with the rest of the team at Facebook, he works with researchers at Twitter and Google’s DeepMind to make Torch a better tool for everyone. (Many of these experts now in the field were once students of LeCun, as well.)
Anything else they might publish, from work that could be integrated in medical imaging or self-driving cars, is open to be used to further the field, LeCun says. The work that Facebook does is important to Facebook users, but at its core the research team strives toward furthering humanity’s collective knowledge of how to better emulate intelligence with machinery.
This is why Facebook is an important part of the artificial intelligence community, and why the community itself is so important.
“The scenario you see in a Hollywood movie, in which some isolated guy in Alaska comes up with a fully-functional AI system that nobody else is anywhere close to, is completely impossible,” LeCun said. “This is one of the biggest, most complicated scientific challenges of our time, and not any single entity, even a big company, can solve it by itself. It has to be a collaborative effort between the entire research and development community.”