This week, Meta launched a new artificial intelligence model, called Sphere, which is designed to automatically verify Wikipedia citations. Sphere’s knowledge base comes from 134 million web pages.
Meta said that it is not partnering with Wikimedia (the nonprofit foundation that operates Wikipedia, at wikipedia.org) on this project, which is still in the research phase and is not being used to push live updates to Wikipedia. However, Wikimedia recently announced that it is using Meta’s technology in its Content Translation Tool.
Sphere, Meta says in a blog post, is an AI model that performs knowledge-intensive natural language processing, the same kind of task the virtual assistant on your phone tackles when you ask it a question like “who won the first Nobel Prize in physics?” Models like these then dig through a repository of information to find a matching answer.
In Sphere’s case, that repository is drawn from the “unstructured” open web rather than from a traditional search engine’s index. “Because Sphere can access far more public information than today’s standard models, it could provide useful information that they cannot,” Meta researchers wrote in a blog post. Additionally, Meta’s system uses natural language understanding to “estimate the likelihood that a claim can be inferred from a source.” This technique breaks sentences or phrases down into mathematical representations, then compares those sets of representations to one another.
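To get a sense of what that comparison looks like in practice, here is a minimal, illustrative sketch in Python: it converts a claim and two candidate source passages into vectors with the open-source sentence-transformers library and scores them with cosine similarity. The library, the “all-MiniLM-L6-v2” model, and the 0.6 threshold are generic stand-ins chosen for the example, not the components Meta describes for Sphere.

```python
# Illustrative only: embed a claim and candidate source passages as vectors,
# then compare them with cosine similarity. The model and threshold below are
# generic stand-ins, not Meta's Sphere components.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

claim = "Wilhelm Conrad Röntgen won the first Nobel Prize in Physics in 1901."
passages = [
    "The first Nobel Prize in Physics was awarded in 1901 to Wilhelm Conrad "
    "Röntgen for his discovery of X-rays.",
    "The Nobel Prizes were established by the will of Alfred Nobel.",
]

# Break the text down into fixed-length vectors ("mathematical representations").
claim_vec = model.encode(claim, convert_to_tensor=True)
passage_vecs = model.encode(passages, convert_to_tensor=True)

# Compare the claim's vector against each passage's vector.
scores = util.cos_sim(claim_vec, passage_vecs)[0]
for passage, score in zip(passages, scores):
    s = score.item()
    verdict = "plausible support" if s > 0.6 else "questionable citation"
    print(f"{s:.2f}  {verdict}  ->  {passage[:60]}...")
```

A real system would compare the claim against the specific passage a citation points to, flagging low-scoring pairs for a human editor to review rather than making any automatic change.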
[Related: Meta wants to improve its AI by studying human brains]
A preprint describing Sphere can be found on arXiv, and the software itself is open-sourced on GitHub. Meta has also created a benchmark called KILT (Knowledge Intensive Language Tasks) that it will use to assess how Sphere and other similar models perform on a broad range of tasks, including fact-checking, question answering, dialogue, and inserting relevant links.
This ability has so far only been put to work scanning and checking Wikipedia citations. “It calls attention to questionable citations, allowing human editors to evaluate the cases most likely to be flawed without having to sift through thousands of properly cited statements,” Meta explained. “If a citation seems irrelevant, our model will suggest a more applicable source, even pointing to the specific passage that supports the claim.”
[Related: ‘Adopting typos’ and other ways to edit Wikipedia]
Ultimately, learning to understand the relationship between the text of Wikipedia entries and the sources they cite should also improve the model’s ability to parse real-world knowledge, since evaluating citations demands the kind of language comprehension and reasoning that people bring to the task.
“These models are the first components of potential editors that could help verify documents in real time. In addition to proposing citations, the system would suggest auto-complete text — informed by relevant documents found on the web — and offer proofreading corrections,” Meta said. “Ideally, the models would understand multiple languages and be able to process several types of media, including video, images, and data tables.”