Natural history museums offer amazing portals into worlds miles away from our own, and into eras from the distant past. Composed of fossils, minerals, preserved specimens, and much more, some collections are of palatial grandeur. Although every museum has some sort of system in place to track incoming and outgoing items, those systems are not connected from one museum to the next. Keeping a more detailed record of who has what across the world could be important not only for conservation but also for cataloging how life on Earth has changed and forecasting how it will continue to change in the future.

For example, case studies have shown that analyzing the collections of these museums can be useful for studying pandemic preparedness, invasive species, colonial heritage, and more.

But this lack of connection might be a thing of the past. A paper published in the journal Science last week describes how a dozen large museums came together to map the entire collections of 73 of the world’s largest natural history museums across 28 countries in order to figure out what digital infrastructure is needed to establish a global inventory survey. 

“There is no single shared portal covering the breadth of life, Earth, and anthropological specimens in natural history collections, nor a way for researchers to link these data with other sources of information,” the researchers wrote in the paper. “We envision a coordinated strategy for the global collection that is based on strategic collecting, increased digitization, new technologies, and enhanced networking and coordination of museums.” 


Recent advances in fields like isotopic identification, imaging, genomic analysis, and machine learning are making it easier to access information about the collections. But before any kind of shared portal could exist, the team found that they first needed to work through some logistical kinks.

Why researchers surveyed more than 1.1 billion objects across 73 museums
The exhibition “digitize!” offers a unique look at how the museum’s collections are imaged and digitized. Thomas Rosenthal / Museum für Naturkunde Berlin

“Until now, it has been difficult to enumerate or compare the complete contents of large museums because their collections are not fully digitized, and the terminology used to describe subcollections is variable,” they wrote. They attribute this to the fact that most museums operate independently and lack the data structure needed to provide open access to outside users.

“Most of the collection information that we surveyed is not digitally accessible: Only 16% of the objects have digitally discoverable records, and only 0.2% of biological collections have accessible genomic records,” they added.

Scanning the barcode attached to the insect specimen links to a digital copy. Nico Garstman / Naturalis Biodiversity Center

To address this, for the purposes of the survey, they developed a common vocabulary for the types of objects and for the collections or geographic source areas into which those objects can be categorized. They applied this scheme to all 73 museums, covering 1,147,934,687 specimens in total. You can see the breakdown in an online dashboard the team created.


The survey found that while the items spanned a vast diversity of fields, including biology, geology, paleontology, and anthropology, there were also “conspicuous gaps across museum collections in areas including tropic and polar regions, marine systems, and undiscovered arthropod and microbial diversity,” they noted. “These gaps could provide a roadmap for coordinated collecting efforts going forward.”

This is not to say that museums have been totally in the dark with their data. In fact, the survey organizers highlighted several existing networks established to integrate biodiversity data around the world, including the Global Biodiversity Information Facility, the Earth BioGenome Project, and the International Barcode of Life, just to name a few.

They also lauded programs like the Atlas of Living Australia and Integrated Digitized Biocollections for coming up with “innovative solutions to support collection digitization, data integration, and mobilization.” More readily available datasets make it easier for others to look for patterns in the data, or to build tools and models that are helpful for the scientific community at large.