Bringing Biodiversity Data Online, One Leaf At A Time

On Oct. 6, 1846, prisoner of war Friedrich Adolph Wislizenus ventured into the rocky hills near the secluded Mexican town of Cusihuiriachi to collect some plants. He and his westward exploration party had been captured as the Mexican-American War broke out, but the St. Louis-based doctor and naturalist decided to continue his research during his imprisonment. He pulled up a crimson wildflower, later named Heuchera sanguinea, and hiked back to the village.

He arranged the spindly plant so that once it was dried and mounted on paper, both sides of its leaves would be visible to future botanists, and then he pressed it between sheets of newsprint. After he was freed in the spring of 1847, he carried the dried plant and many others back to St. Louis, presenting it to his friend and fellow doctor Georg Engelmann. And this is where the crisp, rust-colored sample remains — a piece of botanical history, but also an important piece of data, one of almost 6.3 million specimens stored in a forest of manila folders at the Missouri Botanical Garden.

“Biodiversity science isn’t often looked at as data-intensive, or as Big Data. But what we have is base data for all other science,” says Chris Freeland, director of the Center for Biodiversity Informatics at MBG. “These little plants, these pieces of information, are part of the knowledge ecosphere that we have about life on Earth, and we need to know everything we can about all of them.”

Aiming to share their specimens with the world, a team of archivists and botanists is painstakingly photographing every single one of them and putting them online for anyone to access.

The Missouri Botanical Garden is one of the world’s largest repositories of data about plants, data that exists in several forms. It owns rare first-edition books, including medicinal plant manuals dating to the time of Gutenberg; thousands of living plants, which grow for public enjoyment on 79 acres in the center of urban St. Louis; and the 6 million-plus dried specimens — the unseen garden — which are used for studying the morphology, distribution and use of plant species worldwide. Along with samples collected by Wislizenus and his peers, the garden has collections obtained by Charles Darwin, James Cook and nearly every naturalist to travel westward in the footsteps of Lewis and Clark. All of this data, from books to scanned images to specimen labels, will be online within a few years, Freeland says. The garden is even building API tools so others can write apps to mine it all.

Jim Solomon is the curator of MBG’s herbarium, the name for a collection of permanently preserved plant parts. Towers of 12×16-inch manila folders, each holding specimens like H. sanguinea, crowd every surface of his office; some are tied together with red string, while others lie open with multiple plant pieces poking through. The prodigious piles are a fraction of MoBot’s overall collection, he says. Among the 37 total herbaria worldwide, there are probably 400 million such folders, he estimates. “Out of the total number of individual plants that have lived on the Earth in the past couple of centuries, that is vanishingly small,” he says. “Yet that is the primary source of all our knowledge of the plants on the Earth.”

Down the hall is one part of herbarium storage, which looks like any university library with its rows upon rows of movable shelves. Solomon spins a wheel on a row containing the family Clusiaceae and reaches for a folder containing a species of Garcinia. He lifts out a card covered with 4-inch-long leathery leaves, knobby roots and a cluster of small round fruits — not the type of thing you would expect to pull out of a file folder, and not an item you’d typically associate with Big Data.

For decades, botanists hoping to mine this data have had to contact MBG or another herbarium and ask for samples in the mail. Plant data mining is usually for taxonomic research — say a botanist in Madagascar finds an interesting plant with an erect stem, dimorphic leaves and green pollen, and wants to know if it’s a distinct species, or perhaps a new hybrid. Simple morphology — what plants look like — is still the primary way to do this. To help this research, MBG archivists circulate about 100,000 samples a year, including loans, gifts and new accessions. But all this handling can damage the samples, and it’s difficult for botanists in developing countries — which contain the greatest areas of biological diversity — to send and receive this type of mail. This is part of MBG’s motivation for cataloging and digitizing data about every specimen in its possession.

So far, the team has gotten through 3.9 million samples, and scanned about 169,000 images. This is a special challenge by itself, because as the Garcinia sample makes clear, not all plants are conveniently flat. “You have things like coconuts, or giant agaves, and other big bulky things, so it does take some specialized equipment,” Freeland says. MBG is working with the Royal Botanic Gardens in the U.K. to build new imaging equipment to scan the samples in super-high resolution, so botanists can zoom and pan and see detail.

Lest these collections seem dusty and antiquated, be apprised that modern plant data also comes in the form of phylogenetic material. Although the vast majority of taxonomic work is still based on morphology, MBG does have a DNA bank, comprising about 11,000 samples specifically preserved for the purposes of DNA extraction. Botanists collect leaf samples, preserve them in silica gel and store them at -20° C, where they are likely to yield better-quality DNA than herbarium material. Legacy collections aren’t tested in this way, because the DNA extraction process requires too much material to be destroyed, Solomon says.

All this work, which touches several different projects at MBG, will help botanists in other countries study native plants and help natural historians understand plant uses and distribution throughout four centuries.

The garden is one of six institutions involved in the Encyclopedia of Life project, for instance, which will catalog data about every species on Earth and make it available online, serving as a single access point for studying the world’s biodiversity. Freeland is a self-described open-access evangelist, encouraging other herbaria and museums to share their collections as openly as possible. Ultimately, he hopes researchers will be able to use the shared data to map new connections in the web of life.

“That’s what we want to document, is that ‘aha’ moment, and I’m a firm believer that they are in there. That’s the big data challenge. What’s traditionally been a very human-centric domain, people going out and collecting plants, now we’re throwing algorithms at the problem,” Freeland says. “The more data that are available, the more comprehensive your science can become.”