Bringing Biodiversity Data Online, One Leaf At A Time

Converting millions of pressed plants into a vital digital archive

On Oct. 6, 1846, prisoner of war Friedrich Adolph Wislizenus ventured into the rocky hills near the secluded Mexican town of Cusihuiriachi to collect some plants. He and his westward exploration party had been captured as the Mexican-American War broke out, but the St. Louis-based doctor and naturalist decided to continue his research during his imprisonment. He pulled up a crimson wildflower, later named Heuchera sanguinea, and hiked back to the village.

He arranged the spindly plant so that once it was dried and mounted on paper, both sides of its leaves would be visible to future botanists, and then he pressed it between sheets of newsprint. After he was freed in the spring of 1847, he carried the dried plant and many others back to St. Louis, presenting it to his friend and fellow doctor Georg Engelmann. And this is where the crisp, rust-colored sample remains — a piece of botanical history, but also an important piece of data, one of almost 6.3 million specimens stored in a forest of manila folders at the Missouri Botanical Garden.

Friedrich Adolph Wislizenus. Credit: Wikipedia

"Biodiversity science isn't often looked at as data-intensive, or as Big Data. But what we have is base data for all other science," says Chris Freeland, director of the Center for Biodiversity Informatics at MBG. "These little plants, these pieces of information, are part of the knowledge ecosphere that we have about life on Earth, and we need to know everything we can about all of them."

Aiming to share their specimens with the world, a team of archivists and botanists is painstakingly photographing every single one of them and putting them online for anyone to access.

The Missouri Botanical Garden is one of the world's largest repositories of data about plants, data that exists in several forms. It owns rare first-edition books, including medicinal plant manuals dating to the time of Gutenberg; thousands of living plants, which grow for public enjoyment on 79 acres in the center of urban St. Louis; and the 6 million-plus dried specimens — the unseen garden — which are used for studying the morphology, distribution and use of plant species worldwide. Along with samples collected by Wislizenus and his peers, the garden has collections obtained by Charles Darwin, James Cook and nearly every naturalist to travel westward in the footsteps of Lewis and Clark. All of this data, from books to scanned images to specimen labels, will be online within a few years, Freeland says. The garden is even building API tools so others can write apps to mine it all.

Jim Solomon is the curator of MBG's herbarium, the name for a collection of permanently preserved plant parts. Towers of 12x16-inch manila folders, each holding specimens like H. sanguinea, crowd every surface of his office; some are tied together with red string, while others lie open with multiple plant pieces poking through. The prodigious piles are a fraction of MBG's overall collection, he says. Among the 37 total herbaria worldwide, there are probably 400 million such folders, he estimates. "Out of the total number of individual plants that have lived on the Earth in the past couple of centuries, that is vanishingly small," he says. "Yet that is the primary source of all our knowledge of the plants on the Earth."

Down the hall is one part of herbarium storage, which looks like any university library with its rows upon rows of movable shelves. Solomon spins a wheel on a row containing the family Clusiaceae and reaches for a folder containing a species of Garcinia. He lifts out a card covered with 4-inch-long leathery leaves, knobby roots and a cluster of small round fruits — not the type of thing you would expect to pull out of a file folder, and not an item you'd typically associate with Big Data.

Organizing the Plants

Curatorial assistant Ron Liesner sorts a collection of dried plants. Liesner focuses on Mesoamerican and South American plants. Courtesy Missouri Botanical Garden

For decades, botanists hoping to mine this data have had to contact MBG or another herbarium and ask for samples in the mail. Plant data mining is usually for taxonomic research — say a botanist in Madagascar finds an interesting plant with an erect stem, dimorphic leaves and green pollen, and wants to know if it's a distinct species, or perhaps a new hybrid. Simple morphology — what plants look like — is still the primary way to do this. To help this research, MBG archivists circulate about 100,000 samples a year, including loans, gifts and new accessions. But all this handling can damage the samples, and it's difficult for botanists in developing countries — which contain the greatest areas of biological diversity — to send and receive this type of mail. This is part of MBG's motivation for cataloging and digitizing data about every specimen in its possession.

So far, the team has gotten through 3.9 million samples and scanned about 169,000 images. The scanning is a special challenge in itself, because as the Garcinia sample makes clear, not all plants are conveniently flat. "You have things like coconuts, or giant agaves, and other big bulky things, so it does take some specialized equipment," Freeland says. MBG is working with the Royal Botanic Gardens in the U.K. to build new imaging equipment to scan the samples in super-high resolution, so botanists can zoom and pan and see detail.

Lest these collections seem dusty and antiquated, note that modern plant data also comes in the form of phylogenetic material. Although the vast majority of taxonomic work is still based on morphology, MBG does have a DNA bank, comprising about 11,000 samples specifically preserved for the purposes of DNA extraction. Botanists collect leaf samples, preserve them in silica gel and store them at -20°C, where they are likely to yield better-quality DNA than herbarium material. Legacy collections aren't tested in this way, because the DNA extraction process requires too much material to be destroyed, Solomon says.

All this work, which touches several different projects at MBG, will help botanists in other countries study native plants and help natural historians understand plant uses and distribution throughout four centuries.

The garden is one of six institutions involved in the Encyclopedia of Life project, for instance, which will catalog data about every species on Earth and make it available online, serving as a single access point for studying the world's biodiversity. Freeland is a self-described open-access evangelist, encouraging other herbaria and museums to share their collections as openly as possible. Ultimately, he hopes researchers will be able to use the shared data to map new connections in the web of life.

"That's what we want to document, is that 'aha' moment, and I'm a firm believer that they are in there. That's the big data challenge. What's traditionally been a very human-centric domain, people going out and collecting plants, now we're throwing algorithms at the problem," Freeland says. "The more data that are available, the more comprehensive your science can become."

Dried Plants Are Bits of Data

Friedrich Adolph Wislizenus collected this coralbells sample while a prisoner of war in Mexico on Oct. 6, 1846. He brought the sample back to his home of St. Louis, where he gave it to his friend and fellow botanist Georg Engelmann. It is one of more than 8,000 samples collected by pioneers who explored the American frontier in the footsteps of Lewis and Clark. The collection is the first scientific record of the native plants of North America before European settlement. Credit: Tropicos/MOBOT

Dried Heather

This brilliant red specimen is one of roughly 160,000 images in the Tropicos database, which organizes data on the Missouri Botanical Garden's 6.3 million (and counting) dried plant specimens. Garden archivists are scanning every plant in the collection as part of a digitization effort. The scanned dried plants will also be added to the Biodiversity Heritage Library and the Encyclopedia of Life, which will contain data on every species on the planet. This is called Erica ignita E.G.H. Oliver, and is named for its collector, Edward (Ted) George Hudson Oliver, a South African botanist. Credit: Tropicos/MOBOT

Filing the Plants

Archivists at the Missouri Botanical Garden circulate nearly 100,000 plant specimens a year. That includes adding new accessions, accepting and donating gifts, making loans to other herbaria around the world, and sharing specimens with researchers in other countries. Curators sort specimens by type (is it a gymnosperm or a monocot?), then by family, then alphabetically by genus, then by geographic region, and then by species. Credit: Rebecca Boyle
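
For readers who think in code, the filing order described in this caption amounts to a multi-key sort. The sketch below is only an illustration of that idea in Python, not MBG's or Tropicos's actual system: the `Specimen` fields, the `filing_key` and `file_specimens` helpers, and the sample records (including their group and region values) are all hypothetical, and it assumes plain alphabetical order at every level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Specimen:
    # Hypothetical field names; not the garden's real database schema.
    group: str    # broad plant type, e.g. "fern" or "eudicot"
    family: str
    genus: str
    region: str   # geographic region where the specimen was collected
    species: str


def filing_key(s: Specimen) -> tuple:
    """Order the fields as the caption lists them: type, family, genus, region, species."""
    return (s.group, s.family, s.genus, s.region, s.species)


def file_specimens(specimens):
    """Return specimens in filing order (assumed alphabetical within each level)."""
    return sorted(specimens, key=filing_key)


# Illustrative records only; group and region values are placeholders.
specimens = [
    Specimen("eudicot", "Ericaceae", "Erica", "Europe", "cinerea"),
    Specimen("eudicot", "Saxifragaceae", "Heuchera", "North America", "sanguinea"),
    Specimen("eudicot", "Ericaceae", "Erica", "Africa", "ignita"),
    Specimen("fern", "Aspleniaceae", "Asplenium", "South America", "dareoides"),
    Specimen("eudicot", "Ericaceae", "Erica", "Africa", "arborea"),
]

for s in file_specimens(specimens):
    print(s.group, s.family, s.genus, s.region, s.species)
```

Because region comes before species in the key, the European Erica cinerea files after the African Erica ignita even though "cinerea" sorts earlier alphabetically.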

Violets in Folders

Jim Solomon examines some violets collected from Spain by the German botanist Wilhelm Becker. Credit: Rebecca Boyle

Darwin's Asplenium

MBG's collection includes specimens taken by several notable naturalists from the 18th and 19th centuries. Charles Darwin cut this plant during an expedition to Chile on Dec. 30, 1834. It grew on rocky terrain at a location called Tres Montes, which Darwin reached after climbing a 2,400-foot mountain, according to his journal, in which he said "the scenery was remarkable." It is called Asplenium dareoides, a plant capable of producing dramatically diverse hybrids, hence its interest to plant evolutionists like Darwin. Credit: Christine Siebert/Courtesy Missouri Botanical Garden

Mandrakes in the Garden of Health

The MBG library contains more than 200,000 books, some dating to the era of Gutenberg, including manuals for farmers and apothecaries. "In the 16th century, everybody's health care came from medicinal plants," says herbarium curator Jim Solomon. The foundation of modern botany comes from medicinal plant usage, he notes: apothecaries started keeping notes on what they kept in their supply cabinets, writing about proper gardening and use. This one, for instance, is called "Gart der Gesundheit," German for "Garden of Health." This particular version was printed in 1487. The page is open to a woodcut drawing of a mandrake. Every single page in more than 200,000 volumes, comprising 85 percent of all the literature ever written about plant taxonomy, is being painstakingly scanned and archived online through MBG. Credit: Rebecca Boyle