The final graphic

For the July issue of Popular Science, we—the Office for Creative Research—created a data visualization celebrating NASA’s long history of aerospace innovation. Since 1959, NASA has published a document called “Astronautics & Aeronautics Chronology” nearly every year, compiling news coverage of science, technology, and policy at the agency. In these compilations, NASA is reporting its own history. What kinds of stories do these documents hold? How has their language changed over the last six decades? To explore these questions, we created “The Whole Brilliant Enterprise,” a text-based visualization drawn from—by our count—4,861,706 words of NASA history.

The first step was to dig through the NASA chronologies by hand. We discovered that while the reports were an extremely descriptive history of aerospace, they lacked a hierarchy—they were simply straightforward timelines recounting events. A story about the hiring of a new NASA employee might appear alongside a story of a shuttle launch, representing chronological order but not relative importance. That mixed-up quality makes the documents wonderful to skim, but difficult to visualize.

To address the hierarchy issue, we turned to the archives of The New York Times, seeking out NASA-related headlines and articles. We took the articles’ placement in the paper of record—was it front-page news or did the story appear at the back of a section?—as a proxy for cultural impact. Then, we mapped that importance rating back onto the NASA archives, and used it to pull out the text of just the most consequential stories to act as the foundation of the visualization. It was in compiling these results that we realized that the piece should not be a rigid timeline of key NASA events, but instead a rolling impression of the agency’s eras, created by displaying some of the more popular and important terms within the articles.
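As a rough illustration of using placement as a proxy (not the code behind the graphic), the sketch below scores a story by its page number and keeps the most prominent ones. The `TimesArticle` type, the `1 / page` scoring rule, and the function names are all assumptions made for this example; the article does not say how placement was actually converted into a rating.

```python
from dataclasses import dataclass


@dataclass
class TimesArticle:
    headline: str
    page: int  # 1 means the front page


def placement_score(page):
    """Hypothetical proxy: the front page scores 1.0, deeper pages decay toward 0."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return 1.0 / page


def most_prominent(articles, top_n):
    """Return the top_n articles, ordered from most to least prominent placement."""
    ranked = sorted(articles, key=lambda a: placement_score(a.page), reverse=True)
    return ranked[:top_n]
```

Any monotonically decreasing function of page number would serve the same purpose; the reciprocal is chosen here only because it is simple.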

Once we had the structure in place, the challenge became finding the balance between a term’s chronological location and the type size that would represent its place in the “cultural impact” hierarchy. We also had to space the individual terms evenly along a curved path. It took many iterations of the code that generated the graphic to strike that balance, but eventually we settled on a process that produced an image with the character that we had originally envisioned.
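One common way to space items evenly along a curve is to measure distance along the path itself rather than along the x-axis. The sketch below assumes the curve is approximated by a polyline (a list of `(x, y)` points) and returns positions at equal arc-length intervals. It is a generic technique offered for illustration, not necessarily the process used for the final graphic, and the function names are hypothetical.

```python
import math


def cumulative_lengths(points):
    """Running distance along the polyline, starting at 0.0 for the first point."""
    lengths = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        lengths.append(lengths[-1] + math.hypot(x1 - x0, y1 - y0))
    return lengths


def evenly_spaced(points, count):
    """Return `count` positions at equal arc-length intervals along the polyline."""
    if count < 2 or len(points) < 2:
        raise ValueError("need at least two points and two positions")
    lengths = cumulative_lengths(points)
    total = lengths[-1]
    result = []
    seg = 0  # index of the segment that contains the current target distance
    for i in range(count):
        target = total * i / (count - 1)
        while seg < len(points) - 2 and lengths[seg + 1] < target:
            seg += 1
        seg_len = lengths[seg + 1] - lengths[seg]
        t = 0.0 if seg_len == 0 else (target - lengths[seg]) / seg_len
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        result.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return result
```

Because the positions are measured along the path, steep stretches of the curve receive as many terms per unit of visible length as flat ones.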

We followed a circuitous path to generate the graphic—the extent of which is evident in our sketches [below]—but we felt it was an appropriate process given the breadth of the archive. The value of our explorations is—like the histories themselves—more striking when viewed in hindsight.

A Small Gallery of Our Sketches and In-Progress Images

Counting the number of NASA-related New York Times stories

We used articles in the New York Times to establish a hierarchy within the stream of stories that NASA compiles in its (almost) annual history reports. Each dot here represents a story, and the dots are arranged by quarter. The largest number of Times stories was published around the July 1969 moon landing.
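Grouping stories by quarter can be sketched in a few lines. The example assumes each story carries a publication date; the function names are made up for this illustration.

```python
from collections import Counter
from datetime import date


def quarter_of(day):
    """Map a date to a (year, quarter) pair, with quarters numbered 1 to 4."""
    return day.year, (day.month - 1) // 3 + 1


def stories_per_quarter(dates):
    """Count how many story dates fall in each (year, quarter)."""
    return Counter(quarter_of(d) for d in dates)
```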

A selection of NASA-related New York Times stories, plotted by their length and location in the paper

Each dot here represents a NASA-related New York Times story. The page number of the story runs along the x-axis, and the y-axis is the story length in words. Bigger dots are stories that appear in a month that contained lots of other NASA stories, presumably meaning each was part of a flurry of noteworthy events. Lines connect consecutive stories in time. This view allowed us to determine whether our page-ranking algorithm would work to establish a hierarchy of stories in the NASA documents: If the same terms appeared in the NASA stories as in the most important Times stories, those NASA stories are likely more significant.
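The term-matching idea can be sketched as follows, under several assumptions: every Times story already has a numeric importance weight, each term inherits the highest weight of any story it appears in, and a NASA entry is scored by summing the weights of the terms it shares. The tokenizer and all function names are hypothetical, and a real version would likely filter out common words first.

```python
import re


def terms(text):
    """Lowercase the text and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def term_weights(times_stories):
    """times_stories: list of (text, weight). Each term keeps the highest weight seen."""
    weights = {}
    for text, weight in times_stories:
        for term in terms(text):
            weights[term] = max(weights.get(term, 0.0), weight)
    return weights


def overlap_score(nasa_entry, weights):
    """Sum the weights of the Times terms that also occur in the NASA entry."""
    return sum(weights.get(term, 0.0) for term in terms(nasa_entry))
```

Entries that echo the vocabulary of prominent Times stories score higher, which is the intuition described above.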

A quick visualization of the interconnectedness of select New York Times story abstracts on different NASA topics

The white rays around the outside of the ring represent a selection of NASA-related Times stories. Longer stories create longer white radial streaks. When a single term appears in two stories, those stories are connected by an arc. The colors are randomly assigned.
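Finding which pairs of stories to connect amounts to checking every pair for shared terms. A minimal sketch, assuming each story has already been reduced to a set of terms (the identifiers and function name are illustrative only):

```python
from itertools import combinations


def shared_term_arcs(story_terms):
    """story_terms: dict of story id -> set of terms.

    Returns (id_a, id_b, shared_terms) for every pair with at least one term in common.
    """
    arcs = []
    for a, b in combinations(sorted(story_terms), 2):
        shared = story_terms[a] & story_terms[b]
        if shared:
            arcs.append((a, b, shared))
    return arcs
```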

Identifying the most important terms and beginning to sort them by topic

At one point we used a word-cloud approach for our own internal examination of the text. The words are pulled from the NASA reports, and loosely arranged by time on the x-axis. Larger words have a higher importance index, based on our analysis of New York Times articles. They’re colored by category.

A process shot, as we calculate allowable text heights along the curves of the graphic

Before we could fit text along the curved paths of the graphic, we needed to calculate two parameters: the curvature of the line at each point (so we could lay down text that follows the curves smoothly) and the height between one curve and the next (which tells us how big the text needs to be to fill the space). This image is a screenshot of our algorithm in progress.
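Both measurements can be approximated on sampled curves. The sketch below assumes each curve is stored as y-values at shared x positions. It computes the tangent angle at each sample (a simpler stand-in for curvature, which would be the rate at which that angle changes) and the vertical gap between two neighboring curves. The function names are hypothetical, and the gap is measured straight up rather than perpendicular to the curve.

```python
import math


def tangent_angles(xs, ys):
    """Direction of the curve at each sample, in radians, via central differences."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    angles = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)  # one-sided at the ends
        angles.append(math.atan2(ys[hi] - ys[lo], xs[hi] - xs[lo]))
    return angles


def gap_heights(lower_ys, upper_ys):
    """Vertical space between two curves sampled at the same x positions."""
    return [upper - lower for lower, upper in zip(lower_ys, upper_ys)]
```

The angle tells each glyph how far to rotate, and the gap gives an upper bound on the type size at that position.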

A study of our path-generation algorithm for the flare of “-ing” words running across the background

One of the more fanciful elements in the graphic is the streaming white “-ing” words that appear in the background, evoking the flames that propel the spacecraft forward and giving a sense of flow and direction to the piece. This was the output of an early version of our program for generating the paths that we would eventually flow the “-ing” words along.

Distributing the curves that will corral the text for each topic in the final graphic

The height of the curve is based on the number of stories in the NASA archive in each of the categories we chose to feature. Here, we’re testing how the streams would look for a handful of different category options.
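A stacked layout of this kind can be sketched by giving each category a band whose thickness is proportional to its story count at each time step, then piling the bands on top of one another. The input shape, the `scale` parameter, and the function name are assumptions made for this example.

```python
def stack_streams(counts_by_category, scale=1.0):
    """counts_by_category: dict of category name -> list of story counts per time step.

    Returns dict of name -> list of (bottom, top) band edges per time step,
    stacked in the dict's insertion order.
    """
    if not counts_by_category:
        return {}
    steps = len(next(iter(counts_by_category.values())))
    baseline = [0.0] * steps  # running top edge of everything stacked so far
    bands = {}
    for name, counts in counts_by_category.items():
        band = []
        for i, count in enumerate(counts):
            top = baseline[i] + count * scale
            band.append((baseline[i], top))
            baseline[i] = top
        bands[name] = band
    return bands
```

Swapping categories in and out of the input dict is a quick way to compare how different selections would shape the streams.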

Finding the perpendicular lines at the curves’ inflection points, for running text along the curves later

With the final category streams in place, we then had to assess the shapes of those curves so we could flow the final text along them.
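Perpendicular directions along a sampled curve can be found by rotating the local tangent a quarter turn. The sketch below computes a unit normal at every sample of a polyline rather than only at selected points; the function name is hypothetical.

```python
import math


def unit_normals(points):
    """Unit vector perpendicular to the curve at each (x, y) sample."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    normals = []
    for i in range(n):
        x0, y0 = points[max(i - 1, 0)]
        x1, y1 = points[min(i + 1, n - 1)]
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        if length == 0:
            raise ValueError("repeated points give no direction")
        # Rotate the tangent (dx, dy) by 90 degrees counterclockwise.
        normals.append((-dy / length, dx / length))
    return normals
```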