At some point about halfway through the hurly-burly of pulling together our special issue on what I'd taken to calling The Data Age, senior associate editor Ryan Bradley noticed that Stephen Wolfram had created a timeline of significant milestones in the historical march of data. We thought it would be an excellent piece of contextual glue to apply to our analysis of the burgeoning power of data, well wielded, to both illuminate and influence our world. Fortunately, Wolfram agreed, and the timeline ran as connective tissue along the bottom of our magazine pages. Wolfram was at that time about to host his second annual Wolfram Data Summit in Washington, D.C., a gathering of database curators and purveyors and open-source savants to discuss how best to cultivate and process and liberate our geometrically expanding data bounty. Wolfram's data-processing answer engine, Wolfram Alpha, is an outrageously ambitious and optimistic enterprise. It combines the algorithmic might of his own Mathematica software with brute-strength data-curation efforts to answer questions users may not even be aware they're asking. I figured he would be an interesting person to chat with about data.
The phone call lasted about an hour and 45 minutes, from me thanking him for allowing us to exploit his curatorial skills and asking whether he agreed with our proposition about the burgeoning power of Big Data all the way through to Wolfram's description of his algorithmic pursuit of the origin and laws of our universe, basically a mash note to the brilliant irreducible complexity of nature (as opposed to the iterative plodding simplicity of human engineering). It was fun. I've edited the full transcript for clarity and tightened it just a bit for your benefit, but otherwise I think it's worth reading in full. Enjoy the trip.
MARK JANNOT: First off, thank you for letting us use your timeline in our Data Issue. We've employed it as a helpful tool to drive readers through the entire feature well and package this issue together in a nice way, so it's really great.
STEPHEN WOLFRAM: I'd always thought systematic data is important in the progress of civilization, but I have to say that as we put together the poster, I really realized how different steps in the creation of important directions in civilization were made possible because there was systematic data about this or that, or people could count on data being available. Of course, there were some bad things that happened in the course of history as a result of large chunks of data being collected, too, but I like to look at the positive progress.
Did you choose to leave out the negatives in the timeline?
I don't know the history tremendously well, but once you can do censuses of people, you can decide, We don't like who they are and we can figure out where they are based on the census.
It's true of anything, I guess—anything that's going to lead to progress will open us up to abuses and negative results. One of the pieces we're publishing is about Albert-László Barabási and the implications of his current thinking about hubs and nodes, and learning which are the action hubs and nodes. Once you know that, you can use that knowledge to control that system, and of course while there is much in the way of positive outcomes of such a thing to anticipate, you can easily imagine all the potential negative consequences too.
I've been involved in two sides of this. One is data as content, and the other is the science of large collections of things that get represented by data. On both sides of that, I've seen all these things: people worrying about understanding networks for how terrorist organizations are set up, understanding networks for figuring out how you minimize your marketing budget by only targeting the important people and not the followers, all those kinds of things. Just because I'm curious, what all are you covering in your data issue?
So, I'm first going to toss this out, the notion that all of our exponential growth curves in data gathering, storage and processing ability have delivered us to a real paradigm-shift moment in terms of how data can both help us to understand our world and to change it. Do you agree with that? And how does that dovetail with your own work with data and computation?
There are several different branches here. Let's start with, when you say data, what are the sources of data in the world today? One source of data is people compiling data: census data, data on properties of chemicals. This is largely human-compiled data. What has happened today is that there are very large data repositories in lots of different areas. Many of them were started 30 years ago, and they've been just gradually building up, building up. Those data repositories were made possible originally by the existence of at first mainframe and then later generations of computers. That's what got lots of people really launched on being able to create those data repositories. So source number one for data is the human aggregation of data. Another source of data, which is just coming online in a big way, is sensor data. At this point, there's some kind of public sensor data, whether it's seismometers from around the world or whether it's traffic-flow sensors, and lots of much more private sensor-based data that people use for their own purposes. That's leading to a huge torrent of quite homogeneous data. It's "the level of this river as a function of time, every minute for the past however long."
Homogeneous within each sensor's data set, but not necessarily homogeneous across sensor data sets in a way that would make it cross-computational?
That's correct, but each individual data point, there's no extra effort to collect that. And the third source of data is essentially data generated from the computational universe—that is, things that we by algorithms can go out and figure out. It may take a lot of effort to run those algorithms and figure it out, but once it's figured out, we can store it and use it as data. We've made these huge tables of properties of mathematical functions and such things where each entry takes a lot of effort to produce, but once produced it's just something you can use. I see those as being, at the practical level, the sources of data. Now the question is, what do people do with this data? That's where I think, raw data on its own—experts can make use of raw data on its own; other people typically don't want the actual raw, raw data. They typically want to answer some question based on this data, and for that you really need not just data, but knowledge, and the ability to answer questions from that knowledge.