Q&A: Stephen Wolfram on the Power and Challenge of Big Data

Popular Science editor Mark Jannot talks to the data wizard about big data, human understanding, and the origin of the universe
Stephen Faust

At some point about halfway through the hurly-burly of pulling together our special issue on what I’d taken to calling The Data Age, senior associate editor Ryan Bradley noticed that Stephen Wolfram had created a timeline of significant milestones in the historical march of data. We thought it would be an excellent piece of contextual glue to apply to our analysis of the burgeoning power of data, well wielded, to both illuminate and influence our world. Fortunately, Wolfram agreed, and the timeline ran as connective tissue along the bottom of our magazine pages. Wolfram was at that time about to host his second annual Wolfram Data Summit in Washington, D.C., a gathering of database curators and purveyors and open-source savants to discuss how best to cultivate and process and liberate our geometrically expanding data bounty. Wolfram’s data-processing answer engine, Wolfram Alpha, is an outrageously ambitious and optimistic enterprise. It combines the algorithmic might of his own Mathematica software with brute-strength data-curation efforts to answer questions users may not even be aware they’re asking. I figured he would be an interesting person to chat with about data.

The phone call lasted about an hour and 45 minutes, from me thanking him for allowing us to exploit his curatorial skills and asking whether he agreed with our proposition about the burgeoning power of Big Data all the way through to Wolfram’s description of his algorithmic pursuit of the origin and laws of our universe, basically a mash note to the brilliant irreducible complexity of nature (as opposed to the iterative plodding simplicity of human engineering). It was fun. I’ve edited the full transcript for clarity and tightened it just a bit for your benefit, but otherwise I think it’s worth reading in full. Enjoy the trip.

MARK JANNOT: First off, thank you for letting us use your timeline in our Data Issue. We’ve employed it as a helpful tool to drive readers through the entire feature well and package this issue together in a nice way, so it’s really great.

STEPHEN WOLFRAM: I’d always thought systematic data is important in the progress of civilization, but I have to say that as we put together the poster, I really realized how different steps in the creation of important directions in civilization were made possible because there was systematic data about this or that, or people could count on data being available. Of course, there were some bad things that happened in the course of history as a result of large chunks of data being collected, too, but I like to look at the positive progress.

Did you choose to leave out the negatives in the timeline?

I don’t know the history tremendously well, but once you can do censuses of people, you can decide, We don’t like who they are and we can figure out where they are based on the census.

It’s true of anything, I guess—anything that’s going to lead to progress will open us up to abuses and negative results. One of the pieces we’re publishing is about Albert-László Barabási and the implications of his current thinking about hubs and nodes, and learning which are the action hubs and nodes. Once you know that, you can use that knowledge to control that system, and of course while there is much in the way of positive outcomes of such a thing to anticipate, you can easily imagine all the potential negative consequences too.

I’ve been involved in two sides of this. One is data as content, and the other is the science of large collections of things that get represented by data. On both sides of that, I’ve seen all these things: people worrying about understanding networks for how terrorist organizations are set up, understanding networks for figuring out how you minimize your marketing budget by only targeting the important people and not the followers, all those kinds of things. Just because I’m curious, what all are you covering in your data issue?

So, I’m first going to toss this out, the notion that all of our exponential growth curves in data gathering, storage and processing ability have delivered us to a real paradigm-shift moment in terms of how data can both help us to understand our world and to change it. Do you agree with that? And how does that dovetail with your own work with data and computation?

There are several different branches here. Let’s start with, when you say data, what are the sources of data in the world today? One source of data is people compiling data—census data, data on properties of chemicals. This is largely human-compiled data. What has happened today is that there are very large data repositories in lots of different areas. Many of them were started 30 years ago, and they’ve been just gradually building up, building up. Those data repositories were made possible originally by the existence of at first mainframe and then later generations of computers. That’s what got lots of people really launched on being able to create those data repositories. So source number-one for data is the human aggregation of data. Another source of data, which is just coming online in a big way, is sensor data. At this point, there’s some kind of public sensor data, whether it’s seismometers from around the world or whether it’s traffic-flow sensors, lots of much more private sensor-based data that people use for their own purposes. That’s leading to a huge torrent of quite homogeneous data. It’s “the level of this river as a function of time, every minute for the past however long.”

Homogeneous data in each set of each sensor data, but not homogeneous necessarily across sensor data sets in a way that would make it cross-computational?

That’s correct, but each individual data point, there’s no extra effort to collect that. And the third source of data is essentially data generated from the computational universe—that is, things that we by algorithms can go out and figure out. It may take a lot of effort to run those algorithms and figure it out, but once it’s figured out, we can store it and use it as data. We’ve made these huge tables of properties of mathematical functions and such things where each entry takes a lot of effort to produce, but once produced it’s just something you can use. I see those as being, at the practical level, the sources of data. Now the question is, what do people do with this data? That’s where I think, raw data on its own—experts can make use of raw data on its own; other people typically don’t want the actual raw, raw data. They typically want to answer some question based on this data, and for that you really need not just data, but knowledge, and the ability to answer questions from that knowledge.

A couple years ago at TED, Tim Berners-Lee led the audience in a chant of “More raw data now!” so he’s out there trying to create the data Web. And your project in Wolfram Alpha is not to deliver raw data but to deliver the interface that allows for raw data to turn into meaning.

Yes, what we’re trying to do—as far as I’m concerned, the thing that’s great about all this data is that it’s possible to answer lots of questions about the world and to make predictions about things that might happen in the world, and so on. But possible how? Possible for an expert to go in and say, “Well, gosh, we have this data, we know what the weather was on such-and-such a date, we know what the economic conditions at such-and-such place were, so now we can go and figure something out from that.” But the thing that I’m interested in is, Can one just walk up to a computer and basically be able to say, “OK, answer me this question.”

So it’s not necessary to, when you have a question, then have to devise the extraordinarily complicated means by which the raw data, you know—finding the data and then crunching the data. It’s crunched automatically and magically.

Right. Say, for example, you’re trying to find where’s the moon going to be right now from where I am on planet Earth. Now, there’s underlying data. It’s the orbital elements for the lunar orbit, it’s the lat/long, it’s all sorts of time-zone data, things like that. But there’s a certain distance to applying those orbital elements and figuring out, OK, with these coordinate systems we can compute this thing, and so on. That’s one type of thing. Another type of thing, it might be, well, you know the value of something and you know some economic indicator and you know the population of some country, well then, you can derive the per-capita thing and you can compare that with every other country of the world. In the second case, it’s almost a shallower piece of arithmetic, and in the first case it involves some physics to actually go in and compute from the raw data and answer the question you want to answer. It’d be one thing to say to people, Here’s a table of orbital elements of artificial satellites. OK, that’s nice, if you happen to have taken some courses in celestial mechanics, then you can go from that to actually figure out where’s the International Space Station right now. That’s sort of an extra computational step beyond the data. So data is the underlying layer, as far as I’m concerned, from which we can know things about the world. Really, two big things are needed; there’s data and there’s algorithms. So in the example of things about artificial satellites, the data is: What are the orbital elements of the satellite? And then there’s an algorithm, which is: How does the satellite actually propagate in the gravitational field and get me an actual answer to the question I might ask about “When will a satellite next rise over the horizon for me”?
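The shallower of the two cases Wolfram describes, deriving a per-capita figure and comparing it across countries, can be sketched in a few lines of Python. This is only an illustration: the helper name `per_capita_ranking` is hypothetical, and the country labels and numbers are made-up placeholders rather than real statistics.

```python
def per_capita_ranking(totals, populations):
    """Return (name, per-capita value) pairs, highest value first."""
    ratios = {
        name: totals[name] / populations[name]
        for name in totals
        if name in populations and populations[name] > 0
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Placeholder figures for illustration only (not real data).
totals = {"A": 1000.0, "B": 300.0, "C": 900.0}
populations = {"A": 50, "B": 10, "C": 60}

ranking = per_capita_ranking(totals, populations)
for name, value in ranking:
    print(name, value)
```

The satellite case would need an orbit-propagation algorithm on top of the data, which is the "extra computational step" the answer refers to and is not attempted here.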

For us, there’s two other issues in what we’ve tried to understand about how one uses data. One is, so this data is out there, but how do I actually access, how do I ask for a piece of data? The next issue is, there might be data on the size of some particular city, but what city am I talking about? Because there might be a slang name for the city, there might be a standard official government name for the city, and so on. How do I actually refer to things? And that’s not something that’s part of the raw data. It’s not something a sensor is going to pick up for you. It’s this weird thing that connects the data to our human experience and our way of referring to things.
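The naming problem described here, where one city can be referred to by a slang name, an official name and so on, is often handled with an alias table that maps what people type onto one canonical entity. The sketch below assumes a tiny hand-written table; `ALIASES`, `normalize` and `resolve` are hypothetical names for illustration and say nothing about how Wolfram Alpha does it.

```python
def normalize(name):
    """Lowercase and drop punctuation and extra spaces so variants compare equal."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


# Hypothetical alias table: canonical entity -> names people actually type.
ALIASES = {
    "New York City": ["NYC", "New York", "The Big Apple", "City of New York"],
    "Los Angeles": ["LA", "L.A.", "City of Angels"],
}

# Flatten the table into one lookup keyed by the normalized spelling.
LOOKUP = {
    normalize(alias): canonical
    for canonical, aliases in ALIASES.items()
    for alias in aliases + [canonical]
}


def resolve(query):
    """Return the canonical entity for a query, or None if it is unknown."""
    return LOOKUP.get(normalize(query))


print(resolve("the big apple"))
print(resolve("L.A."))
```

A real system would also have to cope with ambiguity (several cities sharing a name), which a flat dictionary like this cannot express.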

The machine intelligence, the intermediary between us and our data, needs to understand our language in order to help us access the data.

It needs to learn from us. The data on its own, the sensors could’ve picked up everything they could’ve picked up, but it’s not something we can connect to unless our systems can understand what we’re trying to ask for. And the other point about all this data is, often you can generate lots of results, lots of answers, and then the question is, how do you present these things? And that’s another thing that’s very human-centric. One could say, Well, here’s a result; it’s this giant network with a million nodes in it all in this micro-spaghetti, but no human will be able to assimilate that in any meaningful way. One has to feed this knowledge to humans in a way that they can assimilate. And that requires a certain understanding of the human way of absorbing information. That’s another thing that’s a meta-layer on top of raw data is, so how should you present it to humans? In Wolfram Alpha, one of the things we found interesting is, we’ve done experiments where we say, let’s just give people the answer—one line, the answer. People ask a question like “What’s the GDP of Italy?” There’s an answer. It’s such-and-such a number of trillion dollars. Well, it turns out people actually don’t just want the answer. They’re much more interested in the plot—what’s the history of that number, what is the ranking of that number among countries in the world, what’s the conversion of that number to local currency? And people quickly assimilate the ambient context in something that they need, in order to absorb the information.

But how do you define the limits of the ambient context? You could keep going in that direction so broadly that it becomes impossible to plow our way through those thickets all over again.

Right. In Wolfram Alpha, for example, we’ve gone to lots of effort to try and say, well, what other things do you present first? Then you press the “more” button and you see more, and you press it again and again and again, and you follow the links, you keep going. So it’s what’s the most important thing, what’s the next-most-important thing. Those are all the sort of meta-information that’s part of this delivery of computational knowledge that is beyond just the underlying raw data.

Are those sorts of decisions about how to present the information—what the focal length should be—are those decisions that need to be made by humans, or can they be automated to a certain extent?

We’ve managed to automate them a bit. There are certain criteria, like for example, when you make a graph. We have tried to abstract from experience with humans, what are the design principles for making a good-looking, easy-to-understand plot? And then we can apply those design principles automatically. For example, when you lay out a network, showing this is connected to that is connected to that, we’ve pretty much completely automated the process of laying those things out, which is something which—I remember years back, I used to get humans to try and do those layouts for me, and it’s rather difficult to do a good layout if you know this is connected to that is connected to that. I actually remember doing a bunch of big ones myself at one point where you literally get pieces of string and you’re laying things out and putting them in different places, and sometimes it looks good and you can understand it and sometimes it doesn’t. That’s an example of something where we’ve completely nailed that one. We completely automate that process.

What about the process of figuring out what makes a good graph in the first place?

Well, one could divine principles that say what will make this good, what does not make this good. One could go and connect those things to some theory of human cognition if one wanted to. What I’ve done has been typically more pragmatic than that. Taking a little bit of information from what one knows about human cognition but then just saying, these are the design rules; now figure out how to create an algorithm that will produce visual output that satisfies these design rules. At that level, you can ask a question like, when you type in some quantity, like 235 inches. The question is what you want to know, based on 235 inches. You probably want that converted to feet, you might want that converted to meters, you might not particularly care about having that converted to microns, for instance. There’s the question of how do you figure out what humans would like to see that converted to. In that particular case, the way we figured that out for lots of different kinds of units is we looked at the Web and we looked at scholarly articles and things like this, and we just crunched the whole thing, and we looked at when people talk about inches, meters, centimeters, microns, nanometers, things like that, do they say 10 million nanometers? Probably not. Do they say 230 nanometers? Absolutely. Turns out there’s a distribution you can actually deduce from looking at that data set, which is the Web, about how people talk about things like units.
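One way to picture the unit-selection idea is to keep, for each unit, the range of magnitudes in which people typically use it, and to show only those conversions that land inside that range. The sketch below does this with hand-picked ranges. The `TYPICAL_RANGE` values are assumptions invented for illustration, standing in for the distribution Wolfram says was deduced from the Web and scholarly articles; only the conversion factors are standard.

```python
# Standard length conversion factors, in meters per unit.
METERS_PER_UNIT = {
    "inches": 0.0254,
    "feet": 0.3048,
    "yards": 0.9144,
    "meters": 1.0,
    "centimeters": 0.01,
    "microns": 1e-6,
    "nanometers": 1e-9,
}

# Assumed (made-up) ranges of values people commonly quote in each unit.
TYPICAL_RANGE = {
    "inches": (0.1, 500),
    "feet": (0.5, 5000),
    "yards": (1, 2000),
    "meters": (0.5, 5000),
    "centimeters": (0.1, 1000),
    "microns": (0.1, 1000),
    "nanometers": (0.1, 1000),
}


def plausible_conversions(value, unit):
    """Convert to every other unit, keeping only natural-looking magnitudes."""
    meters = value * METERS_PER_UNIT[unit]
    results = {}
    for target, factor in METERS_PER_UNIT.items():
        if target == unit:
            continue
        converted = meters / factor
        low, high = TYPICAL_RANGE[target]
        if low <= converted <= high:
            results[target] = converted
    return results


for target, converted in plausible_conversions(235, "inches").items():
    print(f"{converted:.2f} {target}")
```

With these assumed ranges, 235 inches is offered in feet, yards, meters and centimeters but not in microns or nanometers, while 230 nanometers is offered only in microns.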

I would think in a case like 235 inches, how likely is it that people would just type that into a computation engine like yours? I would think it’s more natural to say “235 inches in meters” or something like that. It’s shocking that they don’t go that extra step further.

Not the case. They don’t. Because what happens, if you’re thinking about it—so you figure out something is 235 inches. Then you say, What really is that? So you type it in, and we’ll tell you it’s some number in feet, meters, centimeters or yards, and they say, That’s what I needed to know.

So putting it in meters is beyond their understanding of what they want to know.

For example, let’s say they were reading an article in Popular Science, and you had something that said such-and-such a thing was some size, some number of liters or something. And the person says, Well, what is that? Then they type it in, click a link, and it says that’s actually also 22.3 gallons or something. Well, maybe we also generate these comparisons something-something-times the actual size of a Coke bottle. And now they say, Oh, now I understand, on I go, I can do something with that. This is relating the raw data to the kind of map of knowledge that we humans actually have. The real thing is how you go from real data to things that are useful for humans, useful things that humans can assimilate. One piece of that is you have to actually compute things from the raw data. Another thing is you have to deliver the result of those computations that is sort of well matched to humans.

It seems to me that the range or the number of problems involved in making data comprehensible to humans is practically infinite. It seems just outrageously laborious, this task of figuring out the ways that people need to have things translated in order to make them understandable.

You have to think about how many kinds of things are there in the world. How many different domains of knowledge are there? For example, somebody like me who deals with data, I can tell you random facts like there are six million streets in the world, there are 30,000 items in a typical grocery store, I could read off tons of these things, but there are a finite number of these kinds of domains. In each domain, there’s a large collection of similar things in that domain. And the number of domains, it’s in the thousands. In Wolfram Alpha, we’ve crunched through a few thousand of these domains. Each time we run into a domain, there are new and different issues. Pick a domain—plants, for instance. There are all kinds of issues about plants: what’s the species, how tall does it get, how does it get harvested. Plants grow at certain critical temperatures, they start germinating. How does that relate to, we have all these things around climate, how does that all connect up? When you get some new domain like plants, you say, That’s quite similar to this and this and this other domain, but it has all its own extra twiddles in it that you have to deal with. Each one of these domains, it’s a fair amount of effort, but it’s finite, and there’s a finite number of these domains. In the Wolfram Alpha project, there were two basic observations that made me decide that this project was not just abjectly impossible.

That must have been a glorious day when you figured that out.

I started the project before I was absolutely sure of the conclusion. One of those was: there’s a lot of data in the world, but it’s finite. It’s not the case that there’s just so much out there. It’s like the Web, for example. The Web is very big. But how big? Well, there’s probably about 10 billion meaningful pages on the Web. It’s big, but it’s a finite number. You can say, well, how much stuff is there in all the reference books? How big is the biggest reference library? What kind of stuff is in these reference libraries? Well, it’s big, but it’s something where it’s trillions of elements, maybe quadrillions of elements, but it’s something where you can name the numbers. We can be quantitative about how big it is; and it’s big, but it’s not infinitely big. Sometimes people say in order to do anything like this, you just have to scale infinitely. Well, you don’t have to scale infinitely. The world is a finite place. Big, but finite.

Most people think that the scale of the numbers we’re talking about is so huge that it might as well be infinite. That it seems so daunting.

I’ve been lucky in these kinds of things that I’ve developed some sort of absurd confidence that just because it’s daunting doesn’t mean I can’t do it. Like, how many named laws are there in the world, things like Ampere’s law or the universal law of gravitation? Between 5,000 and 10,000. What does that mean? Well, maybe it’s half a million lines of Mathematica code to implement all of those. It’s big, but it’s finite.

So that’s the first insight.

Right. The second one was, I had kind of imagined this notion of machines being able to compute answers to questions, that that was very much an intelligence activity. That was something, you see the science-fiction movies of the ’50s and ’60s where people are walking up to computers and talking to them, and the computers are giving answers. I had always imagined that what would need to be inside that computer would be a general artificial intelligence. That the only way that one would be able to do a decent job of answering questions is to imitate what humans do when they think and figure out answers to questions. As a result of a lot of basic science that I did, I kind of came to the conclusion—how do sophisticated things happen in the universe? In physics, in other places. How do things that look like they’re intelligence look? How do sophisticated things happen in nature? How does the sophistication of what happens in nature compare with the sophistication of what we’re able to do with our minds? What I came to realize—it’s part of the thing I call the principle of computational equivalence—is that actually there is no sharp distinction between the stuff that happens in nature, or happens in the computational universe—the possible programs and things—and the stuff that we as humans with our minds are able to do. If somebody says, Well, we need this magic idea to make artificial intelligence, and that’s some idea that isn’t present in ordinary computation, it’s some extra idea, what I came to realize is that there really isn’t that kind of bright-line distinction between the intelligent and the merely computational. So that made me realize that you didn’t need to build a whole AI in order to be able to answer the same kinds of questions that people would expect expert humans to answer.
It surprises me a bit that an issue that philosophical has a consequence as practical as a site that actually—create 15 million lines of code, run billions of servers, that sort of thing. And I find it kind of charming that a philosophical issue of what it means to be intelligent should actually have such a practical consequence, but at least for me it definitely did in the sense that what I realized is you don’t have to invent artificial intelligence in order to be able to succeed at building a system that can do computational knowledge. And more than that, I even realized later on after we got into it, that actually we humans have the exact wrong way of thinking about artificial intelligence, because that gets you into the mode of thinking, “Let’s reason about things the way humans reason about things.” Let’s say you’re trying to solve some physics problem. You reason about the physics problem; like you say, This mechanical object pushes this mechanical object, and then it does this and then it does that and so on, and you’ve got some whole logical chain of reasoning. That turns out to be an incredibly inefficient way to figure out what the system will actually do. A much more efficient way is just to set up the equations that people invented 150 years ago to represent those things and just blast through to the answer using the best modern scientific methods.

So your AI, which isn’t strictly AI, should figure out a way to do it on its own that’s better than the screwed-up, roundabout way humans would figure it out, or program it to figure it out if they were required to do so?

I think the main point is, Can you compute the answer? Well, what does it mean to figure out how to compute the answer? Anytime there is an algorithm of any degree of sophistication, it’s doing a certain amount of figuring out how to, as well as just going and getting the answer. There really isn’t a distinction between the figuring out how to and the getting to the answer. Just to change direction a little bit, I think one of the things that you were saying before, which is one of the things I wonder about, is we’re in a world now where we can readily compute lots of things about the world, we can figure out a lot of things, we can predict a certain number of things, we can go up and just sort of ask a question about something and often be able to figure out answers, predictions and so on. How should we think about what that will mean for the future of how people do things? I think one of the themes has been: the world goes from a point where people just guess how to do things to a time where people actually compute what they should do in a more precise fashion. We see that happening in more and more places, and sometimes it happens inside the devices we use; they automatically do the computation and just automatically focus the camera or whatever it is. A GPS that figures out where to go. Sometimes they’re simply telling us something. I think what will happen in the not-very-distant future is the much more preemptive delivery of knowledge. Right now, a lot of the kind of knowledge we get we have to ask for. You walk up to Wolfram Alpha, you ask it a question. It’s not something where you’re preemptively being told something that you might find interesting. I think increasingly, things will be set up so that one is preemptively being told something one might find interesting. This relates to a whole other world of data which I think will emerge in the next few years: the personal analytics world of data. Record everything about yourself, and it concludes. 
People like me, because I’ve been interested in data, I’ve recorded every keystroke I’ve typed in the last 20 years. I’ve recorded tons of other stuff—I don’t look at those as often as I might, but I was about to do a big effort to just go and look at all my data for the past decade or so and try and see what I can learn about myself from it. That’s a good example of where just having the raw data is amusing, but without being able to compute from it and knowing how to present it, it’s not immediately useful. I think increasingly, based on what has been recorded about oneself, there will be an ability to preemptively deliver knowledge that is relevant, to compute knowledge that is useful.

Ultimately, the most useful insights that can come out of this sort of thing are insights about what the right questions are to ask. You’ll want to learn things about yourself that you wouldn’t even have considered asking in the first place. It will tell you what questions to ask based on what comes out of it.

Which is similar to the issue when you get a result, something like the example we were talking about earlier, 235 inches—what do you want to know based on that? Can you have algorithms and heuristics that can figure out what’s likely to be relevant? One of the things that’s strange, for me at least, is the extent in Wolfram Alpha’s form to which we can make predictions about things based on knowledge, data, whatever, yet in a lot of science that I’ve done, one of the upshots to that science is there’s a limit to what can actually be predicted in the world. What it comes down to is that there are these processes that are computationally irreducible. In other words, when you look at a system doing what it does, it’s going through some series of steps to produce the behavior it produces. The issue is, Can we jump ahead and compute what the system is going to do more efficiently than the system itself does it? One of the great achievements of mathematical-type sciences tends to be: let’s just get a formula for the answer. And what does that mean? Well, that means we don’t have to follow all the steps the system goes through. We just plug a number into the formula, and immediately we get an answer about what the system will do. Computational irreducibility is what happens in a surprising range of systems when it doesn’t allow you to do that; it’s irreducible. In order to figure out what the system will do, you effectively have to go through the same series of computational steps that the system itself goes through. There’s no shortcut to getting to the answer. I think in building technology, lots of technologies are specifically set up to avoid having to make things computationally irreducible. A machine has some simple motion where you can readily predict that after three seconds it will have returned to its initial state.
So this whole question of what’s predictable in the world based on the data that we have, it’s all wound up with this computational reducibility versus computational irreducibility. There’s certain kinds of things that we can expect to predict; there’s certain types of things where we can’t expect to predict them. Often we set up our technology to be stuff that we can predict. Nature doesn’t necessarily set itself up the same way, so it ends up with things like the weather, which may be quite hard to predict. Increasingly I suspect that technology will end up being harder and harder to predict, because it’s an inevitable feature of technology being able to be more efficient that it runs into this computational irreducibility and gets to be harder to predict. The challenge for us is to use all this data, all this knowledge, to predict what can be predicted and to get as far as we can with things where we just have to compute in order to work out what will happen.
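The contrast between the two kinds of system can be made concrete with a small Python sketch. The reducible case is a machine that cycles through three states, so a formula jumps straight to any step. For the other case the sketch uses the Rule 30 cellular automaton, which is not mentioned in this conversation; it is chosen here as an assumed stand-in for a system where, as far as is known, the only way to learn the state at step n is to run all n steps. The function names are hypothetical.

```python
def periodic_state(n, period=3):
    """Reducible: a formula gives the state at step n without stepping."""
    return n % period


def rule30_step(cells):
    """Apply one Rule 30 update: new cell = left XOR (center OR right)."""
    size = len(cells)
    return [
        cells[(i - 1) % size] ^ (cells[i] | cells[(i + 1) % size])
        for i in range(size)
    ]


def rule30_center(steps):
    """Center cell after `steps` updates, found by running every step."""
    # The row is wide enough that the pattern never reaches the edges.
    width = 2 * steps + 3
    cells = [0] * width
    cells[width // 2] = 1  # start from a single black cell
    for _ in range(steps):
        cells = rule30_step(cells)
    return cells[width // 2]


print(periodic_state(10**12))                  # answered instantly by the formula
print([rule30_center(n) for n in range(10)])   # each value needs a full run
```

Asking `periodic_state` about step one trillion costs a single arithmetic operation, whereas `rule30_center` does work that grows with the number of steps requested.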

Can we set it up so that the speed through which we can run it through its courses is significantly faster anyway than it would happen in real life?

Often. Often, but not always. I expect that lots of things that show up in biomedicine will have this computational irreducibility issue when we understand all these protein interactions and all sorts of details about how do we go from the genome to the actual biomedical, clinical kinds of phenomena. There’ll be lots of computational irreducibility there, but chances are that we’ll be able to compute things in big enough chunks that as a practical matter, by running enough computations, we’ll be able to compute that if you apply this drug in this way then these things will happen, and so on. When it comes to a more extreme level, a case that I’ve thought lots about is the whole universe, and to what extent is computational irreducibility an issue in understanding the whole universe. One question is, how much data do you need to specify our universe? Is it the case that with an algorithm that’s a few lines of computer code long, if you just run that for long enough, can you get a whole universe? How big does that underlying seed need to be in order to get our whole universe? And I think we don’t know how big that underlying seed needs to be. Let’s say we have the underlying data that completely specifies our universe, but then we have to actually go from that underlying data, that underlying algorithm, to the actual behavior of the universe. And one of the points is that this computational-irreducibility phenomenon implies that that’s irreducibly difficult to do.

Would you have to run it for 13 billion years?

Well, I mean, OK, so in a first approximation, yes. The good news is that there are inevitably pockets of computational reducibility, and that’s our best hope for being able to match up what we’ve already figured out in physics and so on with what the predictions of a particular model are. The universe, it’s sort of obvious that there are pockets of reducibility, because there’s lots of order that we can perceive in the universe. It’s an inevitable feature. It’s just one of these self-referential facts.

Computational irreducibility is like prime numbers in a sense, right? So as long as it has pockets of reducibility, it is not the fundamentally irreducible thing? It’s not the universe that’s computationally irreducible, it’s . . .

It’s the processes that go on. OK, so what is the universe? Is the universe the underlying code from which you can generate the universe? Or is it these dynamic processes that are going on inside the universe today? Or is it just one slice of those dynamic processes? This is the universe as it is today, whatever that means. What computational irreducibility talks about is how much information—if you want to predict what the universe is going to do, if you want to predict some aspect of what the universe is going to do, then you have to go from that underlying rule. You actually have to run it and see what the universe does. So, for instance, one of the types of things is, you might say, Is warp drive possible? And you might say, well, gosh, if you have the underlying theory of the universe, you should be able to answer whether warp drive is possible, but probably it isn’t easy to answer. Probably that will be one of these questions for which it’s effectively undecidable, because what you’ll be reduced to from a mathematical point of view is to say, Does there exist some configuration of material which has this property and that property and that property given these underlying rules for how things can be set up? And that can be an arbitrarily difficult question to answer. And that’s an example of what it means for there to be computational irreducibility. The thing with computational irreducibility is, what it tells you is that in order to find the outcome of some process, you have to follow through some number of steps. And that you can’t always arbitrarily reduce the amount of computational effort that’s needed.

There are no shortcuts.

Right. One feature of that is if you ask a question like, Can such-and-such a thing ever happen even after arbitrarily long times?, that’s a question that, if there is computational irreducibility, you may not be able to answer in a finite way. If there was computational reducibility, then the fact that one’s asking about arbitrarily long times shouldn’t scare one, because even a thing that takes an arbitrarily long time one can reduce down to something that only takes some given, finite time. But if it’s the case that there’s computational irreducibility, then you can’t expect to always do that reduction. If you’re asking a question about what happens after arbitrarily long, it actually takes you arbitrarily long to answer that, and that’s the origin of the phenomenon of undecidability that shows up in mathematics and Gödel’s theorem and so on, and it’s something which when applied to physics leads to this consequence that even if you know the underlying theory, you might not be able to work out what is technologically possible in that universe.
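An illustrative sketch (not part of the interview): the "you actually have to run it" point can be made concrete with a tiny program. The example below uses the Rule 30 elementary cellular automaton as a stand-in; the interview does not name a specific rule, so the choice of rule, the ring-shaped grid, and the function names `step`, `run`, and `center_column` are all assumptions made here for illustration. The update rule fits in one line, yet as far as is known the only general way to learn the value of the center cell after n steps is to carry out all n steps.

```python
def step(cells, rule=30):
    """Apply one update of an elementary cellular automaton on a ring of cells.

    Each new cell depends only on its left neighbour, itself, and its right
    neighbour; the three bits select one bit of the rule number.
    """
    n = len(cells)
    return [
        (rule >> (cells[(i - 1) % n] << 2 | cells[i] << 1 | cells[(i + 1) % n])) & 1
        for i in range(n)
    ]


def run(width, steps, rule=30):
    """Start from a single 1 in the middle and return every row, step by step."""
    cells = [0] * width
    cells[width // 2] = 1
    history = [cells]
    for _ in range(steps):
        cells = step(cells, rule)
        history.append(cells)
    return history


def center_column(width, steps, rule=30):
    """Read off the middle cell of each row; obtained here only by running every step."""
    return [row[width // 2] for row in run(width, steps, rule)]


if __name__ == "__main__":
    # Print a small picture of the evolution (width chosen so the pattern never wraps).
    for row in run(width=41, steps=15):
        print("".join("#" if c else "." for c in row))
```

Asking for the center cell at step one million means computing roughly a million rows in this sketch; that cost is the practical face of what the answer above calls irreducibility.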

I think that most people would assume that if you know the underlying theory, you know all of the rules that govern the universe. And what you’re saying is that that is not necessarily true, and to actually know what the rules are, you have to run the universe.

And that’s the same fallacy, basically, as when people portray robots that act according to logic, they always portray them—in early science fiction, the fact that the robot had underlying rules meant that its behavior was in some way fundamentally simplistic. That’s sort of the same fallacy, that if you know the underlying rules and the underlying rules are simple, then, gosh, you must be able to tell that, because there can’t be an irreducible distance between the underlying rules and the actual overall behavior. In my efforts in basic science, one of the number-one observations was, if you look in a computational universe of possible programs, an awful lot have this property that even though the program is simple, the behavior is immensely complicated. When we do technology and when we create programs, most of the time we’re trying to avoid the programs where the behavior is arbitrarily complicated. Those are the programs that work in some way that we can’t possibly understand and it’s full of bugs. We tend to aim in our current technology for things where the behavior is simple enough that we can readily see what its consequences will be. It turns out, I think, that one of the big things that will happen in technology—we can already see happening in the coming decades—is more and more technology will be found by searching the computational universe of possible programs, possible algorithms, possible structures, whatever, and we will be able to know that it performs some function for us, but if we look at how it does it, it will look very, very complicated and will not be something that we can readily predict.
For example, when we build programs now in Mathematica and Wolfram Alpha, lots of those algorithms are found by algorithm discovery, where basically we’re searching a billion different possible algorithms of a particular kind and finding the most efficient one that achieves some particular objective. When you look at that algorithm, you say, What is this doing? Sometimes we can understand it and we say, gosh, that’s really quite clever of it. And sometimes you say, gosh, I can’t be bothered to figure this out; this is way too complicated to figure out what it’s actually doing, but yet we can see that it’s doing the thing that is useful to us.
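An illustrative sketch (not part of the interview): the idea of algorithm discovery by exhaustive search can be shown at toy scale. The code below is not how Mathematica or Wolfram Alpha do it; the seven-instruction set, the 8-bit word size, and the names `INSTRUCTIONS`, `run_program`, and `discover` are hypothetical choices made for this example. It enumerates every short program over the instruction set and keeps the first shortest one that agrees with a slow, obvious specification on all 256 inputs.

```python
from itertools import product

MASK = 0xFF  # assumption: programs operate on 8-bit values

# Hypothetical instruction set: each entry maps (accumulator, input) to a new accumulator.
INSTRUCTIONS = {
    "inc": lambda acc, x: (acc + 1) & MASK,
    "dec": lambda acc, x: (acc - 1) & MASK,
    "not": lambda acc, x: ~acc & MASK,
    "neg": lambda acc, x: -acc & MASK,
    "and_x": lambda acc, x: acc & x,
    "or_x": lambda acc, x: acc | x,
    "xor_x": lambda acc, x: acc ^ x,
}


def run_program(program, x):
    """Run a sequence of instruction names, starting with the accumulator equal to x."""
    acc = x
    for name in program:
        acc = INSTRUCTIONS[name](acc, x)
    return acc


def discover(target, max_length=3):
    """Return the first shortest program matching target on every 8-bit input, or None."""
    for length in range(1, max_length + 1):
        for program in product(INSTRUCTIONS, repeat=length):
            if all(run_program(program, x) == target(x) for x in range(MASK + 1)):
                return program
    return None


def clear_lowest_set_bit(x):
    """Slow but obvious specification: find the lowest 1 bit and zero it."""
    for i in range(8):
        if x & (1 << i):
            return x & ~(1 << i) & MASK
    return 0


if __name__ == "__main__":
    print(discover(clear_lowest_set_bit))
```

With this instruction set the search returns the two-step program `dec` then `and_x`, which is the bit trick `x & (x - 1)`. The search knows nothing about why the trick works; it only verifies that the result matches the specification, which is the situation described above of seeing that something works before understanding how.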

You can’t figure out how it’s doing it, but you know what it’s doing.

You can see it’s twiddling these bits in this way and, gosh, they always line up in this way at the end. And one can automatically prove that some particular property will always be the case, and one would never, as a human, trying to write the code, one would never have arrived at this kind of thing. It works in a way that is just utterly alien to a human who’s used to creating code that does its thing iteratively, in a very organized way. When you look at it, it’s like, wow, it’s working, but it’s working in this very complicated way. Now, we see examples of this quite often in nature, in biology, in physics, in other places—natural selection. Actually, evolution is a funny one, because evolution is actually closer to technology than one might think, because evolution has a hard time working on things that are really complicated. It’s much better at, Well, let’s just extend this bone a bit and see what happens, rather than—it’s actually quite rare for evolution to go out and do something that is truly innovative. It’s usually doing things incrementally in a way that’s similar to a lot of engineering that we do.

Is it possible for nature, writ large—for the universe itself—to ever do anything that is anything other than incremental?

Yes! When you look at different things that happen in different physical systems, you can ask, Is it the same case in the way that fluid flow works in this situation versus that situation? They may not be connected. There’s no requirement that the fluid flow be—from one situation to another, that it change its behavior only a little. If it was evolving under natural selection, it probably would be a requirement that it only change its behavior a little. Evolution doesn’t tend to make these random steps that are absolutely dramatic. But the thing about a lot of nature is that there’s no constraint that anybody should be able to understand what these things do. Sometimes we get confused because our efforts at doing science have caused us to concentrate on cases where we can understand what’s going on, and that means we come to think, gosh, it’s all set up to be understandable. But that’s not true at all. It’s just that we selected the cases that we studied to be those ones. And I think that what tends to happen in nature is that there’s a certain amount of incomprehensible stuff that’s going on where we can look at the underlying components, we can understand those, and then there’s some computationally irreducible process that is what happens when those components actually run and do what they do. And in technology, a lot of what we do is we go out into the natural world and we find components that we can harness for technology. We find donkeys that we can ride on or something, or we find liquid crystals that we can use for displays, or we find other things out there in nature that we can harness for some useful human purpose. 
And one of the things that I know we certainly do a lot of is going out into this computational universe of possible algorithms, in a sense, the computational universe of all possible universes because that’s—our universe operates according to some particular algorithm, but we can readily just go out at a theoretical level and just say, What are all the possible algorithms, what are all the possible universes that exist? And in fact, we can go and look at all those possible algorithms and say, Which ones are doing something that will be useful for some human purpose? And when we find one that’s useful for some human purpose, we can implement it on our computer and maybe one day implement it in some molecule, and then it runs and does something that’s useful for our human purposes. But one of the things that’s important about that methodology—just going out and finding it in the computational universe—is that the thing we find is under no constraint to be comprehensible in its operation to us. When we do engineering, we do things incrementally, and usually it’s the typical party-trick-type thing where you’re shown two objects: one’s an artifact; one’s something that came from nature in some way. A very good heuristic is that the one that looks simpler is the one that humans made. Because most of the technology we build, it’s very repeated motifs of circles and lines and things like this, and it’s built to be comprehensible. I suspect that we’re in the late years of when that will be possible. Increasingly when you look at technological objects, they’ll be things that effectively were found in the computational universe, and they do something really useful. They’re not things that were constructed incrementally in a way that’s readily comprehensible to us, where their operation is readily comprehensible to us.

The current concept of how technologies are created is that they’re built from the ground up. Basically what you’re talking about is plucking something out of the computational universe that we can’t understand and using it to power us forward.

Right. Remember, though, that the components of technology have often been incomprehensible. That is, people can use timber to make things even without understanding how trees manage to be strong. This is just a more extreme version. Typically, people have used materials with certain properties where the properties are fairly easy to explain. This is kind of a more extreme version of that, but now we’re getting these things from this supply of algorithms rather than this supply of material objects.

We could keep going down this rabbit hole forever, I suppose, but let’s pull it back a little bit. This search for the theory of the universe is not theoretical for you. This is something that you want to do or are doing right now. Are you doing it right now?

I’ve taken a break for the past couple of years because I’ve been working on Wolfram Alpha and all the things around it, which is actually very frustrating to me that I had to take this break, but—

I saw the TED talk where you proposed the notion of discovering what the actual initial algorithm of the universe is, and you said it would be within this decade.

That’s my hope.

Seems remarkably optimistic somehow.

No, actually what I hope I said—who knows—is that I don’t know whether our universe has a simple underlying rule. Nobody knows that yet. If it does, though, we should be able to find it. There’s a lot of theoretical technology that you need in order to do the search, find out what you found, all those kinds of things. It’s a lot of work. It’s effectively a big piece of technology development to go and figure out—if you have a theory of this type, how do you see what its consequences are etcetera, etcetera, etcetera. We’ve done a lot of that work. The answer is: if the universe has a simple underlying rule, it’s likely we’ll be able to find it. My point of view is, if it has a very simple underlying rule, which we could find, it’s sort of embarrassing not to have found it within a limited time. Now, it may turn out that the universe doesn’t have a simple underlying rule. It might turn out that there’s a rule for the universe but it’s a million lines of code long, effectively. I think it’s very unlikely.

How simple would you imagine it could be? How many lines of code would you guess, roughly what range?

Here’s a way to think about that. If you start enumerating possible universes, you can—the best representation I’ve found for what I think is a reasonable way to get at this is using networks, transformation rules for networks, so you can represent that as code in Mathematica or something. And each of these transformation rules is probably two, three, four lines long, something like that. But what’s perhaps a better measure is to ask, If you start enumerating possible rules for the universe, how many rules are you looking at before you find one that’s plausible? If you look at the first 10 rules and start enumerating rules—there are probably different ways to enumerate them; it doesn’t matter that much which different scheme you use, because the way combinatorics works, the different schemes don’t give you vastly different numbers—once you start enumerating, the first few you look at are completely, obviously not our universe: no notion of time, different parts of space are disconnected, all kinds of pathologies that are pretty obviously not our universe. The thing that I thought would be the case is that one would have to look through billions of different candidate universes before you find ones that aren’t obviously not our universe. One of the things I discovered a few years ago is that that is not the case. Even within the first thousand conceivable candidate universes, there are already rules, already cases, candidate universes, whose behavior is complicated and you can’t tell that it isn’t our universe. Can’t prove that it is our universe, but you can’t tell that it isn’t our universe. So what typically happens is you’ll start one of these things off and it will bubble around, and you’ll follow it until it has—well, when I was last doing it, it was maybe around 10 billion underlying nodes—and then it’s off and running and bubbling around, and you say, Is it our universe or is it not our universe?
Well, this is where computational irreducibility bites you, because you ran it up to 10 billion nodes, but that’s still 10-to-the-minus-58th second of the evolution of our universe, and it’s really hard to tell at that point whether this thing that’s bubbling around is going to end up having electrons and protons and god knows what else in it. That’s where there’s a whole depth of technology that effectively has to recapitulate—effectively what one’s doing is some version of natural science, because you’ve got this universe that you’re studying, it’s in your computer, it’s bubbling around, and then you have to kind of deduce what are the natural laws for that. What are the effective natural laws for that universe. You know what the underlying laws are because you put them in, but you have to say, well, what are the effective laws that come out and how do they compare to the effective laws we’ve discovered in physics? And so what I’m saying is that even in the first thousand candidate universes, there are already ones that might be our universe. And in fact, it could be that one of the ones that’s sitting on my computer, that it is our universe, we just don’t know it yet. That’s the difficulty in making that connection between what we know now from physics and what we can see in this candidate universe. It’s not where one’s in a situation and saying, Oh my gosh, there’s no way that rules this simple can produce the kind of richness that we need to be our universe. We’re in a different situation where rules this simple can create incredibly rich and complicated behavior; we just can’t tell exactly what that behavior is.

For me, it’s sort of an interesting thing, because in modern science—post-Copernican science—one’s led to think in this kind of humble Copernican way where there can’t be anything special about us somehow. At some point, we thought we were the center of the universe, and that turned out not true at all. But now when it comes to our whole universe, we can imagine there is an infinite set of candidate universes. So the thing that seems wrong from a Copernican tradition is this: Why should our universe be one of the simple ones? You might say, Why isn’t it just some random universe out there?

Is your theory that if one universe can be generated from simple algorithms, all universes can and have been? Or would be?

The only thing we can meaningfully talk about in science, as far as empirical science is concerned, is our actual universe: There’s only one. Anything else we say about it is a purely theoretical thing. Now, the question would be—we might then say, gosh, what must be happening is that somewhere out there, not in any way that we can ever be aware of, but somewhere out there every possible universe must be going on. That might be a possible theory. It’s not a testable theory. There’s no way we could test that theory because we’re stuck in our universe. And if that theory was correct, then the overwhelming likelihood is that the rules for our universe are very, very complicated, because if there are gazillions of universes out there and we’re just in a random universe, the randomly chosen universe will be one with very complicated rules. It’s like, each universe could be labeled by an integer. Well, there are an infinite number of integers. If we’re a random integer, it’s going to be a big integer. It’s not going to be ‘8’ or something. There’s no reason for it to be 8 if it’s just randomly chosen.

Now, I have a sneaking suspicion that when we really understand what’s going on, that—typically, in the history of science, those kinds of metaphysical questions have crumbled because they weren’t quite the right question, or things worked in a way that sort of worked around the question rather than having to centrally ask the question. My sneaking suspicion is that what we’ll discover is that any one of a large collection of possible rules for the universe is equivalent in generating our universe. I don’t know if I’m correct. That’s just a guess. Something bizarre like that will happen, I suspect, to make that question of “Why this universe and not another?” not really be a meaningful question. It’s a situation that’s like—in a lot of cases in the history of science, people figure out a lot about how stuff works and then they say, “Well, why was it set up this way and not another?” And Newton was famous for saying, “Once the planets are originally set up, then [his] laws of motion can figure out what can happen. But how the planets were originally set up, well, that’s not a question we can answer in science.” Now, 300 years later, we know a lot about how the planets were originally set up. One of the nicer things I always like about philosophers at that time, like Locke and people like that, would say: the fact that the number of planets is nine, eight, whatever it was in their day—that number is not a necessary truth about our universe. That number is somehow an arbitrary number. That was what they thought. They didn’t think that number—just like I’m saying if our universe turns out to be universe number 1,005 or something, that that’s just an arbitrary number. In their day, they couldn’t imagine that you could compute the number would be seven or something. 
Now, in our day we know that if we have a star about the size of the sun and you have a solar system and it’s about the age of our solar system, we know roughly how many planets there will be typically in such a thing. We can compute it. But in their day, that was an inconceivable idea, to be able to compute that. And I think that—I think we’re at too early a stage. It feels a bit wrong to say, gosh, our universe must be one of the simple ones, so let’s go out and search for it—because it seems like a sort of anti-Copernican kind of claim. It seems like a very arrogant claim about us and our universe. Why should we be one of the simple ones, not one of the ones that’s incredibly complicated? My guess is, that question will resolve itself in some way that isn’t quite centrally that question, but I don’t yet know what that is.

The potential for your enterprise to have success and for you to actually discover, or determine, what the rule of the universe is relies on it being one of the simple ones.

That’s correct. The approach that I know to take is one like algorithm discovery: search a trillion algorithms and see which ones are useful. And those are fairly simple algorithms that you are enumerating. There are fairly simple rules for the algorithms, so similarly—if our universe has rules that were a few lines of code long, then we could enumerate it and find it, but if our universe is the size of the source code of Mathematica or something, we’ll never ever, ever find it by searching. It’s just absurdly combinatorially far away.

I think it’s a very basic fact about science—one piece of empirical hope, so to speak, in this direction is probably the most basic fact about science of all, which is that there is order in the universe. It might not be that way. It might be that every different part of the universe, every particle of the universe behaves in its own special way, but it’s a fact noted by theologians a couple of thousand years ago that the most remarkable fact about the universe is that it has laws. There is order in the universe and it’s describable by laws, and it might not have been that way. It might have been that the universe was full of miracles going on and all sorts of funky things happening that were not governed by laws, but in fact it has laws, and the very fact that it has laws and is orderly at some level shows that it is not as complicated as it might be. And that gives one some hope that it actually could be really simple. And simple enough that we, in the early part of the 21st century, can use our computers and go out there and find it. It’s a little bit like the question: when does Wolfram Alpha become possible? I started thinking about Wolfram Alpha–type things 40 years ago now, and at that time it was probably impossible. If I had started building that when I was 12 years old, I would have finished it about the same time that I actually did finish it.


Maybe later. Some of the necessary intervening steps were done by you, and you wouldn’t have been doing that if you were trying to . . .

I can’t in any way prove that this is the right time in history to go see if there’s a simple rule for the universe. The thing I find really surprising and remarkable and not what I expected at all is that one runs into “obviously not our universe” candidates as quickly as one does. I really thought that one would be searching billions, trillions of candidates before one found one to exclude very easily.

It’s really a funny question: If there is one of these early universes that is a rich and complex universe that happens to not be our universe, that in itself will be a very bizarre discovery, because one could say, Well, there’s a universe and it’s got an eight-and-a-half-dimensional bizarre kind of particle in it, and it’s got this and this. And it will be surprising. So one thing that’s a question in physics now is, Is there a way to assemble the universe so that it’s self-consistent but different from the way that it’s actually assembled now? And we’ll know that. In the worst case, by examining these candidate universes, we’ll start to know answers to questions like that.

At a purely personal level, the fact that it’s possible to do Wolfram Alpha slows down at least my efforts at finding these things about the universe, because—for me, at least, I tend to be one of these people who works on large projects, and I tend to always have a supply of things that I think will be possible one day, and the question is, What decade can we actually try to do them in? Because if one picks it wrong, one will spend the whole decade trying to build infrastructure that even allows one to get to the starting point. This is the decade when computational knowledge has become possible, and there’s a lot of just really, really interesting things that I think can happen from it. And I think the ways in which—people’s expectation of how they interact with the world is really changing and I think can change as a result of computational knowledge, because people just don’t expect that they can answer questions about the world now. The Web and search engines and such have changed that to some extent. There was a time when basic facts were not things that most people thought you could get easy access to. It took a lot of effort to get to basic facts. Now we can get to basic facts, but can we figure out specific answers to questions? Well, that’s what computational knowledge is trying to do, and it will become routinely the case that people live their lives by being able to answer the questions that come up. Maybe they’ll automatically be answered for them in some sort of preemptive way.

But I think one of the things, again, that has happened today is that there was a time long ago—early on our timeline, so to speak—when facts were only available to a few people in monasteries who had access to the books. And then facts got spread out—there were libraries, books people bought, education—and then along came the Web, and facts were pretty readily available for all. At this point, there’s still expert question-answering but still very concentrated. If you want to know the answer to some question, you have to go find an expert, and that expert is in short supply perhaps, and it’s all a big heavyweight process. I think what computational knowledge is going to do is essentially democratize that process and make it the case that if our civilization can answer the question, then you can answer it in five seconds. In other words, there may be questions that our civilization doesn’t know the answer to; there may be questions where computational irreducibility intervenes and the question isn’t really answerable. But if it’s in principle answerable, then I think we can very much democratize that process and let anybody answer it quickly.

Is it fair to say that that is the fundamental aim of Wolfram Alpha: to foster and democratize computational knowledge?

That’s what we’re trying to do. That’s the big effort. That’s the thing: Absent these various realizations, one might have thought that with computational knowledge, we’ll really not be able to get very far; it’s very specialized and won’t be able to be generally useful. And for me, that’s the big metadiscovery of the past two years: that at this time in history, it’s actually possible to do this. I don’t think it will get progressively easier to do it—there’s not going to be a dramatic moment when it gets much easier—but it sort of came over the horizon, it became possible, and it will gradually get easier. But this is the time.

To what extent does Wolfram Alpha’s ongoing development still require it to be your primary focus?

Me personally? I’m spending all my time on this right now because it’s really interesting and there’s a lot of—the actual process of adding more domains of knowledge, we figured out the framework for how to do that. I get involved because I find it interesting and I think we can do it somewhat faster and better that way. But the main thing—given this idea of computational knowledge, you’ve only seen the beginning with Wolfram Alpha as it exists today. There’s coming real soon, one of the big things is—today you communicate with Wolfram Alpha by feeding it pieces of text. In the near future, you’ll be able to upload images to it. So, instead of giving it a linguistic description, you’re giving it an image.

Will it tell you what the image is? Or what other things can you do with the image?

Not yet. It can’t tell you what the image is yet. The typical thing is that you’ll mix some linguistic thing with, I don’t know, “closest paint color.” You got an image of some thing, and it knows the paints and it can see the image and figure that out. Or you might be able to say—there’s all sorts of tricky things that you can do. Like there’s a shadow in the image and we know what the geotagging of the image is and we know where the sun is and we can figure out based on the length of the shadow, we can figure out how high the thing is and all sorts of fancy things like that. The problem of recognizing images, we’re working on that one, but it’s a knotty problem. The most interesting things will probably be—one of the things that’s always fun with this type of technology is that until you’ve built it to some level and you can really play with it fully, it’s actually quite hard to tell what it will feel like. I know you can do a lot of really good toy things with uploading images and using images as input, but what will be the things that are the really, really useful things you can do? That will become clear once one is routinely doing it. Other things that are coming, like being able to get sensor data flowing into the system and asking questions of the data that’s coming from some sensor that one has: you can say what types of flights are overhead and what speeds. You can get things from seismologists or weather stations. What I mean is your own personal data, like you connect it to your IMAP server and you’ll be able to analyze the sequence of your receipt times of e-mails and things like that and be able to plot that as compared to when the sun rose on a particular day, and so on.
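An illustrative sketch (not part of the interview): the shadow example comes down to one line of trigonometry once the sun's elevation angle is known. The code below assumes that angle has already been obtained from the image's geotag and timestamp by some ephemeris calculation that is not shown, and it assumes a vertical object on flat, level ground. The function name `object_height` and its parameters are hypothetical.

```python
import math


def object_height(shadow_length_m, sun_elevation_deg):
    """Estimate the height of a vertical object from the length of its shadow.

    Assumes flat, level ground and a sun elevation angle (degrees above the
    horizon) supplied by the caller, for example from a separate calculation
    based on location and time.
    """
    if not 0 < sun_elevation_deg < 90:
        raise ValueError("sun elevation must be strictly between 0 and 90 degrees")
    if shadow_length_m < 0:
        raise ValueError("shadow length must not be negative")
    return shadow_length_m * math.tan(math.radians(sun_elevation_deg))


if __name__ == "__main__":
    # With the sun 45 degrees above the horizon, shadow length and height are equal.
    print(object_height(10.0, 45.0))
```

The hard parts in practice are the ones this sketch takes as given: measuring the shadow's true length from a photograph and working out where the sun was.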

How else can we expect to see Wolfram Alpha develop in the near future?

I mentioned these things about preemptive delivery of information. Being able to provide knowledge based on what it can tell is going on for you, rather than based on what you specifically ask it, that’s one type of thing. Another type of thing is watching what’s happening in the world and being able to automatically figure out what’s interesting. We have an unprecedented collection of feeds of all kinds coming into our servers, and we know all these kinds of things. We can see this peak in this curve: There’s something. What we want it to do is figure out what’s interesting. Of all the stuff that’s coming in, what’s newsworthy. What’s worth telling people about and what’s just the normal course of what happens on a Friday afternoon? That’s another kind of thing: being able to figure out—because we do have the largest collection of data about different things going on around the world that anybody’s ever assembled. It comes in in real time, and we should be able to figure out what’s happening, globally, what are the interesting things that are happening.

Another direction is, given that you have a complex task that you want to figure out how to do—that task might involve: you buy this component from this company, you connect it in this way, you do this, you do this. The question is, Can we figure out in some almost creative way, given that you describe your task in such-and-such a fashion, can we figure out how to achieve that task? How to perform that task? That’s a thing we’re trying to work towards. There’s small cases. Like right now, if you type in some funny resistance, we’ll be able to compute what pair of serial and parallel resistors will make that particular resistance. That’s a very trivial case. But a much more complicated case is: One says, “This is what I want to make, and I’m going to describe it,” and I would go to a design engineer and say, “I want a thing that has this property and this property and this property.” They’ll make it for me, figure out how to do that. That’s a challenge that in a sense mixes together several different kinds of things. It mixes together—perhaps there may be a spark of creativity needed on the part of that design engineer. Maybe we can find that by searching some chunk of the computational universe. It mixes, but we have to know what actual resistors exist. How strong is this material? How far is it from this place to that place? So we have to have knowledge about the world. So the concept there is to what extent can we actually go from human description of the human’s goals to how do you achieve those goals with the stuff that exists in the world and the stuff we can compute? And maybe a spark of inspiration that perhaps we can also get automatically. One of the most striking things to me is—in terms of this human inspiration thing—a few years ago we put up this site called WolframTones, which is a website that generates music based on a search of the computational universe. 
The thing that has been most bizarre to me about that is I keep on running into people saying, “I’m a composer and I use this site as inspiration.” That’s sort of the exact opposite of what I would have expected. The role of the computer versus the role of the human. I would have expected the human as the one going, “I have an inspiration. I’ve got a human-created spark, and now I’m going to use the computer to work out that human-created spark and render it in the right way.” But instead what’s happening is that these things that one is plucking from the computational universe, those are the sparks that the humans are then working through to develop into something that they find interesting. It’s a simple case, but it’s an encouraging sign. One of the big things that comes out of that is this mass-customization idea of, How expensive is creativity? What’s the economics of creativity? Can you automate some of creativity? This idea of going from a description of a complex objective to how is that achieved—can you get the spark of inspiration automatically?
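An illustrative sketch (not part of the interview): the resistor case mentioned in this answer is a small instance of going from a goal to the parts that exist in the world. The code below is a guess at the general shape of such a search, not Wolfram Alpha's method. It assumes the available parts are the standard E12 preferred values from 1 ohm up to 8.2 megohms, and the names `STOCK` and `best_pair` are hypothetical. It tries every pair in series and in parallel and keeps the combination closest to the target.

```python
from itertools import combinations_with_replacement

# E12 preferred values for one decade; assumed here to be the parts on hand.
E12 = [1.0, 1.2, 1.5, 1.8, 2.2, 2.7, 3.3, 3.9, 4.7, 5.6, 6.8, 8.2]

# Seven decades: 1 ohm up to 8.2 megohms. Rounding removes floating-point fuzz.
STOCK = [round(base * 10 ** exp, 1) for exp in range(7) for base in E12]


def best_pair(target_ohms):
    """Return (error, topology, r1, r2, value) for the closest two-resistor network."""
    best = None
    for r1, r2 in combinations_with_replacement(STOCK, 2):
        candidates = (
            ("series", r1 + r2),
            ("parallel", r1 * r2 / (r1 + r2)),
        )
        for topology, value in candidates:
            error = abs(value - target_ohms)
            if best is None or error < best[0]:
                best = (error, topology, r1, r2, value)
    return best


if __name__ == "__main__":
    error, topology, r1, r2, value = best_pair(1234.0)
    print(f"{r1} and {r2} ohms in {topology}: {value:.2f} ohms (off by {error:.2f})")
```

The search itself is trivial; what makes it useful is the list of parts that actually exist, which is the "knowledge about the world" half of the problem described above.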