Do we really gain anything from the ceaseless profusion of data?
By Lawrence Weschler | Posted 11.04.2011 at 4:57 pm
I should perhaps begin by saying that I am as big a fan of the Net and the Web and the whole expanding “information universe” as anyone you are likely to meet. I find myself online all the time, mining for data, merrily skipping from one site to the next, passing the time of day after day (and night after night) in scattershot dalliances (sampling this and sampling that in a virtual delirium of free association), deploying my trove of finds in ever more elaborate collages of discovery (or is it recovery?) of my own. And yet... and yet...
This week, PopSci is peeking under the hood of some of the nation's biggest and baddest supercomputers--the machines that turn big data into big discoveries, big technologies, and big leaps forward. Over the last week, we managed to get each of the busy machines in this series on the phone to see what they were up to during a particular day. They were happy to share.
Stealing information can be just as lucrative--and destructive--as stealing anything else. Our look at the history of data theft touches on some of the major (or just really interesting) crimes in history. The father of the American Industrial Revolution? A glorified data thief. That tea you're drinking (let's say, just for the duration of this sentence, you are drinking tea)? That's a stolen secret recipe, the theft of which involved a Scotsman dressed up in "traditional mandarin garb." And if you're a PlayStation Network user or a Gawker commenter, you'll be familiar with some of the later items on our list. And don't forget to check out the rest of Data Week, our exploration of all things data.
When Roy Buol stepped into the mayor's office of the city of Dubuque in 2005, he did so with a handful of imperatives. There were the needs of his citizens, who throughout his campaign had voiced concerns on issues like public transit, green space, water quality, and recycling. There was the need to live up to his campaign message, which centered on engaging citizens as partners in the administration of the city. And then there were his private concerns about the larger world.
Scientists from every discipline have more data than ever, but it's only as useful as the meaning behind it. Every bit of information is explained only by the context in which it was gathered, and often by the context in which it is used. "There is no such thing as raw data," says Bill Anderson of the School of Information at the University of Texas at Austin and associate editor of the CODATA Data Science Journal.
Take the number 37, Anderson says. Other than stating a numerical order, it means little on its own. But with some more information — 37 degrees Celsius, for instance — it can take on more meaning. Now give it some context: 37 degrees C is normal body temperature. Now 37 represents something useful, something a doctor or researcher could use, and it becomes a piece of knowledge that could comfort a patient or answer a question.
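Anderson's progression from data to information to knowledge can be sketched in a few lines of code. This is a toy illustration of the idea, not anything from the article; the dictionary structure and the reference range are assumptions for the example.

```python
raw = 37  # data: a bare number, meaning little on its own

# information: the same value with a unit attached
measurement = {"value": 37, "unit": "degrees Celsius"}

# context: normal human body temperature is roughly 36.5-37.5 degrees C
NORMAL_RANGE_C = (36.5, 37.5)

def interpret(m):
    """Turn a measurement into knowledge by reading it against context."""
    if m["unit"] != "degrees Celsius":
        raise ValueError("no interpretive context for this unit")
    lo, hi = NORMAL_RANGE_C
    return "normal" if lo <= m["value"] <= hi else "abnormal"

print(interpret(measurement))  # -> normal
```

The number never changes; only the layers wrapped around it do, which is what turns 37 into something a doctor could act on.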
In 2006, Netflix made its vast database of user-generated movie ratings available to the public, offering $1 million to the first team that could improve the accuracy of the company’s recommendations by 10 percent. That’s a lot of money—but Netflix could have spent much more on in-house development, with no guarantees. By 2009, the top team had its prize, and Netflix had its algorithm. Other groups took notice and are now holding their own contests, asking statisticians, computer scientists and basement hobbyists alike to mine complex data sets for solutions to some difficult problems.
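The Netflix Prize scored entries by root-mean-square error (RMSE) between predicted and actual ratings, so "improve accuracy by 10 percent" meant a 10 percent reduction in that error. A minimal sketch of the metric, using made-up ratings rather than Netflix's data:

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual ratings."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )

# Hypothetical 1-5 star ratings, purely illustrative
actual   = [4.0, 4.0, 3.0, 5.0]
baseline = [3.2, 4.1, 2.0, 4.8]   # stand-in for an incumbent recommender
improved = [3.6, 4.3, 2.4, 4.9]   # stand-in for a challenger's predictions

e0, e1 = rmse(baseline, actual), rmse(improved, actual)
improvement = 100 * (e0 - e1) / e0   # percent reduction in error
```

A contest framed this way has a clean, objective leaderboard, which is part of why the format spread to other data-mining competitions.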
At Paddy Power--Ireland's largest bookmaker--teams of quants and risk analysts set the odds on 12,000 to 15,000 events a week--everything from horse races and other sporting events to speculation on the name of Beyoncé's unborn child. Within these events, there are 60,000-70,000 individual bets, or "markets," to be made. And every market needs a set of odds--some kind of calculation of the probability that a specific outcome might occur, based on available data. But how does a bookmaker know what data is good and what data is bad? How can it build safeguards into predictive systems so it doesn't get burned?
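The core arithmetic behind any such market can be sketched simply: estimate each outcome's probability, inflate the probabilities by a margin (the "overround") so the book pays out less than fair value, and quote odds as the inverse. This is an illustrative toy, not Paddy Power's actual system; the margin figure and race are invented.

```python
def decimal_odds(probabilities, margin=0.05):
    """Quote decimal odds from estimated win probabilities.

    Each probability is scaled up by the bookmaker's margin, then
    inverted; the quoted probabilities sum to more than 1, which is
    where the book's edge lives.
    """
    assert abs(sum(probabilities) - 1.0) < 1e-9, "probabilities must sum to 1"
    return [round(1.0 / (p * (1 + margin)), 2) for p in probabilities]

# Hypothetical three-horse race with estimated win probabilities
print(decimal_odds([0.5, 0.3, 0.2]))  # -> [1.9, 3.17, 4.76]
```

The hard part, as the article notes, is not this formula but the quality of the probability estimates fed into it.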
The most controversial scientific topic of the past few decades--predicting the fate of the planet--gets a huge dose of data
By Rena Marie Pacella | Posted 11.04.2011 at 9:49 am
Before the Intergovernmental Panel on Climate Change launched its Data Distribution Centre (DDC) in 1998, researchers who needed climate-change projections had to get them from the handful of scientists who specialized in computing-intensive statistical climate modeling. Modelers became backlogged with requests; studies languished.
TOP500 Rank: 6
Vital Stats: System: Cray XE6 8-core 2.4 GHz running Linux. Processors: AMD x86_64 Opteron 8 Core 2400 MHz (9.6 gigaflops). Sustained performance: 1.11 petaflops. The hardware takes up about 1,500 square feet but draws less than 4 megawatts of power. The system consists of 96 cabinets with nearly 9,000 compute nodes and approximately 300 terabytes of memory.
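The 9.6-gigaflop per-core figure above follows from the clock speed, assuming 4 double-precision floating-point operations per cycle, a common rate for Opterons of this generation (the 4 flops/cycle figure is our assumption, not stated in the spec sheet):

```python
# Back-of-envelope check of the per-core peak in the vital stats
clock_hz = 2.4e9       # 2.4 GHz clock, from the spec above
flops_per_cycle = 4    # assumed double-precision rate for this CPU era

per_core = clock_hz * flops_per_cycle
print(per_core / 1e9)  # -> 9.6 gigaflops per core
```

Peak figures like this multiply out across cores and nodes; the quoted 1.11 petaflops is a *sustained* benchmark result, which always lands below the theoretical peak.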