Researchers at IBM’s Almaden research lab in California are building what will be the world’s largest data array: a monstrous repository of 200,000 interlaced hard drives. Altogether, it has a storage capacity of 120 petabytes, or 120 million gigabytes.
There are plenty of challenges inherent in building this kind of groundbreaking array, which IBM says is destined for what Technology Review describes as “an unnamed client that needs a new supercomputer for detailed simulations of real-world phenomena.” For one thing, IBM had to rely on water-cooling units rather than traditional fans, because this many hard drives generates heat that can’t be subdued by ordinary means. There’s also a sophisticated backup system that senses the number of hard disk failures and adjusts the speed at which it rebuilds data accordingly: the more failures, the faster the rebuild. According to IBM, that should let the array operate with minimal data loss, perhaps none at all.
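To make the rebuild idea concrete, here’s a minimal sketch of that kind of adaptive throttling: the more simultaneous failures, the more bandwidth gets diverted to restoring redundancy. The function name, thresholds, and fractions are all illustrative assumptions, not IBM’s actual implementation.

```python
# Hypothetical sketch of adaptive rebuild throttling: with one failed
# disk, rebuilding can crawl along in the background; as concurrent
# failures pile up, the risk of losing a second copy of the same data
# grows, so rebuild traffic is ramped up aggressively. All thresholds
# here are made-up for illustration.

def rebuild_priority(failed_disks: int) -> float:
    """Return the fraction of array bandwidth to devote to rebuilds."""
    if failed_disks == 0:
        return 0.0   # nothing to rebuild
    if failed_disks == 1:
        return 0.05  # slow background rebuild, minimal impact on clients
    if failed_disks <= 5:
        return 0.25  # moderate urgency
    return 0.90      # many overlapping failures: rebuild at near-full speed
```

The point of the design is the trade-off: rebuilding slowly keeps the array fast for its users, while rebuilding quickly shrinks the window in which a second failure could destroy data for good.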
IBM is also using a new filesystem, designed in-house, that stripes individual files across multiple disks, so different parts of the same file can be read and written at the same time.
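The idea behind that filesystem is striping: split a file into fixed-size chunks and scatter them round-robin across many disks, so each disk only has to serve a fraction of the file. This toy sketch (chunk size and disk count are illustrative, and real striped filesystems add redundancy and metadata on top) shows the basic mechanism.

```python
# Toy demonstration of file striping: a file is split into fixed-size
# chunks distributed round-robin across disks, so different parts can
# be accessed in parallel. Real systems layer redundancy over this.

CHUNK_SIZE = 4  # bytes per stripe unit (tiny, for demonstration)

def stripe(data: bytes, num_disks: int) -> list[list[bytes]]:
    """Distribute chunks of `data` round-robin across `num_disks` disks."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for i in range(0, len(data), CHUNK_SIZE):
        disks[(i // CHUNK_SIZE) % num_disks].append(data[i:i + CHUNK_SIZE])
    return disks

def reassemble(disks: list[list[bytes]]) -> bytes:
    """Read stripes back in round-robin order to rebuild the file."""
    out = []
    for depth in range(max(len(d) for d in disks)):
        for disk in disks:
            if depth < len(disk):
                out.append(disk[depth])
    return b"".join(out)
```

Because each disk holds only every Nth chunk, N disks can stream their pieces simultaneously, which is how striping turns many slow drives into one fast logical device.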
An array like this is bottlenecked severely by the speed of the drives themselves, so IBM has to lean on software improvements, like that new recovery system and filesystem, to boost throughput and coordinate so many drives at once.
Arrays like this could be put to all kinds of high-intensity work, especially data-heavy duties like weather and seismic monitoring (or people monitoring), though of course we’re curious what this particular array will be used for.