With the release of the next TOP500 ranking of the world’s fastest supercomputers just weeks away, Oak Ridge National Laboratory (ORNL) has this week officially deployed Titan, a 20-petaflop machine. Titan is expected to edge out Sequoia, another Department of Energy machine, housed at Lawrence Livermore National Laboratory, putting the U.S. confidently back atop the supercomputing pyramid (Sequoia is expected to hold the number-two spot) after spending the last few years often chasing China and Japan.
But beyond bragging rights, Titan is something more. It will hands-down be the fastest open science machine in the world, granting time to scientists in industry, academia, and government labs around the country who need huge computing capabilities to make sense of complex data sets in six core areas: climate change, astrophysics, materials science, biofuels, combustion, and nuclear energy systems. And critically, it incorporates graphics processing units (GPUs) alongside the conventional central processing unit (CPU) cores normally deployed in supercomputers of this kind. This successful marriage of CPUs and GPUs could have far-reaching implications for the future of supercomputing as scientists strive to develop a next-generation exascale science machine.
“Titan will be the biggest and fastest open science computer today,” says Steve Scott, chief technology officer for Tesla, the business unit of NVIDIA responsible for supplying Titan’s GPUs. “It may or may not outpace Sequoia. It’s nice to have those titles, but it’s not as important as the science that’s being done on the machine.”
To the collaboration that developed Titan, whether or not the machine clocks in faster than the reigning champ over at Lawrence Livermore is an afterthought. Sequoia, an IBM BlueGene/Q system, is designed to run classified research for the DOE and thus will soon go off the radar, back behind the curtain of state secrecy, where the average researcher will be hard-pressed to gain access to it. Titan, on the other hand, is designed with open research in mind. And it is ready to compute at a level the research science community has never seen before.
Titan is capable of 20,000 trillion calculations per second. To get an idea of how far and how fast this computational capability has come, consider that back in 2009 ORNL was also home to the world’s fastest supercomputer, Jaguar (Titan is actually an upgrade of Jaguar rather than a from-scratch system, though its architecture is very different). Jaguar was a 2.3-petaflop system (“flops” stands for floating-point operations per second and is the standard measure of supercomputing performance) when it topped the list of the world’s fastest computers. In just three years, Titan has eclipsed Jaguar’s performance nearly tenfold.
That leap forward was enabled largely by rethinking the way ORNL builds supercomputers. One could, in principle, boost computing capability tenfold by building a machine ten times larger with ten times as many CPUs, but doing so would be impractical on many levels. Aside from the hardware challenges inherent in such a large machine, the energy needs of the 2.3-petaflop Jaguar were equivalent to those of 7,000 American homes. A 20-petaflop Jaguar would require something like 60 megawatts, or 60,000 homes’ worth of energy, to function. Getting Titan to where it is now without building a massive energy suck took lots of collaboration, an increased reliance on a new kind of hardware regime, and a pretty serious dose of moxie.
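The back-of-the-envelope scaling behind that estimate, using the article’s own figures and the rough equivalence of about 1,000 homes per megawatt they imply, looks something like this:

```latex
% Naive linear scaling of Jaguar's power draw up to Titan-class performance.
% The ~7 MW starting point and the ~1 MW per 1,000 homes equivalence are
% rough assumptions implied by the figures quoted above, not ORNL numbers.
\[
  \underbrace{\sim 7\ \mathrm{MW}}_{\text{Jaguar, 2.3 PF}}
  \times \frac{20\ \mathrm{PF}}{2.3\ \mathrm{PF}}
  \;\approx\; 60\ \mathrm{MW}
  \;\approx\; 60{,}000\ \text{homes}
\]
```

That is exactly the kind of power bill the hybrid approach was meant to avoid.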
“In 2009, we invented hybrid multi-core before we even had a word for it,” says Jeffrey Nichols, ORNL’s associate laboratory director for computing and computational sciences. “From there we made a three-year leap of faith that has paid off tremendously in a 10-times leap in performance, a five-times leap in efficiency.”
Nichols is referring to the integration of graphics chips, or GPUs, into the conventional CPU architecture. GPUs are uniquely suited to certain tasks; they are particularly good at handling many calculations at once, dozens or even hundreds in parallel. CPUs aren’t particularly good at that kind of computing, though they remain well-suited to conventional tasks, like running the main lines of a program. To build Titan, ORNL brought together supercomputer maker Cray and GPU manufacturer NVIDIA to create a hybrid system containing 18,688 16-core Advanced Micro Devices CPUs and 18,688 NVIDIA Tesla GPUs that work together to complete tasks faster and with far greater efficiency. The core research was there, but the challenge lay in lining up all the pieces, nearly 40,000 of them, and making them work.
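To make that division of labor concrete, here is a minimal, hypothetical sketch in CUDA, the programming model for NVIDIA’s GPUs: the CPU sets up the data and launches a “kernel,” and the GPU assigns one lightweight thread to each element so thousands of identical calculations run at once. The operation (a simple scaled vector addition), the problem size, and the launch parameters are illustrative assumptions, not anything drawn from Titan’s actual codes.

```cuda
// Minimal illustrative sketch of CPU/GPU hybrid computing (not ORNL code).
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// GPU kernel: each thread handles one array element, so thousands of
// identical calculations proceed simultaneously instead of one after
// another on a single CPU core.
__global__ void axpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;                        // 1M elements (illustrative)
    std::vector<float> x(n, 1.0f), y(n, 2.0f);

    // CPU side: stage the data in GPU memory.
    float *dx, *dy;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMemcpy(dx, x.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover every element, then wait
    // for the GPU to finish the accelerated step.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    axpy<<<blocks, threads>>>(n, 3.0f, dx, dy);
    cudaDeviceSynchronize();

    cudaMemcpy(y.data(), dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("y[0] = %.1f\n", y[0]);                // expect 3*1 + 2 = 5
    cudaFree(dx);
    cudaFree(dy);
    return 0;
}
```

In a real simulation code, it is the expensive inner loops that get moved onto the GPU this way, while the CPU handles control flow, I/O, and communication between nodes.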
Everyone involved in Titan’s development was working on some degree of faith here, Nichols explains, and everyone was facing the prospect of failure. Cray had to build hardware and an interface that could let CPUs and GPUs talk to one another, something it had never done before. NVIDIA, which had been claiming for some time that its GPUs possess important capabilities far beyond the gaming console or PC, had to prove that this kind of hybrid computing could really work at supercomputing scale. And ORNL was perhaps in the most precarious position of all, with its leadership role in global supercomputing on the line. Had Titan failed to come online on time (or at all), it would’ve been a major setback, perhaps a multi-year one, and those are years that would be very hard to make up in the fast-moving supercomputing field. “For an organization with a mission that has to be met, that cannot afford a stunt, we bet the farm on this hybrid computing environment and we succeeded,” Scott says.
This roll of the dice is now paying off handsomely. Rather than building a computer ten times the size of Jaguar, the upgrade to 16-core CPUs and performance-accelerating GPUs allows Titan to fit in the same 200 server cabinets that housed Jaguar. And while it does draw more power than its predecessor, Titan requires only about 9 megawatts, a fraction of what an all-CPU architecture running at the same speed would need.
That’s still a $10-million-per-year energy bill, but compared with the current field of machines around the world, and given the imperative to stay up to speed with the competition (particularly a certain competitor across the Pacific), Titan is a major step forward for American supercomputing. The DOE is trying to build an exascale supercomputing capability, hopefully by 2020, just as China, Japan, India, and various countries in Europe and around the world are trying to beat the U.S. to it (an exaflop, equal to 1,000 petaflops, is the next major performance milestone). Unlike some of those competitors, the DOE is trying to do it on both a tight financial budget and a tight energy budget.
“The difference between what we see in the U.S. and elsewhere is that we’re trying to get to exascale within 20 megawatts of power,” Nichols says. That’s roughly $20 million worth of power per year at today’s prices. China doesn’t have those kinds of fiscal or energy constraints right now, making Titan’s leap forward in performance and efficiency all the more significant from both research-and-development and national-security perspectives. Still, reaching exaflop performance on that budget will require something like a 50-times improvement in capability within roughly the same power envelope. Daunting, sure, but Nichols and his colleagues at ORNL, Cray, NVIDIA, and elsewhere are already hard at work on solutions.
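In back-of-the-envelope terms, using only the figures quoted in this article (not official DOE roadmap numbers), the efficiency gap looks like this:

```latex
% Titan today versus the exascale goal, in performance per watt.
% 20 PF at ~9 MW and 1,000 PF at 20 MW are the figures cited above;
% the per-watt comparison is a rough illustration, not a DOE target.
\[
  \underbrace{\frac{20\ \mathrm{PF}}{9\ \mathrm{MW}} \approx 2.2\ \mathrm{GF/W}}_{\text{Titan}}
  \qquad\longrightarrow\qquad
  \underbrace{\frac{1000\ \mathrm{PF}}{20\ \mathrm{MW}} = 50\ \mathrm{GF/W}}_{\text{exascale goal}}
\]
```

That is a 50-fold jump in raw capability allowed only about twice Titan’s power, or roughly a twentyfold improvement in performance per watt.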
“We had the biggest machine in 2009 and we were already thinking about the 2012 machine,” Nichols says. “And we’re already thinking about the 2016 machine.”