Mira the supercomputer
Computer modelling is playing an ever larger part in solving scientific problems, from physics to biology, even for scientists who never studied computer science.
To find out how it works, Chris Smith took the opportunity to meet up with Pete Beckman from Argonne National Laboratory in Chicago and visit Mira, the fifth-fastest supercomputer in the world.
Pete - So right now, we're in the building which houses our supercomputer Mira. That's its nickname and this data centre here, we refer to as The Core.
Chris - When we say supercomputer, what does that mean?
Pete - Here at Argonne, we try and solve science problems that are so massive, so large that there's no way you can fit them on a couple of machines or laptops or even at Amazon Web Services. This is a machine which is purpose-built to run the absolute, most massive computations you can imagine. So, if I were to try and solve a problem on my laptop that might take a couple of hours, I could do that in a split second on Mira. But the real question is, what about the opposite? If I spent a day of time on Mira, what kind of problem could I solve? That problem wouldn't even fit on my laptop, but if it did and I just churned it out there, it would take more than 300 years to compute. So no scientist is going to wait around for more than a couple of days to get their answer, certainly not hundreds of years.
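The two comparisons in this answer can be checked against each other with a little arithmetic. The sketch below assumes that "a couple of hours" means 2 hours and uses 300 years as the laptop time for one day on Mira. Both are round illustrative figures taken from the conversation, not measured benchmarks.

```python
# Rough consistency check of the laptop-versus-Mira comparison.
# Assumptions (not from a benchmark): 365-day years, "a couple of hours" = 2 hours.

LAPTOP_YEARS_PER_MIRA_DAY = 300   # quoted lower bound: "more than 300 years"
DAYS_PER_YEAR = 365

# One day on Mira is worth this many days on the laptop.
speedup = LAPTOP_YEARS_PER_MIRA_DAY * DAYS_PER_YEAR

laptop_seconds = 2 * 60 * 60               # a two-hour laptop job
mira_seconds = laptop_seconds / speedup    # the same job at that speedup

print(f"implied speedup: about {speedup:,}x")
print(f"two-hour laptop job on Mira: about {mira_seconds:.3f} s")
```

Under these assumptions the implied speedup is roughly 110,000 times, and the two-hour laptop job finishes in well under a tenth of a second, which fits the "split second" description.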
Chris - What sort of problems need that sort of power?
Pete - So, a good example is climate modelling. Another good example is understanding the nature of the universe, so one of the world's largest computations for Big Bang and dark energy and dark matter. We can't wind the clock back and do an experiment in a lab to understand the Big Bang, so we do that experiment with a simulation inside our computer. There are also applications in designing jet engines and designing landing gear, and more commercial applications that companies use because again, it's difficult to build a jet engine that you would like to design and then test it. It's much easier to build that in the computer.
Chris - What actually is the architecture of a supercomputer? How is it constructed?
Pete - So, supercomputers are different than regular machines in that they put a lot of emphasis on floating point operations and moving data very quickly.
Chris - What's a floating point operation?
Pete - Good question. So, if you were to do a simple math problem, A times B plus C, that's called a FLOP. And a FLOP is a floating point operation. And so, one multiply and one add. So, our machine is 10 Peta-FLOPs.
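As a rough illustration of what is being counted, the sketch below applies A times B plus C to a few numbers and tallies the operations. It assumes the common convention of counting the multiply and the add as two separate operations, and the function name `multiply_add` is made up for this example. The last lines estimate an ideal running time at the quoted 10 petaFLOPS peak; real applications typically run below peak.

```python
# Counting floating point operations for a * b + c applied element-wise.
# Assumption: the multiply and the add are counted as two separate operations.

PEAK_FLOPS = 10e15   # 10 petaFLOPS, the figure quoted for Mira

def multiply_add(a, b, c):
    """Return a*b + c for each element, plus the number of operations performed."""
    results = [x * y + z for x, y, z in zip(a, b, c)]
    flop_count = 2 * len(results)   # one multiply + one add per element
    return results, flop_count

values, flops = multiply_add([1.5, 2.0], [2.0, 4.0], [0.5, 1.0])
print(values, flops)

# Ideal time for a trillion multiply-adds at the peak rate.
ideal_seconds = 2 * 1e12 / PEAK_FLOPS
print(f"{ideal_seconds:.4f} s")
```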
Chris - Does it consist, effectively, of one computer connected to another computer, connected to another computer, connected to another computer? So, you've just got hundreds and hundreds of machines all working in parallel, or is it more complicated than that?
Pete - Nowadays, we build supercomputers by making them parallel. It isn't the way supercomputers were always built. The first supercomputers were each a single computer that was just built to be very, very fast. So, this is the model of the Formula 1 racing car. But now, with the problems so big, there's no way you can make one car drive as fast as you want, so you have a fleet. And so, what we have in this room is really a purpose-built fleet of CPUs and memory and connections.
Chris - How do you set tasks for it to do?
Pete - So, we have programming languages. The languages we use are C, C++ and Fortran. But then we have a series of mechanisms by which we push data and share data back and forth between all of these hundreds of thousands of processors. We literally have hundreds of thousands of CPUs in the machine, and that layer is called MPI, for Message Passing Interface. And that's what computer scientists learn. They learn how to design algorithms and break up a scientific problem into thousands of little pieces that can be solved independently.
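The idea of breaking a problem into independent pieces can be sketched without any MPI library at all. The example below is only an illustration in Python, not how codes on Mira are written: the "ranks" are simulated one after another in a single process, the helper names `block_range` and `partial_sum` are hypothetical, and a plain `sum` stands in for the step where a real MPI program would combine results across processors.

```python
# Sketch of block decomposition: split a sum over N items among `size` workers.
# In an MPI program each rank would run the same code with its own rank number;
# here the ranks are simulated one after another in a single process.

def block_range(n_items, size, rank):
    """Return the [start, stop) slice of work owned by `rank` (hypothetical helper)."""
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

def partial_sum(start, stop):
    """The independent piece of work: sum of squares over one block."""
    return sum(i * i for i in range(start, stop))

N_ITEMS, SIZE = 1000, 8
partials = [partial_sum(*block_range(N_ITEMS, SIZE, rank)) for rank in range(SIZE)]

# Stand-in for a reduction step that combines every rank's partial result.
total = sum(partials)
print(total)
```

Each block depends only on its own rank number and the problem size, so no piece needs data from another until the final combining step. That independence is what lets the work spread across many processors.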
Chris - And if one of those hundreds of thousands of CPUs dies, do you know and how does that affect the operation of the supercomputer?
Pete - We hear the tree falling, yes, and scientists are not happy at all when that happens. Most applications are using a technique that we call checkpoint restart. So, they periodically save their state. It's kind of like with your home computer, you hit Save. The problem is that with so much data in our machine, even just saving that out takes 30 minutes to an hour. So, you can't afford to hit Control S and Save every few minutes. It's something that happens every 6 hours or 8 hours that you would save your checkpoint.
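A toy version of checkpoint restart might look like the sketch below. It is an illustration only: the state is a small dictionary rather than a machine's worth of simulation data, the `run` function and its `fail_at` parameter are invented for this example, and the failure is simulated by raising an exception.

```python
import os
import pickle
import tempfile

def run(total_steps, checkpoint_every, path, fail_at=None):
    """Advance a toy simulation, saving state periodically and resuming if a file exists."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            state = pickle.load(f)          # restart from the last saved state
    else:
        state = {"step": 0, "value": 0}     # fresh start

    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated node failure")
        state["value"] += state["step"]     # stand-in for real computation
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            with open(path, "wb") as f:
                pickle.dump(state, f)       # checkpoint
    return state

checkpoint_path = os.path.join(tempfile.mkdtemp(), "state.pkl")
try:
    run(100, 25, checkpoint_path, fail_at=60)   # crashes after the step-50 checkpoint
except RuntimeError:
    pass
final = run(100, 25, checkpoint_path)           # resumes from step 50, not step 0
print(final)
```

The work done between step 50 and the failure at step 60 is lost and has to be repeated. That is the trade-off described above: checkpointing more often loses less work after a failure, but each save costs time.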
Chris - Is this resource available to scientists at your institution and beyond, and do they just book time on it if they want a problem to be solved?
Pete - Actually, because this is considered a national resource, that means that if you have a good science problem and you write a proposal and convince people that you can solve your science problem with our supercomputer then you get time on it for free. And so, we have people from all over the world who have applied to get time on our supercomputer.
Chris - Can we go into The Core?
Pete - Yeah, let's go in. It's a little bit noisy in there, so you might not be able to hear me so well. But we're going to open the door, head into The Core and take a look at Mira.
Chris - You're right. It is pretty noisy in here. I'm basically seeing rows and rows of massive black racks.
Pete - Yes. The reason it's so noisy is that a supercomputer runs on several megawatts of power and you have to cool that. And so, right now, what you're hearing are all of the fans blowing air in order to provide cold air. Now Mira is actually unique in this case though because a part of Mira is water-cooled. So, if we didn't have the other supercomputers in here, if we only had Mira, it would actually be pretty quiet. If you were to leave this building and look outside, you'd see the chillers which are providing the cold water.
Chris - What is your electricity bill?
Pete - So, a megawatt of power is about a million dollars a year and Mira needs about 6 megawatts. So, that's about a $6 million a year electric bill which is a lot of money.
Chris - Is that just the computers without the aircon taken into account or does that include your aircon?
Pete - That's just the computer and then the facility is a bit more. But this machine is actually extremely power efficient compared to other computers. So, there are computers in China and at Oak Ridge that take 20 megawatts to run and cool. They're much, much less efficient than our machine. So, that is a $20 million electric bill, which is kind of hard to fathom.
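The electricity figures follow from the rule of thumb of about a million dollars per megawatt per year. The sketch below reproduces the two bills and works out the electricity price that rule implies, assuming the machine draws its full power around the clock for a 365-day year. The function name `annual_bill` is made up for this example.

```python
HOURS_PER_YEAR = 24 * 365            # 8,760 hours, ignoring leap years
COST_PER_MW_YEAR = 1_000_000         # dollars, the rule of thumb quoted above

def annual_bill(megawatts):
    """Yearly electricity cost in dollars for a constant power draw."""
    return megawatts * COST_PER_MW_YEAR

# One megawatt for a year is 8,760 MWh, i.e. 8,760,000 kWh.
price_per_kwh = COST_PER_MW_YEAR / (HOURS_PER_YEAR * 1000)

print(annual_bill(6))                # Mira
print(annual_bill(20))               # the 20 megawatt machines
print(round(price_per_kwh, 3))       # implied dollars per kWh
```

Under these assumptions the rule of thumb works out to a little over 11 cents per kilowatt-hour.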