How do Supercomputers Work?

11 September 2011

Interview with

Chris Maynard, Edinburgh Parallel Computing Centre

Part of the show Supercomputers & Super Computing

The Blue Gene/P supercomputer at Argonne National Lab runs over 250,000 processors at room temperature, grouped in 72 racks/cabinets connected by a high-speed, optical network

Credit:

CC-BY-SA -...

Diana - Supercomputers are, as the name suggests, extremely powerful computing devices. They are used to model extremely complicated systems such as the weather as well as for high precision simulations and complex calculations required in quantum physics. So that might be molecular modelling, it might even be predicting revolutions and many, many more things. The Edinburgh Parallel Computing Centre or EPCC provides computing resources to Edinburgh University and to industry. And we're joined by Dr. Chris Maynard. Hello.

Chris M. - Hello.

Diana - So can you tell us, what is a supercomputer?

Chris M. - A supercomputer is a computer that can perform calculations very quickly, much faster than a normal computer would. Typically, this is done by computing in parallel, having lots of processing units or nodes working together. Each of these nodes could be the kind of processor one finds in a desktop PC or a laptop, but how they're connected together in hardware and how they cooperate to solve the problem is very important to the speed of the calculation.

Diana - What is this structure then? How big are these things going to be and what is their power consumption and the speed of the individual parts that go into it?

Chris M. - So, for instance, we run HECToR, which is the UK's National High Performance Computing Service, along with other partners, and this is funded by the Office of Science and Technology through the EPSRC's high end computing program. This particular machine is called HECToR and it's a Cray XE6. It has 1,856 nodes in it and each of these nodes has two 12-core Opteron processors, so that gives you a total of 44,544 cores. Each of these Opterons is coupled with a Cray Gemini routing and communication chip. So in total, it has a very large memory, about 60 terabytes, and its theoretical peak performance is 360 teraflops, which is 360 trillion computations per second. It is the fastest computer in the UK, and the 24th fastest in the world.
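As a rough check of the figures quoted above, the arithmetic can be sketched in Python. The total of 44,544 cores works out to 24 cores per node across 1,856 nodes, which this sketch reads as two 12-core processors per node. That reading, the decimal meaning of "tera", and all names in the code are assumptions made for illustration.

```python
# Back-of-envelope check of the HECToR figures quoted in the interview.
# All constant and function names are illustrative, not from any real tool.

NODES = 1_856
CORES_PER_PROCESSOR = 12
PROCESSORS_PER_NODE = 2      # assumption: needed to reach the quoted core total
PEAK_FLOPS = 360e12          # 360 teraflops, taking "tera" as 10**12
MEMORY_BYTES = 60e12         # about 60 terabytes, taking "tera" as 10**12


def total_cores(nodes=NODES,
                processors_per_node=PROCESSORS_PER_NODE,
                cores_per_processor=CORES_PER_PROCESSOR):
    """Number of cores in the whole machine."""
    return nodes * processors_per_node * cores_per_processor


cores = total_cores()
peak_per_core = PEAK_FLOPS / cores        # theoretical peak per core, flop/s
memory_per_core = MEMORY_BYTES / cores    # average memory per core, bytes

if __name__ == "__main__":
    print(f"total cores:       {cores}")
    print(f"peak per core:     {peak_per_core / 1e9:.2f} gigaflops")
    print(f"memory per core:   {memory_per_core / 1e9:.2f} gigabytes")
```

Under these assumptions each core contributes roughly 8 gigaflops of theoretical peak and has a little over 1 gigabyte of memory, which is comparable to a single desktop core. The scale comes from the number of cores and how they are connected.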

Diana - Okay, so those are really sky high numbers then. I mean, that's going to throw my poor little ancient laptop into the shade, but once you've got this hardware, what is the software that goes on top of that? How does it work compared to a normal program that you'd have on an everyday computer?

Chris M. - The bit that makes a supercomputer super is how these computing nodes are coupled together. So you have some hardware that does it, but you then have to write some software that will enable them to talk to each other. Typically, for science programming, what you might have is a model of message passing. So we would split the problem up across many processors so that each processor has a small amount of data that's local to it, and then each processor runs the same calculation on this local data. But at some point during the calculation, and quite regularly, the processors have to communicate with each other. So there's a communication grid, and you would send a message to your neighbour: you would send some data to your neighbour on the right and you'll receive data from your neighbour on the left, and then you might do that in many dimensions, up, down, left, right, back and forth, depending on the problem.
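The pattern described above (split the data, run the same calculation everywhere, swap edge values with neighbours) can be sketched as a toy simulation in Python. This is not real message passing: the "processors" are simulated one after another in a single process, the grid is one-dimensional and wraps around, and the three-point average is only a stand-in for a real scientific calculation. All function names are hypothetical.

```python
# Toy simulation of message passing on a one-dimensional ring of processors.
# Each simulated processor ("rank") owns a local chunk of the data and needs
# one value from each neighbour before it can do its local calculation.


def split(data, n_ranks):
    """Split data into n_ranks contiguous, equal-sized local chunks."""
    if len(data) % n_ranks != 0:
        raise ValueError("data length must be divisible by n_ranks")
    size = len(data) // n_ranks
    return [data[r * size:(r + 1) * size] for r in range(n_ranks)]


def exchange_halos(chunks):
    """Simulate every rank swapping edge values with its two neighbours.

    Returns, for each rank, the pair
    (value received from the left neighbour,
     value received from the right neighbour).
    """
    n = len(chunks)
    halos = []
    for rank in range(n):
        left = chunks[(rank - 1) % n]    # neighbour on the left (wraps around)
        right = chunks[(rank + 1) % n]   # neighbour on the right (wraps around)
        halos.append((left[-1], right[0]))
    return halos


def smooth_step(chunks):
    """Every rank runs the same calculation on its local data plus halos."""
    halos = exchange_halos(chunks)
    new_chunks = []
    for local, (from_left, from_right) in zip(chunks, halos):
        padded = [from_left] + local + [from_right]
        new_chunks.append([
            (padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
            for i in range(1, len(padded) - 1)
        ])
    return new_chunks


def smooth_serial(data):
    """The same calculation done on one processor, for comparison."""
    n = len(data)
    return [(data[(i - 1) % n] + data[i] + data[(i + 1) % n]) / 3.0
            for i in range(n)]


if __name__ == "__main__":
    data = [float(i * i) for i in range(12)]
    chunks = split(data, 4)
    distributed = [x for chunk in smooth_step(chunks) for x in chunk]
    print(distributed == smooth_serial(data))
```

The point of the comparison at the end is that the split-up calculation gives the same answer as the single-processor one, provided the edge values are exchanged before every step. On a real machine that exchange is what the communication library and network hardware are doing.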

Diana - Okay, so given you've got this communication type software, does that mean then that you develop that rather than rebuilding a new supercomputer each time because obviously, the expense of that must be huge?

Chris M. - Yes, absolutely. So you're trying to have this programming model that you would use so that all these processors can talk to each other and you'd write a scientific program in a language you would normally use like C or Fortran and then if you're going to use this message passing model, there's a communications library called MPI which you can then use on more than one machine.

Diana - And supercomputers, are they advancing at the same rate that normal computers are?

Chris M. - Well, that's a very interesting question because they are, and there's something called Moore's Law, which is an empirical observation that the number of transistors in an area of silicon has doubled every 18 months or so. And this has been true since the '60s, and if you have a smaller circuit, your computer can run faster, but it also runs hotter. And since perhaps the middle of the last decade, 2005 or so, it's no longer been possible to get sufficient power into and out of a chip fast enough, and so it gets too hot. So you can't run it as one processor anymore, but what you can do is run two processors or four processors, or many more, so you're getting more cores on one piece of silicon, on one chip. But now, if you have many of these nodes together that are then multi-core, you've got even more complicated communication patterns between things that can see the same piece of memory on one chip and things that can't, which then have to send a message between the chips.
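Two points from this answer can be sketched in Python: the growth implied by a doubling every 18 months, and the distinction between cores that share memory and cores that have to exchange messages. The core numbering scheme and the default of 24 cores per node are assumptions for illustration, and the function names are hypothetical.

```python
# 1. Growth implied by "doubled every 18 months or so".

def moore_growth_factor(years, doubling_period_years=1.5):
    """Factor by which the transistor count grows over `years`."""
    return 2.0 ** (years / doubling_period_years)


# 2. Shared memory versus message passing on a multi-core machine.
# Assumption: cores are numbered 0, 1, 2, ... and filled node by node.

CORES_PER_NODE = 24  # illustrative value, not taken from the interview


def node_of(core_id, cores_per_node=CORES_PER_NODE):
    """Index of the node that a given core sits on."""
    return core_id // cores_per_node


def needs_message(core_a, core_b, cores_per_node=CORES_PER_NODE):
    """True if the two cores are on different nodes and so cannot see
    the same memory; they would have to send a message instead."""
    return node_of(core_a, cores_per_node) != node_of(core_b, cores_per_node)


if __name__ == "__main__":
    print(moore_growth_factor(3))    # two doublings in three years
    print(needs_message(0, 23))      # same node under the assumed numbering
    print(needs_message(23, 24))     # neighbouring core ids, different nodes
```

The second function shows why the communication pattern gets more complicated: two cores with adjacent numbers may sit on the same chip and share memory, or sit on different nodes and need the network.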

Diana - So really, it's like building a brain essentially with lots of neurons. But what sort of science are you doing in Edinburgh? What are the models that you're building with your supercomputer?

Chris M. - Well, let me give you an example. Last week, you talked about scientists modelling the response of the heart to drugs. At EPCC, we've done some work with some scientists from Kings College in Oxford to improve the performance of their computational model of the electrophysiology of the heart, so it can now run on many more processors or cores than it could previously, up to, say, 16,000. This means a simulation of 1 second of the heart now takes them under a minute rather than, say, more than an hour previously. And this is a great improvement and it goes much faster, but if you want to use this model to, say, guide surgery, then you need to be able to do it in real time.
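The gap between "much faster" and "real time" can be put into numbers with a short Python sketch. The interview only gives bounds ("under a minute", "more than one hour"), so the round values of 60 and 3,600 seconds used here are assumptions, and the function names are hypothetical.

```python
# Rough arithmetic for the heart simulation timings quoted above.

SIMULATED_SECONDS = 1.0      # one second of heart activity
OLD_WALL_SECONDS = 3600.0    # assumption: "more than one hour" taken as 1 hour
NEW_WALL_SECONDS = 60.0      # assumption: "under a minute" taken as 1 minute


def speedup(old_wall_seconds, new_wall_seconds):
    """How many times faster the new run is than the old one."""
    return old_wall_seconds / new_wall_seconds


def slower_than_real_time(wall_seconds, simulated_seconds):
    """How many seconds of computing each simulated second costs.
    A value of 1.0 would mean the model keeps pace with the real heart."""
    return wall_seconds / simulated_seconds


if __name__ == "__main__":
    print(speedup(OLD_WALL_SECONDS, NEW_WALL_SECONDS))
    print(slower_than_real_time(NEW_WALL_SECONDS, SIMULATED_SECONDS))
```

With these round numbers the improved model is about 60 times faster than before, but would still need to gain roughly another factor of 60 to run in real time.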

Diana - I see. So this could really help medicine as well and help surgeons while they're working.

Chris M. - Absolutely. Another example might be, if you look into the night sky tonight, just beyond the Plough, you can see a supernova that's exploded, and if you want to try and understand how a supernova works, you might try and run a calculation on a supercomputer.