It's nice to have the latest kit, but a supercomputer upgrade is about to bring the German Climate Computing Center, DKRZ, a big problem: a shortage of space.
Not space for the computer itself, but for the data it generates.
DKRZ runs climate models on its supercomputer, projecting how our planet's climate will evolve over decades or, in some cases, hundreds of millennia, from the last ice age into the future.
All those models generate huge volumes of data, 40 petabytes of it so far, which DKRZ archives for future reference so that researchers can analyze the models' output in different ways. The center also offers to store the output from climate models run at other supercomputing centers, forming a global climate archive that researchers around the world draw on.
The center's supercomputer upgrade, which replaces a machine built on IBM's Power chips with an x86-based system made by Bull, means that it will accumulate as much data every six months as it has over the whole of the last five years.
That's because it will be able to simulate atmospheric changes on a more detailed grid. "Every successful model produces four to 10 times more data," said DKRZ's Ulf Garternicht.
Garternicht leads the center's IT systems team, which is now commissioning a storage system capable of holding 500 PB or more.
It's using a new version of the High Performance Storage System (HPSS), initially developed by IBM and the U.S. Department of Energy back in 1992. The military applications directorate of France's Atomic Energy Commission was also an early development partner, and the first applications of HPSS were for storing data from atomic weapons simulations, though today it is also used to store data for academic research and weather forecasting.
With HPSS, "The most important thing is the scalability," said Garternicht. "We could double it in size without having to change the architecture." That means DKRZ could ultimately use it to archive an exabyte of data.
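Those capacity figures can be put into rough numbers. The sketch below is back-of-the-envelope arithmetic under loudly stated assumptions: that the 40 PB archived so far roughly equals the last five years' output, that the new machine therefore adds about 40 PB every six months (80 PB a year), and that this ingest rate stays constant. The function `years_to_fill` is hypothetical, written only for illustration.

```python
def years_to_fill(capacity_pb: float, stored_pb: float, pb_per_year: float) -> float:
    """Years until an archive of capacity_pb is full at a constant ingest rate."""
    if pb_per_year <= 0:
        raise ValueError("ingest rate must be positive")
    remaining = max(capacity_pb - stored_pb, 0.0)
    return remaining / pb_per_year


# Assumption: the ~40 PB archived so far approximates five years' output,
# so the new machine adds roughly that much every six months.
CURRENT_ARCHIVE_PB = 40.0
NEW_RATE_PB_PER_YEAR = CURRENT_ARCHIVE_PB * 2  # 40 PB per half-year

for capacity in (500.0, 1000.0):  # planned system, then doubled (~1 EB)
    years = years_to_fill(capacity, CURRENT_ARCHIVE_PB, NEW_RATE_PB_PER_YEAR)
    print(f"{capacity:.0f} PB lasts about {years:.2f} years")
```

Under those assumptions, 500 PB would last about 5.75 years and a doubled, roughly 1 EB archive about 12 years.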
Reliability is key too. The existing data archive is also built on HPSS and, said Garternicht, "We haven't lost any data over the last five years."
With such large volumes of data, being able to shift it quickly into and out of the archive is important. As in the existing system, most of the data will be stored on tape, but to speed things up there is also a disk cache.
As with any caching system, the goal is to keep the "hot" data in cache for as long as it's needed — no mean task with the size of the datasets used in climate modelling.
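To make the "hot data" idea concrete, here is a minimal sketch of a disk cache in front of a tape tier, assuming a simple least-recently-used (LRU) eviction policy. LRU is a generic textbook technique swapped in for illustration; it is not HPSS's actual migration and purge logic, and the class `DiskCacheLRU` and the sizes used are hypothetical. A read that hits the cache is served from disk; a miss counts as a tape recall and stages the file onto disk, evicting the coldest files when space runs out.

```python
from collections import OrderedDict


class DiskCacheLRU:
    """Toy disk cache in front of a tape tier (hypothetical, LRU eviction).

    Illustrative only: real HPSS migration/purge policies are not modelled.
    """

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self.files: "OrderedDict[str, int]" = OrderedDict()  # name -> size, coldest first
        self.hits = 0
        self.tape_recalls = 0

    def read(self, name: str, size: int) -> str:
        """Return where the read was served from: 'cache' or 'tape'."""
        if name in self.files:
            self.files.move_to_end(name)  # mark as most recently used
            self.hits += 1
            return "cache"
        self.tape_recalls += 1  # miss: the file must come from tape
        if size > self.capacity:
            return "tape"  # too large to stage; read straight from tape
        while self.used + size > self.capacity:
            _, evicted_size = self.files.popitem(last=False)  # drop coldest file
            self.used -= evicted_size
        self.files[name] = size  # stage onto disk as most recently used
        self.used += size
        return "tape"


# Usage with made-up sizes: a 100-unit cache and three model runs.
cache = DiskCacheLRU(capacity_bytes=100)
results = [
    cache.read("run_a", 60),  # miss
    cache.read("run_b", 30),  # miss
    cache.read("run_a", 60),  # hit; run_a becomes most recent
    cache.read("run_c", 40),  # miss; evicts run_b, the coldest
    cache.read("run_b", 30),  # miss again; evicts run_a
]
print(results, cache.hits, cache.tape_recalls)
```

LRU is only one possible policy; with very large datasets, a single big staging request can flush many smaller hot files at once, which is one reason cache sizing matters at this scale.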
The current system's cache holds 5 PB and can move data in and out at a sustained rate of 3 Gbps, or 5 Gbps at peak. The new one will hold 50 PB and run even faster.
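A quick calculation shows why throughput matters as much as capacity. The sketch below, assuming decimal units (1 PB = 10^15 bytes) and reading "Gbps" as gigabits per second, estimates how long it would take to move the entire 5 PB cache at the quoted rates. The function name `days_to_move` is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def days_to_move(data_pb: float, rate_gbit_s: float) -> float:
    """Days to transfer data_pb petabytes at rate_gbit_s gigabits per second.

    Assumes decimal units: 1 PB = 10**15 bytes, 1 Gbit = 10**9 bits.
    """
    bits = data_pb * 10**15 * 8
    seconds = bits / (rate_gbit_s * 10**9)
    return seconds / SECONDS_PER_DAY


# Figures quoted for the current system's 5 PB cache.
for label, rate in (("sustained", 3.0), ("peak", 5.0)):
    print(f"{label}: about {days_to_move(5.0, rate):.0f} days to move 5 PB")
```

At 3 Gbps sustained this comes to roughly 154 days, and about 93 days at the 5 Gbps peak; if the rates were gigabytes rather than gigabits per second, each figure would be eight times smaller.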