

CERN's data stores soar to 530M gigabytes

Lucas Mearian | Aug. 17, 2015
Since restarting in June after a two-year upgrade, CERN's Large Hadron Collider (LHC) has been recording about 3GB of data per second, or about 25 petabytes -- that's 25 million gigabytes -- of data per year.
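
The per-second and per-year figures can be reconciled with a quick back-of-envelope calculation. The sketch below assumes decimal units (1PB = 1,000,000GB) and a constant recording rate, neither of which is stated in the article, and works out how much recording time per year those two figures would imply.

```python
# Back-of-envelope check of the quoted figures.
# Assumptions (not from the article): decimal units, constant recording rate.
GB_PER_PB = 1_000_000      # 1 PB = 1,000,000 GB in decimal units
RATE_GB_PER_S = 3          # "about 3GB of data per second"
YEARLY_PB = 25             # "about 25 petabytes ... of data per year"
SECONDS_PER_DAY = 86_400

yearly_gb = YEARLY_PB * GB_PER_PB
recording_seconds = yearly_gb / RATE_GB_PER_S
recording_days = recording_seconds / SECONDS_PER_DAY

print(f"{yearly_gb:,} GB per year")
print(f"about {recording_days:.0f} days of recording at {RATE_GB_PER_S} GB/s")
```

Under these assumptions, 25PB a year works out to roughly 96 days of recording at 3GB per second, which suggests the per-second rate applies only while the collider is actually taking data.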

CERN's invention of grid computing technology, known as the Worldwide LHC Computing Grid, has allowed it to distribute data to 170 data centers in 42 countries in order to serve more than 10,000 researchers connected to CERN.

Storing data, sharing data

During the LHC's development phase 15 years ago, CERN knew that the storage technology required to handle the petabytes of data it would create didn't exist. And researchers couldn't keep storing data within the walls of their Geneva laboratories, which already house an impressive 160PB of data.

CERN also needed to share its massive data in a distributed fashion, both for speed of access and because of the lack of onsite storage.

As it has in the past, CERN developed the storage and networking technology itself, launching the OpenLab in 2001 to do just that. OpenLab is an open source, public-private partnership between CERN and leading educational institutions and information and communication technology companies, such as Hewlett-Packard and Nexenta, a maker of software-defined storage.

OpenLab itself is a software-defined data center that started phase five of its development cycle this year. That phase will continue through 2017 and tackle the most critical needs of IT infrastructures, including data acquisition, computing platforms, data storage architectures, compute provisioning and management, networks and communication, and data analytics.

A growing grid

In all, the LHC Computing Grid has 132,992 physical CPUs, 553,611 logical CPUs, 300PB of online disk storage and 230PB of nearline (magnetic tape) storage. It's a staggering amount of processing capacity and data storage that relies on having no single point of failure.
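
The headline figure of 530 million gigabytes follows from adding the two storage tiers together. As a minimal sketch, assuming decimal units (1PB = 1,000,000GB, which the article's own "25 million gigabytes" conversion implies), the totals and the ratio of logical to physical CPUs can be checked like this:

```python
# Totals derived from the figures quoted in the article.
# Assumption: decimal units, 1 PB = 1,000,000 GB.
GB_PER_PB = 1_000_000

online_disk_pb = 300       # online disk storage
nearline_tape_pb = 230     # nearline (magnetic tape) storage
physical_cpus = 132_992
logical_cpus = 553_611

total_pb = online_disk_pb + nearline_tape_pb
total_gb = total_pb * GB_PER_PB
logical_per_physical = logical_cpus / physical_cpus

print(f"total storage: {total_pb} PB = {total_gb:,} GB")
print(f"logical CPUs per physical CPU: {logical_per_physical:.2f}")
```

The storage total comes to 530PB, and the CPU counts imply a little over four logical CPUs for every physical one.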

In the next 10 to 20 years, data will grow immensely because the intensity of the accelerator will be ramped up, according to Dissertori.

"The electronics will be improved so we can write out more data packages per second than we do now," Dissertori said.

Every LHC experiment currently writes data to magnetic tape at a rate on the order of 500 data packets per second; each packet is a few megabytes in size. But CERN is striving to keep as much data as possible on disk, or online storage, so that researchers have instant access to it for their own experiments.
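
To get a feel for what 500 packets per second means in bandwidth terms, the rough sketch below multiplies the packet rate by a packet size. The article says only "a few megabytes", so the sizes of 2MB, 3MB and 5MB are hypothetical values chosen for illustration, and the function name is likewise made up for this example.

```python
# Rough per-experiment write rate implied by the packet figures.
# Assumptions: decimal units; packet sizes below are illustrative only.
PACKETS_PER_SECOND = 500   # "on the order of 500 data packets per second"
MB_PER_GB = 1_000


def throughput_gb_per_s(packet_mb: float,
                        packets_per_s: int = PACKETS_PER_SECOND) -> float:
    """Return the write rate in GB/s for a given packet size in MB."""
    return packet_mb * packets_per_s / MB_PER_GB


for packet_mb in (2, 3, 5):  # hypothetical readings of "a few megabytes"
    rate = throughput_gb_per_s(packet_mb)
    print(f"{packet_mb} MB packets -> {rate:.1f} GB/s")
```

With these illustrative sizes, a single experiment would be writing somewhere between 1GB and 2.5GB per second.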

"One interesting development is to see how can we implement it with data analysis within our cloud computing paradigm. For now, tests are ongoing on our cloud," Dissertori said. "I could very well imagine in near term future more things done in that direction."


