
Too much data – a good problem to have

Eckhard Elsen is Director for Research and Computing

Large volumes of high-quality data may be a challenge, but addressing it brings innovation

17 October, 2016

Last week, the 22nd International Conference on Computing in High-Energy and Nuclear Physics, CHEP 2016, took place in San Francisco, attracting some 500 experts from all over the world. This gave the LHC experiments a great opportunity to showcase the impressive progress they have made in mastering the ever-increasing data volumes and to highlight their plans for the High-Luminosity period of the LHC.

The experiments have made a fantastic effort in optimising their code and minimising unnecessary copying of data. Triggering is becoming more sophisticated with the inclusion of track and vertex information, allowing ATLAS and CMS to be more selective in what they record. Meanwhile, LHCb has introduced its turbo stream, which serves some 80% of LHCb analyses and is based on a compact record containing all the information necessary for analysis. ALICE is adopting a similar approach, blurring the division between online and offline processing: it records data from all events without a trigger decision, while reducing the amount of data to be stored per event.

With the LHC performing as well as it is, this is welcome news: the machine's availability has almost doubled. As a consequence, the experiments have recorded more events than anticipated so far in Run 2 and still exceed their allocated resources. Too much high-quality data may be a challenge, but it is a good problem to have.

Progress like this keeps CERN in the vanguard of high-throughput computing (HTC). This is important, not only for us, but also because it enables us to share experience with other fields of science for which HTC is becoming increasingly important. The conference programme at CHEP was bustling with presentations on new software tools, machine learning and progress in making effective use of multiple cores on modern computing platforms. Experiments are joining forces via the HEP Software Foundation. Key to LHC computing, however, is the development of the network itself, where the rate of progress has not slowed down. National and transcontinental networks thus featured prominently at the conference. With sufficient bandwidth installed, the location of the computing resource becomes arbitrary.

And that brings me to another recent conference, the International Conference on Research Infrastructures, ICRI, held in Cape Town from 3 to 5 October. There’s a good reason why ICRI was in South Africa this year. The country co-hosts an exciting new research infrastructure: the Square Kilometre Array, SKA, the world’s largest radio telescope. A precursor to the SKA, MeerKAT, is up and running, but MeerKAT is only a small fraction of the final SKA configuration. Once complete in 2025, the SKA will bring together dishes in South Africa and Australia with a combined surface area of one square kilometre. They will all be on stream all the time, producing data volumes that dwarf even those of the LHC.

South Africa already hosts a WLCG Tier 2 computing centre, and there was some discussion at ICRI on how to build on this to bring in other areas of science, such as the SKA. One way forward is for South Africa to build a Science Cloud – a public sector facility for scientific computing. Science Clouds are, I believe, the way forward for public sector science and an evolution of the WLCG. Such a facility would be a wonderful showcase for scientific cloud computing, and an asset for South African science.

It’s been an interesting few weeks for scientific computing, leading me to conclude that CERN remains in the vanguard not simply because of our high data volumes, but because we're developing new tools to deal with them. The bottom line for me is that we have much to give, and we have much to learn from others. In scientific computing, interdisciplinary collaboration is the future.
