Processing: What to record?

Collisions in the LHC produce too much data to record. To tackle the torrent, the Grid homes in on the interesting physics in a two-step selection process.

The volume of data produced at the Large Hadron Collider (LHC) presents a considerable processing challenge.

Particles collide at high energies inside CERN's detectors, creating new particles that decay in complex ways as they move through layers of subdetectors. The subdetectors register each particle's passage, and microprocessors convert the particles' paths and energies into electrical signals, combining the information to create a digital summary of the "collision event". The raw data per event is around one million bytes (1 MB), and events are produced at a rate of about 600 million per second.
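
As a rough back-of-the-envelope check, the two figures above imply an unfiltered data rate of roughly 600 terabytes per second. The sketch below simply multiplies them out; the variable names are illustrative, and decimal units (1 TB = 10^12 bytes) are an assumption.

```python
# Back-of-the-envelope estimate using the approximate figures quoted above.
BYTES_PER_EVENT = 1_000_000        # about one million bytes of raw data per event
EVENTS_PER_SECOND = 600_000_000    # about 600 million events per second

raw_bytes_per_second = BYTES_PER_EVENT * EVENTS_PER_SECOND

# Assumption: decimal terabytes, i.e. 1 TB = 10**12 bytes.
raw_terabytes_per_second = raw_bytes_per_second / 1e12

print(f"Unfiltered rate: about {raw_terabytes_per_second:.0f} TB/s")
```

This is far more than can be written to storage, which is why the events have to be filtered before anything is recorded.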

The data flow from all four experiments for Run 2 is anticipated to be about 25 GB/s (gigabytes per second):

  • ALICE: 4 GB/s (Pb-Pb running)
  • ATLAS: 800 MB/s – 1 GB/s
  • CMS: 600 MB/s
  • LHCb: 750 MB/s

The Worldwide LHC Computing Grid tackles this mountain of data in a two-stage process. First, dedicated algorithms written by physicists reduce the number of events and select those considered interesting. Analysis can then focus on the most important data: those that could bring new physics measurements or discoveries.
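
The idea of chaining two selections can be illustrated with a toy sketch. Everything in it is hypothetical: the `Event` fields, the thresholds and the two selection functions are invented for illustration and are not the experiments' actual algorithms.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Event:
    """Toy summary of one collision event (all fields are hypothetical)."""
    event_id: int
    total_energy_gev: float
    n_muons: int


def select(events: Iterable[Event], keep: Callable[[Event], bool]) -> Iterator[Event]:
    """Lazily yield only the events for which `keep` returns True."""
    return (event for event in events if keep(event))


def stage_one(event: Event) -> bool:
    # Fast, coarse cut (illustrative threshold only).
    return event.total_energy_gev > 100.0


def stage_two(event: Event) -> bool:
    # Slower, more specialized cut applied to the survivors (illustrative only).
    return event.n_muons >= 2


events = [
    Event(1, 50.0, 0),
    Event(2, 250.0, 1),
    Event(3, 400.0, 2),
    Event(4, 90.0, 3),
]

# The second selection only ever sees events that passed the first.
recorded = list(select(select(events, stage_one), stage_two))
print([event.event_id for event in recorded])
```

Putting the cheap cut first means the more expensive second selection runs on far fewer events, which is the point of staging the selection.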

In the first stage of the selection, the event rate is reduced from the 600 million or so per second picked up by the detectors to 100,000 per second sent for digital reconstruction. In the second stage, more specialized algorithms further process the data, leaving only 100 to 200 events of interest per second. This raw data is recorded onto servers at the CERN Data Centre at a rate of around 1.5 CDs per second (approximately 1050 megabytes per second). Physicists belonging to worldwide collaborations work continuously to improve detector-calibration methods and to refine the processing algorithms to detect ever more interesting events.
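
To give a sense of scale, the rates quoted above can be turned into reduction factors. This is only a sketch of the arithmetic: the variable names are illustrative, and the 700 MB CD capacity is an assumption (a common CD-ROM size) used to reproduce the "1.5 CDs per second" comparison.

```python
# Approximate event rates quoted in the text (events per second).
detected_per_s = 600_000_000      # picked up by the detectors
reconstructed_per_s = 100_000     # kept after the first stage
recorded_per_s_low = 100          # kept after the second stage (lower figure)
recorded_per_s_high = 200         # kept after the second stage (upper figure)

# How strongly each stage thins out the events.
stage_one_factor = detected_per_s / reconstructed_per_s
stage_two_factor_min = reconstructed_per_s / recorded_per_s_high
stage_two_factor_max = reconstructed_per_s / recorded_per_s_low
overall_factor_min = detected_per_s / recorded_per_s_high
overall_factor_max = detected_per_s / recorded_per_s_low

print(f"Stage one keeps about 1 event in {stage_one_factor:,.0f}")
print(f"Stage two keeps about 1 in {stage_two_factor_min:,.0f} to 1 in {stage_two_factor_max:,.0f}")
print(f"Overall, about 1 in {overall_factor_min:,.0f} to 1 in {overall_factor_max:,.0f} is recorded")

# Assumption: one CD holds 700 MB.
CD_CAPACITY_MB = 700
recording_rate_mb_per_s = 1.5 * CD_CAPACITY_MB
print(f"1.5 CDs per second is about {recording_rate_mb_per_s:.0f} MB/s")
```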