CMS releases new batch of LHC open data

A CMS collision event as seen in the built-in event display on the CERN Open Data Portal (Image: CERN)

The CMS Collaboration has made 300 TB of high-quality data from the LHC available to the public through the CERN Open Data Portal.

The collision data come in two types. The so-called “primary datasets” are in the same format that the CMS Collaboration uses for its own research. The “derived datasets”, on the other hand, require far less computing power and can be readily analysed by university or even high-school students.

Notably, CMS is also providing simulated data generated with the same software version that should be used to analyse the primary datasets. Simulations play a crucial role in particle-physics research, and CMS is also making available the protocols used to generate the simulations provided. The data release is accompanied by analysis tools and code examples tailored to the datasets.

These data are being made public in accordance with CMS’s commitment to long-term data preservation and as part of the collaboration’s open-data policy.

“Members of the CMS Collaboration put in lots of effort and thousands of person-hours each of service work in order to operate the CMS detector and collect these research data for our analysis,” explains Kati Lassila-Perini, a CMS physicist who leads these data-preservation efforts. “However, once we’ve exhausted our exploration of the data, we see no reason not to make them available publicly. The benefits are numerous, from inspiring high-school students to the training of the particle physicists of tomorrow. And personally, as CMS’s data-preservation co-ordinator, this is a crucial part of ensuring the long-term availability of our research data.”

The scope of open LHC data has already been demonstrated with the previous release of research data. A group of theorists at MIT wanted to study the substructure of jets — showers of hadron clusters recorded in the CMS detector. Since CMS had not performed this particular research, the theorists got in touch with the CMS scientists for advice on how to proceed. This blossomed into a fruitful collaboration between the theorists and CMS revolving around CMS open data.

Read more about CMS Open Data on the CERN Open Data Portal.

A longer version of this article was originally published on the CMS website.