On the road to Open Science

The release of the Open Data Portal is a significant milestone on the road to Open Science, but there is work ahead


This piece was written by Tim Smith and Sünje Dallmeier-Tiessen

The release of the Open Data Portal is a significant milestone, but the end of the road remains to be reached. Open Science represents much more than just the sum of “open” actions; it is an ideal, and for us here at CERN, a return to our roots.

CERN is the epitome of openness, which goes hand in hand with the collaborative nature of our frontier research. The fact that openness is enshrined in our convention is not taken as an obligation; instead, it is an expression of the strength of our convictions. We helped build the Open Internet, were early adopters of Open Source, helped usher in the preprint culture, and we pioneer initiatives in Open Access to publications.

Science is predicated on the concept that the hypotheses we propose to explain the phenomena we observe can be tested through repeatable experiments. We should share sufficient details of our observations and conclusions for independent scrutiny, reproduction and verification. In this data-intensive age we have fallen somewhat short of this ideal, since we have continued to “share” through publication processes which had no place for data, certainly not large volumes of it, nor for the code needed to interpret it. Hence Open Science is striving to rebalance the processes and reintroduce data and code as first-class research objects to be shared, scrutinized and reused.

As we release the Open Data Portal today, we take another step on the steady evolutionary path towards Open Science that we have been following these past years. This particular step, however, is new and evokes in many a feeling akin to that of a first parachute jump: thrill, fear or a mixture of both!

Recently in our field, a wide spectrum of initiatives has been opening up data and analysis code to a variety of audiences. Examples include HEPDATA, Rivet, Recast, the Master Classes and the recent Higgs Kaggle challenge, as well as numerous others. In launching the CERN Open Data Portal we are reinforcing these initiatives by providing a platform to expose, publish and archive data that come out of the CERN experimental programme, and to open them to ALL. To achieve this, the Open Data Portal assigns digital object identifiers (DOIs) to the data sets and code, making them citable objects in normal scientific communication, and offers the data openly for anyone to download, since they are published under a Creative Commons CC0 waiver. Thus the portal provides us with a building block for data management plans and a focal point for preservation actions.

Building the Open Data Portal has also been a prime example of the collaborative spirit that powers our discipline. The Open Data Portal is the culmination of a very close collaboration between digital library experts, data curators and metadata experts from IT and GS, together with data experts, researchers and outreach teams from the four LHC experiments. It also brings together two distinct threads we have been pioneering over the past years, namely digital libraries and (big) data management. It thus builds on years of investment in the Invenio digital library software, which powers CDS, INSPIRE, Zenodo and many more services worldwide.

Note, however, that these are data from our real collision events, so one should neither underplay their complexity nor understate the time and effort that newcomers to our collaborations invest in learning the tools and techniques to interpret them. Along with the lower-level analysis object data in the portal, we are publishing high-level and reduced data sets and tools which, while easier to manipulate and appreciate, are still not entirely straightforward to interpret! So the “parachute” trepidation mentioned earlier is not so much about the launch itself as about the time ahead, when friends afar will access and tackle the data. We share openly, and we are interested to hear how and where these data are used, not only because we are curious, but also because we need to understand how best to present our open data assets in forms which are usable now and in the future. The Open Data Portal launch is just the starting point, and we hope many will take the opportunity to try. In the months to come we will be working with experts in the experiments to add more tools and data to make the task easier and easier.