Left uncared for, software decays. Like a grand old building, it may fall into ruin if abandoned. But, as with any treasured monument, there are those who will fight to preserve scientific software, seeking to ensure that the hard-won gains of today’s researchers are not frivolously lost to the researchers of tomorrow.
On Wednesday 30 January, around 40 such ‘e-preservationists’ converged on CERN for the SciencePAD Persistent Identifiers Workshop (SPID2013). The workshop aimed to investigate ways of improving the collection, storage and preservation of information about the software used in scientific research. The motivation is not mere nostalgia but a desire to ensure that software developed today can be reused by future researchers, and that scientific results obtained with specific software remain reproducible for years to come.
Alberto di Meglio, project director of the European Middleware Initiative and leader of its SciencePAD activities, stresses the importance of online software repositories in achieving these goals. “It is important that the software used to help produce scientific publications is properly cited,” he says. “The ultimate goal is for scientists to be able to locate software in a repository and get citation information which they can use in their own publications.”
“One solution is to write a publication about the software you’ve developed,” suggests Martin Fenner of ORCID. This way, he argues, researchers would be able to cite the software they use within existing academic publishing structures. Another approach would be to tag software with persistent identifiers: long-lasting reference codes akin to the digital object identifiers (DOIs) already assigned to research papers.
However, this isn’t just about software developers trying to get the recognition they feel they deserve; rather, it’s about ensuring that research groups don’t duplicate one another’s efforts and that scientists are able to successfully build on the work of their peers. Rudolf Dimper of the European Synchrotron Radiation Facility says that online software repositories have a vital role to play in building the trust necessary to achieve these goals: “If software is well maintained and it is well documented, researchers then actually trust these programs to do their own data analysis and will be less tempted to redesign similar or even identical programs themselves.”
Neil Chue Hong, director of the Software Sustainability Institute, says that many of the issues discussed at the workshop could be solved by providing scientists with a better understanding of computational science. “We need to work on skills and capabilities,” he says. “A lot of researchers do not have the basics that they need to know about computational science in the same way that we hope all scientists have been taught the basics of statistics.”