In October 2023, CERN hosted the fifth International Open Search symposium. More than 100 experts attended, representing 60 institutes and universities as well as industry and policy makers, to discuss how to make web search across Europe fair, safe and less biased. The symposium was the latest event in a project that has grown organically over many years, with its roots in a workshop held in Munich in 2017. Called the Open Web Search project, it was launched by a consortium of 14 research partners with the goal of contributing to Europe’s digital sovereignty and promoting an open, human-centred search engine market.
“Globally, web search is in the hands of five players,” explains Andreas Wagner, a CERN IT department member involved in the project. “The role of CERN within the Open Web Search project is to lead the design and implementation of the required distributed data infrastructure.”
Indexing the huge amount of information currently available on the web is a massive endeavour, but doing it in an unbiased way is an even harder task. “Together with the other partners, we have started by simply discussing possible ways of building a new neutral indexing system,” explains Andreas. “Although the system is still very preliminary, running it on our own set of webpages at CERN has proved useful as it has allowed us to learn critical things about our own internal search engine. In other words, the project will also help CERN to improve its own search capabilities and will provide an open science search function across CERN’s multiple information repositories.”
To build a search engine, you first need a bot, known as a crawler, that downloads the content of web pages. An algorithm is then run on the downloaded pages to build the actual index. If the index is biased, it can boost the visibility of certain pages at the expense of others.
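To make the crawl-then-index step concrete, here is a minimal sketch of one classic index structure, an inverted index that maps each word to the pages containing it. It is an illustration only, not the Open Web Search project’s pipeline: the `pages` dictionary is a hypothetical stand-in for crawler output, the tokenizer is deliberately naive, and the function names are invented for this example.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Naive tokenizer: lowercase the text and keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(pages: dict[str, str]) -> dict[str, set[str]]:
    # Map every term to the set of page URLs whose text contains it.
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> list[str]:
    # Boolean AND search: return pages containing every query term.
    # Results are sorted by URL, so the order is the same for every user.
    terms = tokenize(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(matches)


# Hypothetical "downloaded" pages standing in for a crawler's output.
pages = {
    "https://example.org/a": "CERN hosts the Open Search symposium",
    "https://example.org/b": "The symposium discussed open web search",
    "https://example.org/c": "Particle physics at CERN",
}
index = build_inverted_index(pages)
print(search(index, "open symposium"))
```

A real engine would also score and rank the matching pages; that ranking step, together with which pages the crawler chose to download in the first place, is where bias can enter.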
“Indexing basically means taking a snapshot of the digital world, and so copyrights and legal and ethical matters enter into play,” says Andreas. Ethics is a key element of the whole Open Web Search project. Current search engines present the user with a list of results ranked by the tailored algorithm behind the whole process. “Everything is designed to make the user happy, so only a selection of relevant documents are shown,” confirms Andreas. “Commercial search engines use profiling techniques to present tailored results, and this raises ethical and privacy concerns.”
To keep the new algorithms neutral and transparent, the team is considering solutions that do not profile the individual user but rather look at the behaviour of anonymised groups of users to return meaningful results.
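The article does not say how such group-level signals would be computed, so the following is only a hypothetical sketch of one way the idea could work. It assumes click events carry an ephemeral, non-identifying session token rather than a user identity, counts distinct sessions per query and page, suppresses any signal backed by fewer than a chosen minimum group size (the `MIN_GROUP_SIZE` value and all function names are assumptions), and then discards the tokens so that only aggregate counts remain.

```python
from collections import defaultdict

# Hypothetical threshold: signals from fewer distinct sessions are dropped.
MIN_GROUP_SIZE = 3


def aggregate_clicks(events, min_group_size=MIN_GROUP_SIZE):
    # events: iterable of (session_token, query, url). The token is assumed to
    # be ephemeral and non-identifying; it is used only to count distinct sessions.
    sessions = defaultdict(set)
    for token, query, url in events:
        sessions[(query.lower(), url)].add(token)
    # Keep only the counts (tokens are discarded) and suppress small groups.
    return {
        key: len(tokens)
        for key, tokens in sessions.items()
        if len(tokens) >= min_group_size
    }


def rerank(query, candidates, group_counts):
    # Order candidates by the aggregated group signal (highest first), then by
    # URL. The result depends only on the query, never on who is asking.
    q = query.lower()
    return sorted(candidates, key=lambda url: (-group_counts.get((q, url), 0), url))


events = [
    ("s1", "open search", "https://example.org/b"),
    ("s2", "open search", "https://example.org/b"),
    ("s3", "open search", "https://example.org/b"),
    ("s4", "open search", "https://example.org/a"),
    ("s5", "open search", "https://example.org/a"),
]
counts = aggregate_clicks(events)
candidates = ["https://example.org/a", "https://example.org/b", "https://example.org/c"]
print(rerank("Open Search", candidates, counts))
```

In this example the signal for page `a` comes from only two sessions, so it is suppressed and page `a` falls back to the neutral URL order. Any two users issuing the same query receive the same ranking.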
“Current commercial search engines deliberately make you stop here and there and divert your attention because this serves their commercial interests,” continues Andreas. “We want to take any user straight to their destination in a fair, effective and transparent way.”
The project OpenWebSearch.EU has received funding from the European Union's Horizon research and innovation programme under grant agreement No 101070014.