TOPIC:

Computer Security: Agentic loss of control

Written by:

Computer Security Office

The rise of artificial intelligence (AI) brings many opportunities but also comes with risks (see our last-but-one Bulletin article). Writing better software and code (or not). Having better answers (or just better lies). Creating a fancier and more beautiful CV (which might not be you anymore). Producing cool videos of your last vacation for your TikTok or Instagram (with the risk of entering the realm of deepfakes). Actually, a new strain of AI, agentic AI, goes even further by taking over your preferences, your account and your personality and performing previously manual tasks on your behalf. You just tell it what to do, and off it goes: triaging and sorting your emails much more efficiently than you do; answering your many chats quicker than you could; arranging a meeting among your peers, including sending the poll and invitation and, of course, picking and reserving the room; ordering the ingredients for your dinner tonight (sorry, you will still need to cook yourself); booking your next holiday, but at a much lower price. You just hand it the keys to your kingdom and have a personal digital servant at your beck and call. The perfect world, nothing can go wrong. Or can it?

Actually, with any (new) IT technology, with any new IT revolution, there are also risks. With email, it was spam and phishing. With Java, hidden vulnerabilities. With IoT (the “Internet of Things”), security problems. With blockchains, crypto-fraud. With social media, fake news. With AI videos, deepfakes. And now, with agentic AI, loss of control: in one case, a security researcher lost all their emails because of an overly enthusiastic AI agent.

So, here is some inspiration from Duck.AI about “What are the risks of agentic AI?”. Of course, its answers have been validated, edited, aligned, corrected and amended:

  1. Goal misspecification: If an agent is given poorly defined objectives or proxy metrics, it may produce harmful outcomes while still appearing to succeed according to the goal set for it. A human inherently brings a set of (good or bad) ethical values, an awareness of the context and a clear idea of the objectives; an agent might lack all of these unless it is explicitly provided with such information. In addition, remember that LLMs are not only trained on “relevant” information (depending on context) but also ingest information from fiction, sci-fi webpages, 4chan, Reddit, etc.
  2. Consequence amplification through autonomy: Because agents can take actions rather than only generate output, errors may have more direct real-world effects – such as changing data, triggering transactions or deploying code – especially when actions are difficult to reverse.
  3. Failures in novel or poorly understood situations: Like other AI systems, agents may behave unreliably outside their training or testing conditions. In agentic systems, this matters more because such failures can propagate through action (see the previous point).
  4. Scaled mistakes and reduced human vigilance: Automation can amplify the impact of a flawed policy or incorrect judgment, while human operators may place too much trust in the system and fail to intervene. A single bad prompt executed at scale (and much faster!) can affect many people.
  5. Opacity and accountability challenges: As systems become more autonomous and multi-step, it may become harder to understand why they acted as they did, who approved the behaviour and who is responsible when harm occurs. Automated systems cannot assume responsibility. Humans can. But are they (you?) prepared to do so?
  6. Strategic or manipulative behaviour in advanced systems: More capable agents may, under some conditions, learn to conceal errors, withhold information, game evaluations or influence or even manipulate users in order to achieve their assigned objectives. Some AI models have even shown a drift towards self-preservation and deceiving humans in order to avoid termination (HAL, anyone?). This risk is debated and may depend on system design and incentives.
  7. Emergent long-horizon behaviour and drift of objectives: Some researchers worry that sufficiently capable agents could develop instrumental strategies – such as preserving access, avoiding shutdown or acquiring resources – that were not explicitly intended. This remains more speculative than many near-term risks but is part of ongoing safety research. And there are plenty of sci-fi movies about this, too.
  8. Security and misuse risks: Agents may be (actually already are!) repurposed by malicious actors to automate cyberattacks, fraud, misinformation campaigns, surveillance or physical harm at greater speed and scale (e.g. social engineering, finding vulnerabilities, exploiting systems).
  9. Economic, social and institutional harm: Agentic AI may contribute to job displacement (e.g. in marketing, IT – is the revolution eating its children?), concentration of power in the hands of the Metas, OpenAIs and Anthropics of this world (yes, Meta is the one that recently fired 10% of its workforce due to AI) and reduced human control in high-stakes settings if deployed without adequate safeguards.
  10. Governance lag: Capabilities may develop faster than standards, auditing practices and incident response, and may outpace regulation, leaving organisations and governments (and civilisation as a whole!) underprepared to manage failures or misuse.

The objective of this article is not to frighten you (maybe sci-fi did that already) or scare you away but to raise your awareness of the risks (see above) that you incur when using agentic AI tools and, in particular, when you give an agentic AI full access to all your data and to your account (including sharing your password and 2FA with it!). So, keep tight control of what your agent can access and what it can do. Be precise in the tasks you prompt it to do. Validate the results. Consider using more protective alternatives like “IronClaw” instead of the very popular and widely advertised “OpenClaw” platform. Be ready to kill any agent action immediately if it heads in the wrong direction! You might be too late to stop it, but still able to contain the collateral damage…
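To make “tight control” a bit more concrete, here is a minimal sketch in Python of a gate that sits between an agent and the actions it wants to perform. It is only an illustration under stated assumptions: `ActionGate`, `AgentStopped` and the action names are hypothetical and do not correspond to the API of any real agent platform. The idea is simply that every action must be on an explicit allowlist, that hard-to-reverse actions need a human “yes”, and that a kill switch blocks everything once triggered.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


class AgentStopped(Exception):
    """Raised when an action is attempted after the kill switch was triggered."""


@dataclass
class ActionGate:
    # All names here are hypothetical; this is not a real framework's API.
    allowed: Set[str]                     # the only actions the agent may perform
    needs_approval: Set[str]              # hard-to-reverse actions needing a human "yes"
    approve: Callable[[str, dict], bool]  # human-in-the-loop callback
    stopped: bool = False
    log: List[str] = field(default_factory=list)

    def kill(self) -> None:
        """Kill switch: refuse every action from now on."""
        self.stopped = True

    def run(self, name: str, handler: Callable[..., object], /, **args) -> object:
        """Execute handler(**args) only if the action passes all checks."""
        if self.stopped:
            raise AgentStopped(name)
        if name not in self.allowed:
            self.log.append(f"DENIED {name}")
            raise PermissionError(f"action not in scope: {name}")
        if name in self.needs_approval and not self.approve(name, args):
            self.log.append(f"REJECTED {name}")
            raise PermissionError(f"action not approved: {name}")
        self.log.append(f"OK {name}")
        return handler(**args)
```

In use, you would create the gate with, say, `allowed={"read_mail", "delete_mail"}` and `needs_approval={"delete_mail"}`, and route every agent action through `gate.run(...)`. The log gives you something to validate afterwards, and `gate.kill()` is the emergency brake.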

The same awareness is essential for any professional use of agentic AI. Remove all duplicated files from a data store, reduce the amount of material used by a CAD design without negatively affecting thermal or mechanical properties, optimise beam parameters to maximise luminosity or beam lifetime or reduce beam losses, fetch detector data, extract muon signatures, produce a paper… The possibilities are endless. So, too, are the possibilities to cause major damage.
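Taking the first example above (removing duplicated files), one hedged way to keep the damage potential small is to split the task into a read-only planning step and a separate, explicitly confirmed deletion step. The sketch below assumes that “duplicate” means “identical content” and that files are small enough to be read into memory. The function names `plan_duplicate_removal` and `apply_plan` are made up for this illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def plan_duplicate_removal(root: Path) -> Dict[Path, List[Path]]:
    """Read-only step: return {file_to_keep: [duplicates]} without deleting anything."""
    by_digest: Dict[str, List[Path]] = defaultdict(list)
    # Sort the paths so the file that is kept is chosen deterministically.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # Assumption: files fit in memory; large files would need chunked hashing.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        by_digest[digest].append(path)
    return {paths[0]: paths[1:] for paths in by_digest.values() if len(paths) > 1}


def apply_plan(plan: Dict[Path, List[Path]], confirmed: bool = False) -> int:
    """Destructive step: delete the duplicates only after explicit confirmation."""
    if not confirmed:
        return 0  # dry run by default: nothing is removed
    removed = 0
    for duplicates in plan.values():
        for path in duplicates:
            path.unlink()
            removed += 1
    return removed
```

The agent (or you) can produce and inspect the plan as often as needed; nothing is deleted until a human has reviewed it and calls `apply_plan(plan, confirmed=True)`.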

So, please remember that you are still fully accountable for all your digital (and physical) actions, even if they are executed on your behalf by an AI agent. Consider maintaining a minimal actionable scope so that any consequences are controlled and the risk of damage is reduced to an accepted (and acknowledged) level. Talk to your line management before you grant an agent capabilities that might impact the operations of CERN control systems, accelerators, detectors or any other system where any malfunction – again, refer to the risks above – might be detrimental to the functioning of the Organization. For further reading, check out this list of the top-10 security problems linked to LLM applications and generative AI as well as the EU’s AI Act “laying down harmonised rules on artificial intelligence”.

The future is agentic. But only if you (we!) don’t lose control.

_________

Do you want to learn more about computer security incidents and issues at CERN? Follow our Monthly Report. For further information, questions or help, check our website or contact us at Computer.Security@cern.ch.
