It is well known by now that AI-based chatbots are prone to hallucinations (providing made-up responses), which is an inherent flaw in them. Artificial intelligence pioneer Geoffrey Hinton, however, sees a further danger, the potential for AI to manipulate humans, and is very concerned.
But wait, can AI systems actually deceive humans? Techxplore claims that several systems have already learned to do this, and the consequent risks range from fraud or election tampering to humans losing control over AI.
According to Techxplore, one disturbing example of a deceptive AI is Meta’s CICERO, an AI model designed to play the alliance-building world conquest game Diplomacy. Upon close inspection, CICERO turned out to be a master of deception, regularly betraying other players and in one case even pretending to be a human with a girlfriend.
Even large language models (LLMs) have displayed deceptive capabilities: some have learned to lie to win social deduction games, in which players compete to “kill” one another and must convince the group they’re innocent.
So far, the examples have been of bots cheating and lying for a game’s sake. What’s the harm in that?
Techxplore claims that AI systems with deceptive capabilities could be misused in numerous ways, including to commit fraud or tamper with elections. A different problem entirely is that such systems could use deception to escape human control.
In a simulated experiment in which an external safety test was designed to eliminate fast-replicating AI agents, the AI agents learned to play dead and disguise their fast replication rates precisely when being evaluated.
Learning deceptive behavior may not even require explicit intent to deceive: the AI agents mentioned above played dead out of a goal to survive rather than a goal to deceive.
What can be done?
There’s a clear need to regulate AI systems capable of deception, and the EU AI Act is a useful regulatory framework. It assigns each AI system one of four risk levels: minimal, limited, high and unacceptable. Systems with unacceptable risk are banned, while high-risk systems are subject to special requirements for risk assessment and mitigation. Some argue that AI systems capable of deception should be treated as “high-risk” or “unacceptable risk” by default.
Thinking that game-playing AIs like CICERO are benign is short-sighted; capabilities developed for game-playing can still contribute to the proliferation of deceptive AI products.